7z and zip archives are great for compressing backup copies of files. I like to keep previous versions of files in an archive and add newer versions as they are updated. This method works great for quick and easy version tracking without sacrificing much space (especially when it comes to source code and text files).
Recently, however, I noticed something odd when adding files to one of my archives: the archive’s file size grew as if each file I added was compressed separately and then appended to the original archive. This is probably best described with an example. Let’s say we have a project with the file foo.log. We then number each file as we make new versions of it.
PS C:\Project> dir
Directory: C:\Project
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 15:55 8841 foo-1.log
-a--- 23:13 8745 foo-2.log
-a--- 22:56 8842 foo-3.log
-a--- 10:32 8470 foo-4.log
-a--- 15:23 8485 foo-5.log
We will start with just the first file, foo-1.log. If we archive it as foo-1.7z, we get a file that is 1756 bytes, or 20% of its original size. If we do the same thing with foo-2.log, we get foo-2.7z at 1686 bytes, close to 19% this time. We can conclude that this data compresses to around 20% of its original size.
PS C:\Project> dir
Directory: C:\Project
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 12:53 1756 foo-1.7z
-a--- 12:53 1686 foo-2.7z
The “problem” arises when we add files to an existing archive. Let’s make a new 7z archive, foo.7z, with only foo-1.log. It’s the same size as foo-1.7z.
Now we open up foo.7z and add foo-2.log to it. The second log file is compressed and saved within foo.7z, but at 3393 bytes the archive is almost as large as foo-1.7z and foo-2.7z put together (1756 + 1686 = 3442 bytes). The overall compression here is still around 19%, but it could be better.
We can confirm that adding to an archive compresses it differently: if we select foo-1.log and foo-2.log and make a new archive (call it Project.7z) with both files from the get-go, the resulting archive is only 2324 bytes. That’s just over 13% of the original uncompressed size!
PS C:\Project> dir
Directory: C:\Project
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 12:53 1756 foo-1.7z
-a--- 12:53 1686 foo-2.7z
-a--- 12:58 3393 foo.7z
-a--- 12:59 2324 Project.7z
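The same effect can be reproduced outside of 7z. The following is only a rough sketch, assuming Python’s standard lzma module as a stand-in for 7z (it belongs to the same LZMA family, but it is not the 7z container format) and a made-up make_log helper that fakes two similar log versions. It compares compressing each version on its own against compressing both in one stream.

```python
import lzma
import random


def make_log(version: int, lines: int = 200) -> bytes:
    """Build a synthetic log (hypothetical data); versions share most lines."""
    base_rng = random.Random(0)  # fixed seed, so the base text is identical
    log = [
        f"12:{base_rng.randrange(60):02d} INFO task {base_rng.randrange(10**6)} finished"
        for _ in range(lines)
    ]
    edit_rng = random.Random(version)  # per-version seed for the changed lines
    for _ in range(10):
        log[edit_rng.randrange(lines)] = f"12:00 WARN retry {edit_rng.randrange(10**6)}"
    return "\n".join(log).encode()


v1, v2 = make_log(1), make_log(2)

# Each version in its own stream (comparable to foo-1.7z plus foo-2.7z).
separate = len(lzma.compress(v1)) + len(lzma.compress(v2))

# Both versions in a single stream (comparable to Project.7z).
together = len(lzma.compress(v1 + v2))

print(f"separate: {separate} bytes, together: {together} bytes")
```

When both versions share one stream, the compressor can refer back to the first version while encoding the second, so the near-duplicate lines cost very little. Compressed separately, the second version has nothing to refer back to.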
So how can we achieve better compression when adding a file without having to extract and archive all the existing files? The only workaround I have found is to combine the de-duplication benefits of a wim with the high compression ratio of 7z. This isn’t perfect but is better than using a straight 7z alone.
We take the first file and add it to a new wim archive, foo.wim. It will not compress anything, but will simply store the files together. Next, take the wim file and add that to a new 7z archive, FullFoo.7z. Since the wim contains the original file’s data uncompressed, the 7z archive should be able to compress it down almost as much as if you had archived the original files directly in a 7z. In this case, the 7z ends up at 24% of the uncompressed size. That is not as good as a straight 7z could get, but the ratio improves as more versions are added.
PS C:\Project> dir
Directory: C:\Project
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 15:55 8841 foo-1.log
-a--- 14:08 10103 foo.wim
-a--- 14:08 2137 FullFoo.7z
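To make the idea concrete, here is a minimal sketch of the “uncompressed container inside a compressed wrapper” pattern. It makes some loud assumptions: an uncompressed tar stands in for the wim (tar does not de-duplicate the way a wim does), Python’s lzma module stands in for the outer 7z, and pack_tar and add_version are hypothetical helper names. It is not what 7-Zip does internally, just the same shape of workflow.

```python
import io
import lzma
import tarfile


def pack_tar(files: dict[str, bytes]) -> bytes:
    """Store files uncompressed in a tar container (stand-in for the wim)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def add_version(archive: bytes, name: str, data: bytes) -> bytes:
    """Unwrap the outer layer, add one file to the container, recompress it all."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(lzma.decompress(archive)), mode="r") as tar:
        for member in tar.getmembers():
            files[member.name] = tar.extractfile(member).read()
    files[name] = data
    # The whole container is compressed again as one stream.
    return lzma.compress(pack_tar(files))


# Hypothetical log contents, only for illustration.
v1 = b"".join(b"line %d: status ok\n" % i for i in range(500))
v2 = v1 + b"line 500: status failed\n"

archive = lzma.compress(pack_tar({"foo-1.log": v1}))
archive = add_version(archive, "foo-2.log", v2)
print(len(archive), "bytes after adding the second version")
```

The point of the design is that the inner container never compresses anything itself, so every time the outer layer is rewritten it sees all versions at once.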
When adding new files, open the 7z in 7z explorer, navigate into the wim, and add your files. When you close the 7z explorer, it will ask if you want to save the new contents of the wim. Select yes, and the 7z will re-compress the entire contents of the wim (all of your files’ data) as a single unit. So after adding foo-2.log to the wim inside, FullFoo.7z only grows to 2797 bytes, or 16% of the original size. That is still not as good as extracting and recompressing everything, but it beats simply adding files to an existing 7z.
PS C:\Project> dir
Directory: C:\Project
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 23:13 8745 foo-2.log
-a--- 14:12 2797 FullFoo.7z
This method is somewhat similar to extracting the entire archive, adding your new files, and then creating a new archive from scratch, except with a little overhead from the wim file and without the need to manually extract everything. This contrasts with just adding new files to a 7z, where the 7z explorer seems (I don’t know for sure) to compress the new files separately, ignoring any already compressed data, and then add them to the existing archive without re-compressing anything already in it.
As a final comparison, I created a 7z file with all 5 of the foo log files at once, which came out to 3629 bytes or 8.4% of the 43383 bytes of original data. In another 7z, I added each file one by one; this file was 7943 bytes or 18.3%. Lastly, the wim version, with files still added one by one, ended up at 4411 bytes or 10.2%. That is almost as compact as the “all at once” method, but less work than extracting and recompressing everything.
PS C:\Project> dir
Directory: C:\Project
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 14:20 3629 AllAtOnce.7z
-a--- 14:21 4411 FullFoo.7z
-a--- 14:30 7943 OneByOne.7z
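For anyone who wants to check the percentages, here is a small sketch that recomputes them from the file sizes in the listings above. The dictionary names are my own; the numbers come straight from the dir output. Values are rounded to the nearest tenth of a percent, so the last digit can differ slightly from a truncated figure.

```python
# Original log sizes in bytes, taken from the first directory listing.
log_sizes = {
    "foo-1.log": 8841,
    "foo-2.log": 8745,
    "foo-3.log": 8842,
    "foo-4.log": 8470,
    "foo-5.log": 8485,
}

# Archive sizes in bytes, taken from the final directory listing.
archive_sizes = {
    "AllAtOnce.7z": 3629,
    "FullFoo.7z": 4411,
    "OneByOne.7z": 7943,
}

total = sum(log_sizes.values())

# Fraction of the original data that each archive occupies.
ratios = {name: size / total for name, size in archive_sizes.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%} of {total} bytes")
```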
I believe the space savings add up, especially with larger projects containing multiple large source files and many versions of them. I hope this technique helps someone!