Add Files to 7z Archive with Almost Full Compression

7z and zip archives are great for compressing backup copies of files. I like to keep previous versions of files in an archive and add newer versions as they are updated. This method works great for quick and easy version tracking without sacrificing much space (especially when it comes to source code and text files).

Recently, however, I noticed a slight discrepancy when adding files to one of my archives: the archive’s file size grew as if each file I added were compressed separately and then appended to the original archive. This is probably best described with an example. Let’s say we have a project with a log file, foo.log, and we number each copy as we save a new version of it.

PS C:\Project> dir
    Directory: C:\Project
Mode              LastWriteTime     Length Name
----              -------------     ------ ----
-a---             15:55             8841   foo-1.log
-a---             23:13             8745   foo-2.log
-a---             22:56             8842   foo-3.log
-a---             10:32             8470   foo-4.log
-a---             15:23             8485   foo-5.log

We will start with just the first file, foo-1.log. If we archive it as foo-1.7z, we get a file of 1756 bytes, 20% of its original size. If we do the same with foo-2.log, we get foo-2.7z at 1686 bytes, close to 19% this time. We can conclude that this data compresses to around 20% of its original size.

PS C:\Project> dir
    Directory: C:\Project
Mode              LastWriteTime     Length Name
----              -------------     ------ ----
-a---             12:53             1756   foo-1.7z
-a---             12:53             1686   foo-2.7z

The “problem” arises when we add files to an existing archive. Let’s make a new 7z archive, foo.7z, containing only foo-1.log. It’s the same size as foo-1.7z.

Now we open foo.7z and add foo-2.log to it. The second log file is compressed and saved within foo.7z, but at 3393 bytes the archive is almost as large as foo-1.7z and foo-2.7z combined (1756 + 1686 = 3442 bytes). The overall compression here is still around 19%, but it could be better.

We can confirm that adding to an archive compresses differently: if we select foo-1.log and foo-2.log and make a new archive (call it Project.7z) with both files from the start, the resulting archive is only 2324 bytes, just over 13% of the original uncompressed size!

PS C:\Project> dir
    Directory: C:\Project
Mode              LastWriteTime     Length Name
----              -------------     ------ ----
-a---             12:53             1756   foo-1.7z
-a---             12:53             1686   foo-2.7z
-a---             12:58             3393   foo.7z
-a---             12:59             2324   Project.7z
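The size gap between foo.7z (3393 bytes) and Project.7z (2324 bytes) is consistent with the added file being compressed as its own separate block. Here is a minimal Python sketch of that hypothesis using the standard lzma module; LZMA is also the compression family 7z uses by default, but this is not 7-Zip itself. The synthetic log generator and the `appended`/`solid` names are hypothetical stand-ins for foo-1.log, foo-2.log and the two archives, and modelling "add to archive" as a second, independent stream is my assumption rather than documented 7-Zip behaviour.

```python
import lzma
import random

def make_log(rng: random.Random, n_lines: int = 300) -> list[str]:
    """Build a synthetic log (hypothetical stand-in for foo-1.log)."""
    levels = ["INFO", "WARN", "ERROR", "DEBUG"]
    actions = ["opened", "closed", "read", "wrote", "retried"]
    return [
        f"{rng.randint(0, 86399):05d} {rng.choice(levels)} "
        f"worker-{rng.randint(1, 8)} {rng.choice(actions)} item {rng.randint(0, 99999)}"
        for _ in range(n_lines)
    ]

def next_version(rng: random.Random, lines: list[str], fraction: float = 0.1) -> list[str]:
    """Derive a new version of the log by editing roughly `fraction` of its lines."""
    new = list(lines)
    for i in rng.sample(range(len(new)), int(len(new) * fraction)):
        new[i] += f" (rev {rng.randint(0, 9)})"
    return new

rng = random.Random(42)
v1 = make_log(rng)
foo_1 = "\n".join(v1).encode()
foo_2 = "\n".join(next_version(rng, v1)).encode()

# Assumed model of "add to archive": compress foo-2 on its own and append
# the new stream after the untouched stream that already holds foo-1.
appended = lzma.compress(foo_1) + lzma.compress(foo_2)

# Compressing both files in one go, like creating Project.7z from scratch.
solid = lzma.compress(foo_1 + foo_2)

# Both decode to the same bytes; lzma.decompress reads concatenated streams.
assert lzma.decompress(appended) == lzma.decompress(solid) == foo_1 + foo_2

print(f"original: {len(foo_1) + len(foo_2)} bytes")
print(f"appended: {len(appended)} bytes")
print(f"solid:    {len(solid)} bytes")
```

When run, `solid` should come out well below `appended`, which lands close to two separately compressed files, mirroring foo.7z versus Project.7z. The exact byte counts depend on the synthetic data and will not match the figures above.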

So how can we achieve better compression when adding a file, without having to extract and re-archive all the existing files? The only workaround I have found is to combine the de-duplication benefits of a wim with the high compression ratio of 7z. This isn’t perfect, but it is better than adding files to a plain 7z.

First, add the first file to a new wim archive, foo.wim. The wim will not compress anything; it simply stores the files together (foo.wim is 10103 bytes, a little larger than foo-1.log because of the container’s overhead). Next, add the wim file to a new 7z archive, FullFoo.7z. Since the wim contains the original files’ data uncompressed, 7z should be able to compress it down almost as much as if you had archived the original files themselves. In this case, FullFoo.7z is 2137 bytes, about 24% of the uncompressed size. That is not as good as a straight 7z (20%), but it gets better.

PS C:\Project> dir
    Directory: C:\Project
Mode              LastWriteTime     Length Name
----              -------------     ------ ----
-a---             15:55             8841   foo-1.log
-a---             14:08            10103   foo.wim
-a---             14:08             2137   FullFoo.7z
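The wim trick relies on the container storing file data uncompressed, so the outer compressor sees every file’s bytes in one stream. As a rough sketch, here is the same idea in Python with an uncompressed tar archive standing in for the wim and lzma standing in for 7z. This is a substitution: tar has no single-instance storage, and 7-Zip is not involved. The sample data and the `pack_uncompressed` helper are hypothetical.

```python
import io
import lzma
import random
import tarfile

def pack_uncompressed(files: dict[str, bytes]) -> bytes:
    """Store files side by side in a plain tar with no compression (the wim's role)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Hypothetical stand-ins for two versions of a log: v2 edits every tenth line.
rng = random.Random(7)
lines = [
    f"{rng.randint(0, 86399):05d} INFO worker-{rng.randint(1, 8)} read item {rng.randint(0, 99999)}"
    for _ in range(300)
]
foo_1 = "\n".join(lines).encode()
lines[::10] = [line + " (rev 2)" for line in lines[::10]]
foo_2 = "\n".join(lines).encode()

container = pack_uncompressed({"foo-1.log": foo_1, "foo-2.log": foo_2})  # like foo.wim
wrapped = lzma.compress(container)                                        # like FullFoo.7z
separate = len(lzma.compress(foo_1)) + len(lzma.compress(foo_2))          # one stream per file

# The outer layer unpacks back to the container, and the files come out intact.
with tarfile.open(fileobj=io.BytesIO(lzma.decompress(wrapped))) as tar:
    assert tar.extractfile("foo-2.log").read() == foo_2

print(f"container (uncompressed): {len(container)} bytes")
print(f"container compressed:     {len(wrapped)} bytes")
print(f"files compressed apart:   {separate} bytes")
```

The uncompressed container is larger than the two logs combined (tar headers and padding, much like foo.wim’s overhead), yet once compressed it should come out well under the two separately compressed files, because the second log is encoded largely as references back into the first.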

When adding new files, open the 7z in 7z explorer, navigate into the wim, and add your files. When you close the 7z explorer, it will ask whether you want to save the new contents of the wim. Select yes, and 7z will re-compress the entire contents of the wim (all of your files’ data) as a single block. After adding foo-2.log to the wim inside, FullFoo.7z only grows to 2797 bytes, about 16% of the two logs’ original size. That is not as good as extracting and recompressing everything (Project.7z was 2324 bytes), but it is clearly better than simply adding files to an existing 7z (foo.7z was 3393 bytes).

PS C:\Project> dir
    Directory: C:\Project
Mode              LastWriteTime     Length Name
----              -------------     ------ ----
-a---             23:13             8745   foo-2.log
-a---             14:12             2797   FullFoo.7z
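Saving the modified wim back into the 7z amounts to unpacking the outer layer, appending to the container, and recompressing everything. Below is a hedged Python sketch of that update cycle, again with tar and lzma standing in for wim and 7z; the `add_to_wrapped` and `list_wrapped` helpers and the sample contents are hypothetical, and 7-Zip’s internal steps may well differ.

```python
import io
import lzma
import tarfile

def add_to_wrapped(wrapped: bytes, name: str, data: bytes) -> bytes:
    """Unpack the outer xz layer, append one file to the tar inside, recompress all of it."""
    buf = io.BytesIO(lzma.decompress(wrapped))
    with tarfile.open(fileobj=buf, mode="a") as tar:   # "a" appends after existing members
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return lzma.compress(buf.getvalue())               # every file's bytes go through LZMA again

def list_wrapped(wrapped: bytes) -> list[str]:
    """Names of the files stored in the inner container."""
    with tarfile.open(fileobj=io.BytesIO(lzma.decompress(wrapped))) as tar:
        return tar.getnames()

# Hypothetical data: foo-2 is foo-1 with its first 20 INFO lines changed to WARN.
foo_1 = "".join(f"{i:05d} INFO read item {i * 7919 % 100000}\n" for i in range(300)).encode()
foo_2 = foo_1.replace(b"INFO", b"WARN", 20)

# Starting point: a wrapped archive holding only foo-1.log.
start = io.BytesIO()
with tarfile.open(fileobj=start, mode="w") as tar:
    info = tarfile.TarInfo("foo-1.log")
    info.size = len(foo_1)
    tar.addfile(info, io.BytesIO(foo_1))
wrapped = lzma.compress(start.getvalue())

wrapped = add_to_wrapped(wrapped, "foo-2.log", foo_2)
print(list_wrapped(wrapped))  # ['foo-1.log', 'foo-2.log']
```

The trade-off is the same one described in the text: every update re-reads and recompresses the whole container, which costs time on large archives but lets the new version share the compressor’s history with the old ones.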

This method is somewhat similar to extracting the entire archive, adding your new files, and then creating a new archive from scratch, except with a little overhead from the wim file and without the need to extract everything manually. This contrasts with simply adding new files to a 7z, where the 7z explorer appears (I don’t know for sure) to compress the new files separately, ignoring any already compressed data, and then add them to the existing archive without re-compressing any of the data already in it.

As a final comparison, I created a 7z file with all 5 of the foo log files at once, which came out to 3629 bytes, or 8.4% of their combined 43383 bytes. In another 7z, I added the files one by one; this archive was 7943 bytes, or 18.3%. Lastly, the wim version, with files still added one by one, ended up at 4411 bytes, or 10.2%. That is almost as well compressed as the “all at once” method, but with less work than extracting and recompressing everything.

PS C:\Project> dir
    Directory: C:\Project
Mode              LastWriteTime     Length Name
----              -------------     ------ ----
-a---             14:20             3629   AllAtOnce.7z
-a---             14:21             4411   FullFoo.7z
-a---             14:30             7943   OneByOne.7z
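The percentages quoted throughout can be re-derived from the directory listings alone. This small Python calculation divides each archive’s size by the total uncompressed size of the logs it contains; the table simply restates the byte counts shown in the listings, and the grouping of logs per archive follows the text.

```python
# Uncompressed sizes of the five log versions, from the first directory listing.
LOG_SIZES = {
    "foo-1.log": 8841,
    "foo-2.log": 8745,
    "foo-3.log": 8842,
    "foo-4.log": 8470,
    "foo-5.log": 8485,
}

# (label, archive size in bytes, logs the archive contains)
ARCHIVES = [
    ("foo-1.7z", 1756, ["foo-1.log"]),
    ("foo-2.7z", 1686, ["foo-2.log"]),
    ("foo.7z (added one by one)", 3393, ["foo-1.log", "foo-2.log"]),
    ("Project.7z (both at once)", 2324, ["foo-1.log", "foo-2.log"]),
    ("FullFoo.7z (wim, 1 file)", 2137, ["foo-1.log"]),
    ("FullFoo.7z (wim, 2 files)", 2797, ["foo-1.log", "foo-2.log"]),
    ("AllAtOnce.7z", 3629, list(LOG_SIZES)),
    ("FullFoo.7z (wim, 5 files)", 4411, list(LOG_SIZES)),
    ("OneByOne.7z", 7943, list(LOG_SIZES)),
]

def ratio(compressed: int, logs: list[str]) -> float:
    """Compressed size as a percentage of the original bytes it holds."""
    return 100 * compressed / sum(LOG_SIZES[name] for name in logs)

for label, size, logs in ARCHIVES:
    print(f"{label:28} {size:5d} bytes  {ratio(size, logs):5.1f}%")
```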

I believe the space savings add up, especially with larger projects containing multiple large source files and many versions of each. I hope this technique helps someone!