ZipFile extracting spanned files

kbar

It looks like ZipFile (lib_zipfile.h) can create spanned zip files using SetSpanning(). But can you extract spanned zip files using either the Legacy API or the new Maxon API?

file.zip.001
file.zip.002
file.zip.003

Can these be extracted using any of the SDK methods?

Thanks,
Kent

ferdinand

Hi,

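I do not know what MAXON's lib can and cannot do, but in Python you can extract split archives with the standard zipfile module. zipfile does not handle multi-disk zip files on its own, but when the parts are plain byte-range splits of a single zip file (such as 7-Zip's *.zip.001, *.zip.002, ... volumes), joining the parts first gives a regular zip file that zipfile can read.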

```python
import glob
import os
import shutil
import tempfile
import zipfile


def some_filter(name):
    """A filter function to sort out file objects based on their name and
    extension.
    """
    return True


def unpack(input_path, output_path):
    """Unpacks a split zip archive such as 'test.zip.001', 'test.zip.002', ...

    zipfile cannot read multi-part archives directly, so the parts are first
    joined into one temporary zip file. This assumes the parts are plain
    byte-range splits of a single zip file (7-Zip's 'split to volumes').

    Args:
        input_path (str): The path of the first part, e.g. 'test.zip.001'.
        output_path (str): The output directory.

    Raises:
        ValueError: When the input path does not exist.
    """
    if not os.path.exists(input_path):
        raise ValueError("Input path does not exist: {}".format(input_path))

    # 'test.zip.001' -> all three-digit parts 'test.zip.001', 'test.zip.002',
    # ... in order. A plain, unsplit file falls back to being its own part.
    stem, _ = os.path.splitext(input_path)
    parts = sorted(glob.glob(glob.escape(stem) + ".[0-9][0-9][0-9]"))
    parts = parts or [input_path]

    with tempfile.TemporaryFile() as joined:
        # Concatenate the parts byte by byte into a single zip file.
        for part in parts:
            with open(part, "rb") as f:
                shutil.copyfileobj(f, joined)
        joined.seek(0)

        with zipfile.ZipFile(joined, "r") as zip_object:
            # Extract only the members that match a specific filter.
            for name in [n for n in zip_object.namelist() if some_filter(n)]:
                zip_object.extract(name, output_path)
            # Or just extract everything; since our filter passes everything
            # through, this would be equivalent.
            # zip_object.extractall(output_path)


def main():
    """Unpacks the split archive starting with 'test.zip.001'."""
    # Note that Python's zipfile does not handle multi-disk zip files by
    # itself, which is why unpack() joins the parts first. This relies on the
    # parts following the '*.zip.001', '*.zip.002', ... naming scheme.

    # Also note that there is a difference between split, spanned, and
    # volume zip archives; joining the parts only covers the byte-split case.

    # Unpack the split archive starting with "test.zip.001" into the current
    # directory.
    unpack("test.zip.001", ".")


if __name__ == "__main__":
    main()
```

Cheers,
zipit

Manuel

Hi,
Sorry for the late reply.

The spanning function does not work as you might expect: it does not cut files into pieces.

If you have a 10 MB file and ask for a 1 MB archive size, it will create two archives: one with the 10 MB file (so that archive still holds the whole 10 MB file, far above the requested size) and one empty.
If you have 10 files of 1 MB each, it will create 10 archives, one file per archive.

Every archive is an independent zip file. Their names will not be (as in your example)
file.zip.001
but
file.zip
file_1.zip
file_2.zip

You can simply unzip every archive as an independent file.
Every archive also contains a dummy txt file that has to be added for OS X.
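Following the naming described above, extracting such a set boils down to opening each archive on its own. A minimal Python sketch, assuming the archives are named exactly 'file.zip', 'file_1.zip', 'file_2.zip', ... in one directory (the naming SetSpanning() really produces should be checked); `spanned_archive_paths` and `extract_independent_archives` are hypothetical helpers, not SDK functions.

```python
import os
import re
import zipfile


def spanned_archive_paths(first_zip):
    """Return 'file.zip', 'file_1.zip', 'file_2.zip', ... in numeric order.

    Hypothetical helper; assumes the '<stem>_<n><ext>' naming described above.
    """
    directory, filename = os.path.split(first_zip)
    stem, ext = os.path.splitext(filename)
    pattern = re.compile(re.escape(stem) + r"_(\d+)" + re.escape(ext) + r"$")
    numbered = []
    for entry in os.listdir(directory or "."):
        match = pattern.match(entry)
        if match:
            # Keep the number so 'file_10.zip' sorts after 'file_2.zip'.
            numbered.append((int(match.group(1)), entry))
    ordered = [filename] + [entry for _, entry in sorted(numbered)]
    return [os.path.join(directory, entry) for entry in ordered]


def extract_independent_archives(first_zip, output_dir):
    """Extract every archive of the set; each one is a complete zip file."""
    for path in spanned_archive_paths(first_zip):
        with zipfile.ZipFile(path) as archive:
            archive.extractall(output_dir)
```

Because every archive carries its own copy of the dummy txt file, extractall() simply overwrites it once per archive; filter it out by name if it is not wanted. The numeric sort matters because a plain string sort would put file_10.zip before file_2.zip.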

You have to create your own solution. Using the Maxon API, you would need to read the file you want to compress and write the buffer yourself.

Cheers,
Manuel

kbar

@m_magalhaes

Thanks Manuel. It sounds like the MAXON implementation is indeed completely different from regular splitting of zip files. No worries, I will just create my own implementation to do what I need. I always try to see if the MAXON API can do it first, though.

Cheers,
Kent

ferdinand

Hi,

I still do not know what Maxon's library does, but I just wanted to stress again (as in my code above) that splitting and spanning are not the same thing and, AFAIK, are also not part of the zip ISO norm [1]. Spanned zip files are usually meant to occupy multiple exchangeable media and are therefore of a fixed size (matching the capacity of the medium). Split archives are meant to be placed in a single directory and have a user-chosen part size. Both have a dedicated *.zip file that contains the structural information (the central directory); in the PKZIP/WinZip convention for split archives, the parts are named *.z01, *.z02, ..., and the *.zip file is the last part rather than the first.

What you showed here (no dedicated *.zip file is present) is something some tools can do, and I think it places the meta information in each file, which lets you lose parts of the archive while still being able to decompress the rest. 7-Zip, at least, simply calls these Volumes.
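The naming schemes mentioned in this thread can be told apart roughly by file name alone. A heuristic Python sketch, assuming the PKZIP/WinZip convention for split archives (name.z01, name.z02, ..., name.zip) and 7-Zip's name.zip.001 volumes; a true spanned archive (one file per medium) cannot be recognized from a single directory listing. `guess_multipart_scheme` and its return labels are made up for illustration.

```python
import re


def guess_multipart_scheme(filenames):
    """Guess how a multi-part zip archive was written from its file names.

    Heuristics only (hypothetical labels):
    - 'name.z01', 'name.z02', ..., 'name.zip' -> "pkzip-split"
    - 'name.zip.001', 'name.zip.002', ...     -> "7zip-volumes"
    - several complete '*.zip' files          -> "independent-zips"
    - a single 'name.zip'                     -> "single"
    """
    names = [n.lower() for n in filenames]
    has_zip = any(n.endswith(".zip") for n in names)
    # PKZIP/WinZip split: numbered '.zNN' parts plus a final '.zip' part.
    if has_zip and any(re.search(r"\.z\d{2,}$", n) for n in names):
        return "pkzip-split"
    # 7-Zip style: every part carries a numeric suffix after '.zip'.
    if names and all(re.search(r"\.zip\.\d{3,}$", n) for n in names):
        return "7zip-volumes"
    if len(names) > 1 and all(n.endswith(".zip") for n in names):
        return "independent-zips"
    if len(names) == 1 and has_zip:
        return "single"
    return "unknown"
```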

Cheers,
zipit

[1] Because this is not standardized, I only have a dodgy third-party link here, but they have a nice table and also list compatibilities. They did forget Python's zipfile, though, which tries to be a jack of all trades.

Manuel

Thanks @zipit.

So, if I understand correctly, that is why the function is called SetSpanning: it does not split the files.
I understood that Kent wanted to split his files,

but what he showed is the volume result you mentioned.

In any case, our API only does spanning.

ferdinand

Hi,

I actually just wanted to point out the differences, as the terms had been used here as if they were interchangeable. By "you" I was referring to @kbar. If anything, my post was a mild indication that Maxon's use of the term spanning is actually correct, as the function seems to output volumes of fixed size.

But as already stated twice: I have no clue what is going on in Maxon's library.

Cheers,
zipit

kbar

I may have had my terminology wrong, so sorry if that caused any confusion. But my example of the output files was correct in what I wanted to unzip/extract.

I was wanting to extract "Split" files, not "Spanned".

In my case I have one very large file that is split into multiple volumes, each volume being the same number of bytes. This can be done in 7-Zip by choosing "split to volumes, bytes:" when using "Add to Archive".

So if myfile is a 22 MB file and I split it by 10 MB, I get the following files, all in the same directory.

myfile.zip.001 - 10MB
myfile.zip.002 - 10MB
myfile.zip.003 - 2MB

These are "Split".
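The sizes follow from cutting the file into consecutive fixed-size byte ranges, where only the last part can be shorter (22 MB at 10 MB per volume gives 10 + 10 + 2 MB). A minimal Python sketch of that cut, assuming the parts are raw byte ranges named '<name>.001', '<name>.002', ... as in the listing above; `split_file` is a hypothetical helper, not 7-Zip's own code.

```python
def split_file(path, volume_size):
    """Split a file into fixed-size parts named '<path>.001', '<path>.002', ...

    Every part except possibly the last holds exactly volume_size bytes.
    Returns the list of part paths that were written, in order.
    """
    if volume_size <= 0:
        raise ValueError("volume_size must be positive")
    parts = []
    with open(path, "rb") as source:
        index = 1
        while True:
            # Read the next byte range; an empty result means end of file.
            chunk = source.read(volume_size)
            if not chunk:
                break
            part_path = "{}.{:03d}".format(path, index)
            with open(part_path, "wb") as target:
                target.write(chunk)
            parts.append(part_path)
            index += 1
    return parts
```

The sketch reads one whole volume into memory at a time, which is fine for illustration but should become a chunked copy for multi-gigabyte volumes. Concatenating the parts in order restores the original file byte for byte.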

The reason I thought this might be the same as spanning is that I believed they actually were the same, except that spanning is used when the files are on different archive devices, i.e. split is in the same folder, spanned is across multiple DVDs.

In any case it looks like the C++ SDKs can't handle extracting these, which is what I was wanting to find out. So I will just do this myself.

Note that I haven't used the C++ SDK to create a spanned file either, since I have no need to. I want to extract them, not create them, so I have no idea what setting the "SetSpanning" flag actually does. But if it did indeed split them into different volumes of a fixed byte size and put them into the same output folder, then I assumed there would also be a feature in the SDK to read these back and extract them. Otherwise, what is the point of C4D being able to create them in the first place?

@zipit You mentioned that splitting would create a dedicated *.zip file. This is definitely NOT the case with 7zip. The files all start as *.zip.001, *.zip.002, *.zip.003 etc... there is no single *.zip file. So this is probably a non standard feature of 7zip itself.
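Since there is no dedicated *.zip file, one way to do this yourself without writing a joined copy to disk is to present the parts as one seekable byte stream and hand that to a zip reader. A minimal Python sketch, assuming the parts are plain byte-range splits of one zip file, passed in '.zip.001', '.zip.002', ... order; `ConcatenatedParts` is a hypothetical helper, not part of any SDK.

```python
import bisect
import io
import os
import zipfile


class ConcatenatedParts(io.RawIOBase):
    """Read-only, seekable view over several files treated as one stream."""

    def __init__(self, paths):
        self._files = [open(p, "rb") for p in paths]
        self._starts = []  # absolute offset at which each part begins
        total = 0
        for f in self._files:
            self._starts.append(total)
            total += os.fstat(f.fileno()).st_size
        self._size = total
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError("invalid whence")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buffer):
        view = memoryview(buffer).cast("B")
        written = 0
        # Keep reading across part boundaries until the buffer is full or
        # the end of the last part is reached.
        while written < len(view) and self._pos < self._size:
            index = bisect.bisect_right(self._starts, self._pos) - 1
            part = self._files[index]
            part.seek(self._pos - self._starts[index])
            count = part.readinto(view[written:])
            if not count:
                break
            written += count
            self._pos += count
        return written

    def close(self):
        for f in getattr(self, "_files", []):
            f.close()
        super().close()
```

Usage would then be along the lines of `zipfile.ZipFile(ConcatenatedParts(parts))` with the part paths sorted by their numeric suffix, followed by `extractall`. The zip reader only ever seeks and reads, so the central directory at the end of the last part is found as if the parts were one file.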

Cheers.
Kent

ferdinand

@kbar said in ZipFile extracting spanned files:

@zipit You mentioned that splitting would create a dedicated *.zip file. This is definitely NOT the case with 7zip. The files all start as *.zip.001, *.zip.002, *.zip.003 etc... there is no single *.zip file. So this is probably a non standard feature of 7zip itself.

Hi,

First of all, as already stated, neither splitting nor spanning is standardized by the ISO norm for document containers [1]. The norm actually makes a point of stressing that these forms are not supported.

So every tool does what it wants anyway. Because of that, I probably should have written "in most cases". But as stated above and also in my code, there is a third form, which 7-Zip calls Volumes.

We haven't dealt with this form at university, so I cannot say much about it, but I suspect that it has better error tolerance, as explained above. If you lose a file in the classic split scheme we were taught, you are basically screwed. The volume form also comes without the dedicated zip file. There are probably other tools that can write this form, but they might use a different naming scheme. I once encountered such a volume set that didn't use 7-Zip's naming scheme, and it tripped up Python's zipfile.

Probably all this is more on the irrelevant side of things, but I felt that with my forum alias I had to chime in

Cheers,
zipit

[1] https://www.iso.org/standard/60101.html

kbar

@zipit Really great insights. Totally appreciate all your comments. I have worked with zlib for many years and integrated it into many different platforms, apps and plugins. This is just the first time I have ever looked into splitting or spanning features.

Off topic: Would be great to work on compression algorithms again. I keep eyeing up all the image compression formats being used and developed these days. Fun stuff.