MoData.GetData()?

kvb · Jun 8, 2020, 7:19 PM

MoData.GetData(self, id=NOTOK)
Get a copy of the array’s container.

Cool... how do I use the BaseContainer? I can't seem to access any data from it. My goal is to store snapshots of MoGraph clone arrays for later use. I'm not even sure if this is the correct approach, but it seems convenient to be able to just save a single BaseContainer instead of a series of arrays that might be cumbersome to stash away.

I know that using BaseContainers to store/retrieve large datasets isn't very efficient (i.e. don't use them like arrays). I intend to use Read/Write/CopyTo, since I might be attempting to store entire MoGraph animations, and I gather that writing successive frames as individual BaseContainers to a HyperFile is better than saving each one as a subcontainer in a single BaseContainer.
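The frame-by-frame idea (one self-contained record per frame, written in sequence) can be sketched in plain Python. This is only an analogue, not the Cinema 4D API: it assumes each frame is a list of matrices flattened to 12 floats (offset, v1, v2, v3), uses the stdlib `struct` module and an in-memory `io.BytesIO` stream in place of a HyperFile, and the record layout (frame number, clone count, then the floats) is hypothetical.

```python
import io
import struct

# Hypothetical record layout, NOT Cinema 4D's HyperFile format:
# [frame number, clone count] followed by count matrices of 12 doubles.
HEADER = struct.Struct("<ii")   # frame number, clone count
MATRIX = struct.Struct("<12d")  # off, v1, v2, v3 as 12 doubles


def write_frames(stream, frames):
    """Write {frame: [12-float tuple, ...]} as one record per frame."""
    for frame in sorted(frames):
        matrices = frames[frame]
        stream.write(HEADER.pack(frame, len(matrices)))
        for matrix in matrices:
            stream.write(MATRIX.pack(*matrix))


def read_frames(stream):
    """Read records back until the stream is exhausted."""
    frames = {}
    while True:
        chunk = stream.read(HEADER.size)
        if len(chunk) < HEADER.size:
            break  # end of stream reached
        frame, count = HEADER.unpack(chunk)
        frames[frame] = [MATRIX.unpack(stream.read(MATRIX.size))
                         for _ in range(count)]
    return frames


if __name__ == "__main__":
    identity = (0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)
    stream = io.BytesIO()
    write_frames(stream, {0: [identity], 1: [identity, identity]})
    stream.seek(0)
    print(read_frames(stream) == {0: [identity], 1: [identity, identity]})  # True
```

Because every record carries its own clone count, frames can be appended one after another without rewriting earlier data, which is the property that makes sequential per-frame writes attractive.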

Thanks in advance!
Kevin

ferdinand · Jun 9, 2020, 7:57 AM

Hi,

the correct method to access the individual arrays of a MoData object would be GetArray, assuming that is what you are trying to do. You will also have to specify the array type you want to access, e.g. MODATA_MATRIX for the particle matrices.

If you want to keep a copy of a MoGraph state, I would clone the generator in Python (unfortunately, we cannot instantiate MoData in Python directly). You could also do this manually, but keep in mind that most particle data is presented as mutable objects, so you would have to clone them explicitly; otherwise you will just store references to them.

import c4d

def main():
    """Prints the MoData matrices of the selected object and caches its generator.
    """
    # Get the MoData for the selected BaseObject (op is the active object
    # in the Script Manager). If there is either no selection or it does
    # not yield some MoData, get out.
    if not isinstance(op, c4d.BaseObject):
        return

    modata = c4d.modules.mograph.GeGetMoData(op)
    if modata is None:
        return

    # Iterate over the matrices of our MoData. If you want other data,
    # e.g. colors, weights, etc., you will have to iterate over these
    # by using their respective symbol. Passing NOTOK will yield no data;
    # I am not sure why MAXON made this the method's default argument.
    for matrix in modata.GetArray(c4d.MODATA_MATRIX):
        print(matrix)

    # We could cache the arrays of our MoData individually, but an easier
    # approach in Python would be to just clone the generator which hosts
    # the MoData.

    # This would be the object op in this example.
    generator = modata.GetGenerator()
    # Clone that object.
    cache = generator.GetClone()
    # So that we can later on access its data.
    cached_data = c4d.modules.mograph.GeGetMoData(cache)
    print(cached_data)

if __name__ == '__main__':
    main()
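To illustrate the caveat about mutable particle data: copying only the container still shares the element objects, so later in-place edits leak into the "snapshot". This is a minimal sketch that assumes a hypothetical mutable `Matrix` class as a stand-in for `c4d.Matrix`; whether `copy.deepcopy` works on the real c4d types is not verified here, so with the actual API you would clone each element explicitly.

```python
import copy


class Matrix(object):
    """Hypothetical mutable stand-in for c4d.Matrix (offset only, for brevity)."""

    def __init__(self, off=(0.0, 0.0, 0.0)):
        self.off = list(off)


def snapshot_by_reference(matrices):
    """Copy the list only; both lists still share the same Matrix objects."""
    return list(matrices)


def snapshot_by_value(matrices):
    """Clone every element, so later edits cannot leak into the snapshot."""
    return copy.deepcopy(matrices)


if __name__ == "__main__":
    live = [Matrix((0.0, 0.0, 0.0)), Matrix((100.0, 0.0, 0.0))]
    shallow = snapshot_by_reference(live)
    deep = snapshot_by_value(live)
    live[1].off[0] = 500.0      # mutate the "live" data in place
    print(shallow[1].off[0])    # 500.0, the reference snapshot changed too
    print(deep[1].off[0])       # 100.0, the cloned snapshot is unaffected
```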

On a side note: I have not done any extensive tests on the performance of BaseContainer, but they are just integer-keyed hash maps that allow for dynamic typing of their values. Hash maps are very efficient for larger data sets, especially when it comes to access, which is probably why MAXON used them as the basis for basically everything in Cinema.

Cheers,
zipit

Manuel · Jun 10, 2020, 6:47 AM

hi,

After asking the dev: this is used internally and should be marked as private. The same goes for all MoData functions that return a BaseContainer (GetDataIndexInstance, GetDataInstance).

Cheers,
Manuel

kvb · Jun 11, 2020, 3:23 AM

Thanks @zipit, I'm aware of the usual way of getting and setting MoGraph data. Maxon has gone out of their way to consolidate these arrays into a manageable number of BaseContainers, and I want to take advantage of that.

I'm not sure storing an entire clone of the generator itself would be the most efficient method, and I'm trying to be as efficient as possible, but to be honest, I don't know. If I can get it down to a method as simple as storing away just the MoData, then I'll be in good shape. But it's certainly more concise than breaking the data down into its component parts and trying to stash it away in a BaseContainer or HyperFile. Certainly worth a look-see :)

Regarding BaseContainer efficiency, I've actually been hearing the opposite in a few posts: that it isn't suited for large data sets. Now, this may have been meant in the general sense that, without a perfect hash function, hash maps can get very slow when the entry count gets very large. In the end it depends on the hash function. Wikipedia describes a few different perfect hashing approaches, so Maxon is probably employing one of those. I say all that as a simple superuser with an internet connection and a penchant for getting in over my head, not a developer... that's about as deep as my knowledge of hash maps goes, haha! But ultimately, I don't want to rely on reading/writing possibly hundreds of thousands of individual BaseContainer entries.

@m_magalhaes said in MoData.GetData()?:

After asking the dev, this is used internally and should be marked as private.

Thanks, Manuel. Private, eh? That would mean I need access to MoData's SetData() function, which isn't available in Python. This just so happens to have become a C++ plugin, which has access to that function. Would I be correct in assuming that I can store these containers in my own BaseContainer, then retrieve them later by passing them into a MoData?

Thanks!
Kevin

ferdinand · Jun 11, 2020, 8:47 AM

Hi,

@kvb said in MoData.GetData()?:

I'm not sure storing an entire clone of the generator itself would be the most efficient method, and I'm trying to be as efficient as possible, but to be honest, I don't know. If I can get it down to a method as simple as storing away just the MoData, then I'll be in good shape. But it's certainly more concise than breaking the data down into its component parts and trying to stash it away in a BaseContainer or HyperFile. Certainly worth a look-see :)

I don't think that cloning the generator will result in any noticeable performance loss unless you are planning to do this with thousands of generators at once. But there is certainly nothing wrong with caching just the data you need. I was just following the path of least resistance, which usually is a good thing.

Regarding BaseContainer efficiency, I've actually been hearing the opposite in a few posts: that it isn't suited for large data sets. Now, this may have been meant in the general sense that, without a perfect hash function, hash maps can get very slow when the entry count gets very large. In the end it depends on the hash function. Wikipedia describes a few different perfect hashing approaches, so Maxon is probably employing one of those. I say all that as a simple superuser with an internet connection and a penchant for getting in over my head, not a developer... that's about as deep as my knowledge of hash maps goes, haha! But ultimately, I don't want to rely on reading/writing possibly hundreds of thousands of individual BaseContainer entries.

I am not quite sure where you read this and what you would consider to be a large data set. Unless you are going into "it doesn't fit into memory anymore" territory, hash maps are probably (one of) the best data structure(s) you can use for key-value pairs. How BaseContainer has been designed internally is of very little consequence, at least in the bigger picture, as I was just talking about the general time complexity of the access operations on this data type. You should also note that Python has no general-purpose builtin array type, only lists (the standard library's array module only stores primitive numeric values). Hash maps will beat the other collection types in most access scenarios; you can read more about the time complexity of Python's builtin collection types here.

You can also roughly test it yourself, as I have done here.

import c4d
import time

def timeit(function):
    """A decorator to time a function.

    Not really something reliable, but good enough for this IMHO.
    """
    def wrapper(*args, **kwargs):
        """The wrapper logic.
        """
        t0 = time.time()
        function(*args, **kwargs)
        return round(time.time() - t0, 5)
    return wrapper

@timeit
def read(obj):
    """Read all items in the collection object.
    """
    for i in range(len(obj)):
        obj[i]

@timeit
def insert(obj, count):
    """Insert count items to the collection object.
    """
    # Type testing is relatively expensive, so we factor this out.
    if isinstance(obj, list):
        for i in range(count):
            # This is also not really a fair comparison, inserting items
            # would also be O(n), but since appending is usually enough,
            # I did choose this.
            obj.append(i)
    else:
        for i in range(count):
            obj[i] = i

@timeit
def delete(obj):
    """Delete all items from the collection object.
    """
    indices = range(len(obj))
    # Type testing is relatively expensive, so we factor this out.
    if isinstance(obj, list):
        for item in indices:
            # These two operations would be O(n) each:
            # obj.remove(item)
            # obj.pop(0) - Or any index other than -1

            # Due to the fact that this would take forever, we will just pop
            # the last item, which for lists is O(1).
            # This is not really a fair comparison to a hash map, as they can
            # delete anything at O(1). But they will be more performant
            # anyway, so there is no point in wasting our time.
            # PYTHON LISTS ARE REALLY BAD AT DELETE ACCESS!
            obj.pop()
    elif isinstance(obj, dict):
        for key in indices:
            del(obj[key])
    else:
        for key in indices:
            obj.RemoveData(key)

@timeit
def iterate(obj):
    """Iterate over all items in the collection object.
    """
    for item in obj:
        pass

def run_suite(count):
    """Runs a read, write, delete and iteration test suite for count elements on all data types.
    
    Tests the three data types list, dict and c4d.BaseContainer.
    
    Args:
        count (int): The number of elements for the collection type.
    
    Returns:
        dict: The results for the access timings.
    """
    results = {"count": count,
               "types": {}}
    for dtype in [list, dict, c4d.BaseContainer]:
        obj = dtype()
        w, r, i, d = insert(obj, count), read(obj), iterate(obj), delete(obj)
        results["types"][dtype.__name__] = [r, w, d, i]
    return results

def print_table(data):
    """Pretty prints the results as a table.
    """
    line = "{:<15}| {:<15}| {:<15}| {:<15}| {:<15}"
    header = line.format("type", "read", "insert", "delete", "iterate")
    separator = "-" * len(header)

    print(header)
    for suite in data:
        print(separator)
        print("@{:,} items".format(suite["count"]))
        print(separator)
        for dtype in sorted(suite["types"]):
            msg = line.format(dtype, *suite["types"][dtype])
            print(msg)

    print("\nAll timings are in seconds.")

def main():
    """Entry point.

    Runs a test suite for 1e5, 1e6 and 1e7 items for the three tested data
    types and pretty prints the results.
    """
    timings = []
    for count in [1e5, 1e6, 1e7]:
        results = run_suite(int(count))
        timings.append(results)
    print_table(timings)

if __name__ == "__main__":
    main()

type           | read           | insert         | delete         | iterate        
-----------------------------------------------------------------------------------
@100,000 items
-----------------------------------------------------------------------------------
BaseContainer  | 0.033          | 0.045          | 0.028          | 0.034          
dict           | 0.008          | 0.012          | 0.007          | 0.002          
list           | 0.007          | 0.014          | 0.016          | 0.001          
-----------------------------------------------------------------------------------
@1,000,000 items
-----------------------------------------------------------------------------------
BaseContainer  | 0.361          | 0.477          | 0.29           | 0.264          
dict           | 0.088          | 0.129          | 0.094          | 0.019          
list           | 0.063          | 0.143          | 0.144          | 0.013          
-----------------------------------------------------------------------------------
@10,000,000 items
-----------------------------------------------------------------------------------
BaseContainer  | 4.041          | 5.759          | 3.565          | 2.669          
dict           | 0.971          | 1.318          | 0.858          | 0.157          
list           | 0.685          | 1.416          | 1.403          | 0.128          

All timings are in seconds.
[Finished in 44.9s]

I do not know what you are planning to do, but I cannot see a scenario where BaseContainer would be too slow, at least in Python, which is one giant bottleneck in itself.

Cheers,
zipit

kvb · Jun 11, 2020, 7:33 AM

Wow... well, it's hard to argue with those results! Looking back at the posts that led me down this rabbit hole, it seems the info may have been outdated and/or based on casual assumptions.
Looking again at the research that followed, I can see how I may have fallen into the trap of looking to confirm those findings instead of actually expanding my understanding. Thanks for the incredibly enlightening post!

As far as what I'm trying to do, let's just say I need MoGraph cache functionality but can't rely on the MoGraph Cache tag itself ;) Luckily, I'll no longer have the bottleneck of Python... and there's nothing wrong with considering the path of least resistance (it will likely be my initial test case, actually).

Thanks again!
Kevin

Manuel · Jun 11, 2020, 7:49 AM

hi,

thanks a lot @zipit for your time.

About the BaseContainer functions: in C++ it's the same story. Those functions are/were used internally to send data to other parts of Cinema 4D (like dynamics, for example).

All of them will be marked as private, as they're kind of useless for 3rd party developers.

Cheers,
Manuel.

ferdinand · Jun 11, 2020, 8:45 AM

Hi,

I was a bit overworked when I wrote this test, which is why I made a little booboo in the code above with rather dire consequences. I have fixed this now and the timings are now correct (aside from the general diceyness of the whole test). But the general conclusion is the same: BaseContainer is not slow and is a good choice when we need key-value pairs. One should also keep in mind that I employed simplifications for list in both the insert and the delete case. If we want to truly insert and delete arbitrary elements in a list, this type is terrible, as it has to shift all subsequent elements each time.
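The list simplification mentioned here can be made visible with the stdlib `timeit` module: popping from the end of a list is O(1), while popping from the front shifts every remaining element, which makes emptying a list that way O(n²) overall; deleting dict keys is O(1) on average. This is a rough sketch; the element count is arbitrary and absolute timings will vary by machine, so only the relative difference is meaningful.

```python
import timeit


def time_pops(count, index):
    """Time emptying a count-long list by repeatedly popping at index."""
    setup = "data = list(range({}))".format(count)
    stmt = "while data: data.pop({})".format(index)
    # number=1 because the statement empties the very list it works on.
    return timeit.timeit(stmt, setup=setup, number=1)


def time_dict_deletes(count):
    """Time deleting every key of a count-long dict, one key at a time."""
    setup = "data = dict.fromkeys(range({}))".format(count)
    stmt = "for key in range({}): del data[key]".format(count)
    return timeit.timeit(stmt, setup=setup, number=1)


if __name__ == "__main__":
    n = 20000
    print("list.pop()  : {:.4f}s".format(time_pops(n, -1)))
    print("list.pop(0) : {:.4f}s".format(time_pops(n, 0)))
    print("del dict[k] : {:.4f}s".format(time_dict_deletes(n)))
```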

Cheers,
zipit

kvb · Jun 11, 2020, 5:46 PM

@m_magalhaes said in MoData.GetData()?:

All will be marked as private as it's kind of useless for 3rd party developers.

My god, did I really just space out on what private means... ignore me, I'm an idiot, lol. That is a shame, though, because those functions sound like they do exactly what I'm trying to do. Neither the MoData datatype nor the data arrays themselves are supported directly by either BaseContainer or HyperFile. It seems there are really only two options, then:

1. Break them down into their individual elements and build them back up. Or deal with Get/SetMemory(), which I'd rather avoid... unless that's, I don't know, a good idea maybe?
2. Store away a clone of the MoGraph generator, but that's gonna have a bunch of extraneous data that I don't need, which I'd like to avoid... unless... the MoData tag itself? Yes, that might work! How could I forget that the MoData is stored on a hidden tag designed specifically for that purpose?!? That fact literally inspired my plugin's design!
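Option 1 (breaking the arrays down into individual elements and building them back up) could look roughly like this. It is a sketch under stated assumptions: a plain dict stands in for an integer-keyed BaseContainer, matrices are tuples of 12 floats, and the ID layout (`COUNT_ID`, `DATA_BASE_ID`, `STRIDE`) is hypothetical, not something MoData or Cinema 4D defines.

```python
# Hypothetical ID layout for an integer-keyed container (a dict stands in
# for c4d.BaseContainer here).
COUNT_ID = 0          # holds the number of stored matrices
DATA_BASE_ID = 1000   # first ID used for matrix components
STRIDE = 12           # floats per matrix: off, v1, v2, v3 (3 floats each)


def flatten(matrices):
    """Break a list of 12-float matrices down into an int-keyed dict."""
    store = {COUNT_ID: len(matrices)}
    for i, matrix in enumerate(matrices):
        for c, value in enumerate(matrix):
            store[DATA_BASE_ID + i * STRIDE + c] = value
    return store


def rebuild(store):
    """Build the list of 12-float matrices back up from the dict."""
    count = store[COUNT_ID]
    return [tuple(store[DATA_BASE_ID + i * STRIDE + c] for c in range(STRIDE))
            for i in range(count)]


if __name__ == "__main__":
    identity = (0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)
    moved = (100.0, 0.0, 0.0) + identity[3:]
    store = flatten([identity, moved])
    print(len(store))                           # 25: 1 count + 2 * 12 floats
    print(rebuild(store) == [identity, moved])  # True
```

Storing the count under its own ID lets the rebuild step know how many clones to read back without scanning the container.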

Apparently my brain does not have a perfect hash function and suffers from a terribly high load factor, haha!

Thanks!
Kevin

Manuel · Jun 15, 2020, 6:52 AM

hi,

it's a bit hard to tell, as I still don't understand what you are trying to achieve.
Do you want to save the MoGraph data and use it in C4D as a kind of library, or do you want to export it to 3rd party software?

Cheers,
Manuel

kvb · Jun 19, 2020, 6:35 AM

Oh, man... I thought I hit reply on this days ago! Sorry!

So, the MoData tag container didn't work; that's not where the MoData is stored. Well, there seems to be MoData there, but it's always empty... I think that's because the MoGraph stuff uses the messaging system so heavily that just grabbing a BaseContainer doesn't cut it.

@zipit said in MoData.GetData()?:

I have fixed this now and the timings are now correct (aside from the general diceyness of the whole test).

Thanks for revising this. Even if the BaseContainer times look worse by comparison, in context I can still see it's no slouch! I've moved forward with this approach, and so far I'm getting great performance breaking down the arrays' data and stashing it away in BaseContainers. I'm still wondering whether a Read/Write/CopyTo implementation would be better, but I honestly don't see a need here: it's plenty fast and doesn't even need to be accessed often.

@m_magalhaes said in MoData.GetData()?:

Do you want to save the mograph data and use them in c4d as a kind of library, or do you want to export them to a 3rd Party software ?

I need MoGraph Cache tag functionality without the MoGraph Cache tag. So I need to store MoData and have it persist through file saves/loads, copying to a new document/rendering, duplication, etc. Given the potential size of the data in question (especially once you add animation into the mix), I want to make sure I'm doing this in the most efficient manner possible.
Since none of the datatypes that contain the MoData are directly supported by the normal storage methods in C4D (BaseContainers and HyperFiles), I was concerned that breaking all the data into its individual components to store it away wouldn't be efficient. I'm seeing now that I had little to be concerned about. Performance is great so far!

I think I've even cleared my final hurdle, which was creating an array of MoDatas. My problem there was trying to use a BaseArray when what I needed was a PointerArray. Since a PointerArray takes ownership of the pointed-to objects, freeing the allocated MoDatas is as clean and simple as calling Reset() on the PointerArray :) Injecting my MoData into the effector pipeline is equally simple with CopyTo() or MergeData()... currently I'm doing that in ModifyPoints, but maybe I should be doing it in InitPoints? I'll figure that part out, but right now I'm just happy to have it all working!

Thank you again zipit and Manuel!