Hey @ferdinand
Thank you so much for another extremely thorough reply!
As you pointed out in your previous post, I had never really tested with normal playback. In real‑world usage I guess the artists never noticed the issue because—on a 20‑minute+ scenography—they eventually stop playing the timeline in real time and just jump to key moments or scrub quickly.
I’m aware that our MoGraph usage is a bit “extreme”, but I’m relieved a bug has been identified; it matches the artists’ feeling that “it didn’t behave like this before.”
That said, drone shows have grown in duration, drone count, and choreography complexity in recent years, so that is probably a big factor too.
Thanks for the practical tips for the artists; I have a meeting with them later today and will brief them on the “double use of effectors in different groups” pitfall you mentioned.
Regarding the solution: we had looked into that approach before, but apparently not correctly. Retesting with your Python code gave us significant performance gains—many thanks!
Note: I’m aware of the limits of parallelisation, but for now that’s not an issue: every scenography is processed the same way and we don’t have any simulations. Still, I’ll keep a single‑threaded fallback in mind in case we need it later. (Our artists always have new ideas!)
Besides that, I have a small question. This morning I ran more tests and tried using a clone of the document inside each thread [3] instead of re‑loading the document [1]. On a heavy scenography the load/destroy step takes a noticeable chunk of time, whereas cloning gives even better performance [4] than the reload approach [2].
Is there any downside to the usage of document cloning—e.g. hidden memory cost or something else I should watch out for?
Thanks again for your time and the deep analysis; I’m always blown away by the quality of support here.
Looking forward to 2026!
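As a side note for anyone reading along without Cinema 4D installed: the chunking and pool pattern used in both attachments can be sketched in plain Python. `makeChunks` mirrors the chunk computation from the scripts; `processChunk` is a hypothetical stand‑in for `PassesThread.Main`:

```python
import concurrent.futures

def makeChunks(minFrame: int, maxFrame: int, chunkSize: int) -> list[tuple[int, int]]:
    """Splits the inclusive frame range [minFrame, maxFrame] into chunks of at most chunkSize frames."""
    return [
        (minFrame + i * chunkSize, min(maxFrame, minFrame + (i + 1) * chunkSize - 1))
        for i in range((maxFrame - minFrame + chunkSize) // chunkSize)
    ]

def processChunk(chunk: tuple[int, int]) -> int:
    """Hypothetical stand-in for PassesThread.Main: reports how many frames it would handle."""
    start, end = chunk
    return end - start + 1

# 35,000 frames in chunks of 1000, handled by a pool of at most ten workers.
chunks = makeChunks(0, 34999, 1000)
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    framesDone = sum(pool.map(processChunk, chunks))
print(len(chunks), framesDone)  # 35 35000
```

The real scripts do the same bookkeeping by hand, since the actual work has to run through Cinema 4D’s own `c4d.threading.C4DThread` API rather than a standard-library executor.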
Attachments
[1] Code using LoadDocument (a small update of your script to run a thread pool over the whole scene duration)
"""Builds the passes for all frames of a document in chunks in parallel.
"""
import c4d
import mxutils
import time
IS_DEBUG: bool = True
class PassesThread(c4d.threading.C4DThread):
"""Executes the passes for a given document in a given frame range.
"""
def __init__(self, file: str, fps: int, minFrame: int, maxFrame: int) -> None:
"""Initializes the thread with the document and frame range to process.
"""
self._file: str = mxutils.CheckType(file, str)
self._fps: int = mxutils.CheckType(fps, int)
self._minFrame: int = mxutils.CheckType(minFrame, int)
self._maxFrame: int = mxutils.CheckType(maxFrame, int)
self._result: bool = False
# Start the thread once it has been instantiated. We could also do this from the outside,
# and sometimes that is preferable, but I went with this option here.
self.Start()
print(f"PassesThread initialized for file {self._file} from frame {self._minFrame} to {self._maxFrame}.")
@mxutils.TIMEIT(enabled=IS_DEBUG)
def Main(self) -> None:
"""Called by Cinema 4D as the payload-method of the thread when it is started.
We execute the passes for the given document in the given frame range and also load and
        unload the document for the chunk we process. This is necessary to avoid the MoGraph issue and
also to parallelize the execution of the passes (each pass needs its own document so that
we can work on different frame ranges in parallel).
"""
doc: c4d.documents.BaseDocument = c4d.documents.LoadDocument(
self._file, c4d.SCENEFILTER_OBJECTS | c4d.SCENEFILTER_MATERIALS)
for f in range(self._minFrame, self._maxFrame + 1):
doc.SetTime(c4d.BaseTime(f, self._fps))
if not doc.ExecutePasses(self.Get(), True, True, True, c4d.BUILDFLAGS_EXPORTONLY):
break
# I did not trust my own code here, :D, as the speed ups seemed to be too good to be true.
# So, I checked if at least the last frame built everything correctly (to some extent). With
# 10 threads my CPU was at ~90% all the time and begging for mercy :D
# didBuildCaches: bool = True
# for obj in mxutils.IterateTree(doc.GetFirstObject()):
# # We are ignoring deform caches as it would be more complicated to figure out if
# # something should have a deform cache or not. But we can check all generators for
# # having a cache. This of course does not check caches in caches (we would need
# # mxutils.RecurseGraph for that), but this is good enough as a sanity check.
# if obj.GetInfo() & c4d.OBJECT_GENERATOR and not obj.GetCache():
# didBuildCaches = False
# print(f"Did build caches: {didBuildCaches}")
c4d.documents.KillDocument(doc)
self._result = True
@property
def DidSucceed(self) -> bool:
"""Returns whether the passes were executed successfully.
"""
return self._result
@mxutils.TIMEIT(enabled=IS_DEBUG)
def run() -> None:  # Named run so that we can more easily distinguish it from PassesThread.Main.
    """Splits the document's frame range into chunks and processes them with a thread pool.
    """
file: str = c4d.storage.LoadDialog(c4d.FILESELECTTYPE_SCENES, "Select a scene file", c4d.FILESELECT_LOAD)
if not file:
print("No file selected.")
return
# Get the documents animation data to determine the frame range.
    doc: c4d.documents.BaseDocument = c4d.documents.LoadDocument(file, c4d.SCENEFILTER_NONE)
    fps: int = doc.GetFps()
    minFrame: int = doc[c4d.DOCUMENT_MINTIME].GetFrame(fps)
    maxFrame: int = doc[c4d.DOCUMENT_MAXTIME].GetFrame(fps)
    # We only needed the frame range, so free this document again right away.
    c4d.documents.KillDocument(doc)
# Split up frames into chunks of X frames to avoid memory issues with very large scenes. There
# is currently a bug in MoGraph scenes that causes slow-downs the longer they get. In the usual
# frame range of a few hundred frames, this is not very noticeable, but it becomes noticeable
    # for thousands or even tens of thousands of frames. So, we split things into chunks and unload
# the document in between. The chunk size can be adjusted to taste, from the very brief tests
# I did, 1000 frames seems to be a good value, after that things start to slow down.
#
# We created a fix for this MoGraph bug and it will likely be contained in a future release of
# Cinema 4D. But as always, we cannot guarantee that this will be delivered at all or when it
# will be delivered, as things can always change last minute.
chunkSize: int = 1000
chunks: list[tuple[int, int]] = [
(minFrame + i * chunkSize, min(maxFrame, minFrame + (i + 1) * chunkSize - 1))
for i in range((maxFrame - minFrame + chunkSize) // chunkSize)
]
# Thread pool of 10 threads
maxConcurrentThreads: int = 10
totalChunks: int = len(chunks)
completedChunks: int = 0
chunksToProcess: list[tuple[int, int]] = chunks.copy()
# I did also sketch out the threading subject we already talked about. As mentioned, Cinema 4D
# will parallelize scene execution on its own. We cannot make one pass go faster. But your scene
# is relatively lightweight per frame, so I end up only using about 20% of my CPU when I run
# this frame by frame (as there is nothing left to optimize for Cinema 4D in that one frame).
#
# What is special about your scene, is that each frame is independent of the other frames. I.e.,
# you can just jump to a frame X and execute its passes and get the correct result. That would
# for example not work for a scene which contains a (non-cached) simulation, e.g., a cloth or
# pyro simulation. In that case, you would have to run the passes for all frames up to
# frame X to get the correct result for frame X.
# So, what we are doing is parallelize the execution in its width, we are working on multiple
# chunks of frame ranges at the same time. This is not a speed-up for one frame, but it is a
# speed-up for the whole scene.
# Create threads for each chunk and directly start them. Then loop over all the threads,
# removing them one by one as they finish, until all threads are done. Each frame chunk will
# be processed in its own thread in parallel to the other chunks.
activeThreads: list[PassesThread] = []
print(f"Total chunks to process: {totalChunks}")
for i in range(min(maxConcurrentThreads, len(chunksToProcess))):
startFrame, endFrame = chunksToProcess.pop(0)
thread = PassesThread(file, fps, startFrame, endFrame)
activeThreads.append(thread)
    totalFrames = maxFrame - minFrame + 1
    while activeThreads or chunksToProcess:
        # Update status
        c4d.gui.StatusSetText(f"Handling scene chunks ({totalFrames} Frames) ({completedChunks}/{totalChunks})")
# Check if thread finished
finishedThreads = []
for thread in activeThreads:
if not thread.IsRunning():
if thread.DidSucceed:
completedChunks += 1
print(f"Chunk completed successfully. Progress: {completedChunks}/{totalChunks}")
else:
print(f"Chunk failed. Progress: {completedChunks}/{totalChunks}")
finishedThreads.append(thread)
# Remove finished threads
for thread in finishedThreads:
activeThreads.remove(thread)
        # Start new threads to refill the pool while chunks remain
while len(activeThreads) < maxConcurrentThreads and chunksToProcess:
startFrame, endFrame = chunksToProcess.pop(0)
thread = PassesThread(file, fps, startFrame, endFrame)
activeThreads.append(thread)
print(f"Started new thread for chunk {startFrame}-{endFrame}. Active threads: {len(activeThreads)}")
time.sleep(1.0)
c4d.gui.StatusSetText(f"Scene processing completed ({totalFrames} Frames) ({totalChunks}/{totalChunks})")
print(f"All chunks processed. Total: {completedChunks}/{totalChunks}")
if __name__ == '__main__':
run()
[2] Performance with Load/Destroy (I stripped some prints)
TIMEIT: Ran 'Main()' in 17.60864 sec
TIMEIT: Ran 'Main()' in 18.19359 sec
TIMEIT: Ran 'Main()' in 19.2093 sec
TIMEIT: Ran 'Main()' in 19.47807 sec
TIMEIT: Ran 'Main()' in 19.86996 sec
TIMEIT: Ran 'Main()' in 20.04131 sec
TIMEIT: Ran 'Main()' in 20.18938 sec
TIMEIT: Ran 'Main()' in 20.5281 sec
TIMEIT: Ran 'Main()' in 20.80194 sec
TIMEIT: Ran 'Main()' in 21.14046 sec
TIMEIT: Ran 'Main()' in 5.43413 sec
TIMEIT: Ran 'Main()' in 6.53501 sec
TIMEIT: Ran 'Main()' in 5.20132 sec
TIMEIT: Ran 'Main()' in 5.41878 sec
TIMEIT: Ran 'Main()' in 5.54765 sec
TIMEIT: Ran 'Main()' in 4.66633 sec
TIMEIT: Ran 'Main()' in 4.71172 sec
TIMEIT: Ran 'Main()' in 4.75155 sec
TIMEIT: Ran 'Main()' in 4.82119 sec
TIMEIT: Ran 'Main()' in 5.14475 sec
TIMEIT: Ran 'Main()' in 5.01931 sec
TIMEIT: Ran 'Main()' in 5.38826 sec
TIMEIT: Ran 'Main()' in 4.73412 sec
TIMEIT: Ran 'Main()' in 4.9887 sec
TIMEIT: Ran 'Main()' in 5.12512 sec
TIMEIT: Ran 'Main()' in 5.30819 sec
TIMEIT: Ran 'Main()' in 5.43618 sec
TIMEIT: Ran 'Main()' in 5.52055 sec
TIMEIT: Ran 'Main()' in 5.71583 sec
TIMEIT: Ran 'Main()' in 4.44371 sec
TIMEIT: Ran 'Main()' in 3.422 sec
TIMEIT: Ran 'Main()' in 2.4513 sec
TIMEIT: Ran 'Main()' in 3.48958 sec
TIMEIT: Ran 'Main()' in 3.50926 sec
TIMEIT: Ran 'Main()' in 3.54951 sec
All chunks processed. Total: 35/35
--------------------------------------------------------------------------------
TIMEIT: Ran 'run()' in 47.51687 sec
[3] Code using doc.GetClone()
"""Builds the passes for all frames of a document in chunks in parallel.
"""
import c4d
import mxutils
import time
IS_DEBUG: bool = True
class PassesThread(c4d.threading.C4DThread):
"""Executes the passes for a given document in a given frame range.
"""
def __init__(self, originalDoc: c4d.documents.BaseDocument, fps: int, minFrame: int, maxFrame: int) -> None:
"""Initializes the thread with the document and frame range to process.
"""
self._originalDoc = mxutils.CheckType(originalDoc, c4d.documents.BaseDocument)
self._fps: int = mxutils.CheckType(fps, int)
self._minFrame: int = mxutils.CheckType(minFrame, int)
self._maxFrame: int = mxutils.CheckType(maxFrame, int)
self._result: bool = False
# Start the thread once it has been instantiated. We could also do this from the outside,
# and sometimes that is preferable, but I went with this option here.
self.Start()
print(f"PassesThread initialized from frame {self._minFrame} to {self._maxFrame}.")
@mxutils.TIMEIT(enabled=IS_DEBUG)
    def Main(self) -> None:
        """Called by Cinema 4D as the payload-method of the thread when it is started.

        Executes the passes on a clone of the document for the given frame range.
        """
        doc = self._originalDoc.GetClone(c4d.COPYFLAGS_DOCUMENT)
        for f in range(self._minFrame, self._maxFrame + 1):
            doc.SetTime(c4d.BaseTime(f, self._fps))
            if not doc.ExecutePasses(self.Get(), True, True, True, c4d.BUILDFLAGS_EXPORTONLY):
                break
        # Free the clone again and flag the chunk as done; without setting _result, DidSucceed
        # would always report a failure.
        c4d.documents.KillDocument(doc)
        self._result = True
@property
def DidSucceed(self) -> bool:
"""Returns whether the passes were executed successfully.
"""
return self._result
@mxutils.TIMEIT(enabled=IS_DEBUG)
def run() -> None:
    """Splits the active document's frame range into chunks and processes them with a thread pool.
    """
doc = c4d.documents.GetActiveDocument()
fps: int = doc.GetFps()
minFrame: int = doc[c4d.DOCUMENT_MINTIME].GetFrame(fps)
maxFrame: int = doc[c4d.DOCUMENT_MAXTIME].GetFrame(fps)
chunkSize: int = 1000
chunks: list[tuple[int, int]] = [
(minFrame + i * chunkSize, min(maxFrame, minFrame + (i + 1) * chunkSize - 1))
for i in range((maxFrame - minFrame + chunkSize) // chunkSize)
]
# Thread pool with maximum 10 concurrent threads
maxConcurrentThreads: int = 10
totalChunks: int = len(chunks)
completedChunks: int = 0
chunksToProcess: list[tuple[int, int]] = chunks.copy()
activeThreads: list[PassesThread] = []
print(f"Total chunks to process: {totalChunks}")
# Start the first threads (up to maxConcurrentThreads)
for i in range(min(maxConcurrentThreads, len(chunksToProcess))):
startFrame, endFrame = chunksToProcess.pop(0)
thread = PassesThread(doc, fps, startFrame, endFrame)
activeThreads.append(thread)
    totalFrames = maxFrame - minFrame + 1
    # Main thread pool loop
    while activeThreads or chunksToProcess:
        # Update status
        c4d.gui.StatusSetText(f"Handling scene chunks ({totalFrames} Frames) ({completedChunks}/{totalChunks})")
# Check finished threads
finishedThreads = []
for thread in activeThreads:
if not thread.IsRunning():
if thread.DidSucceed:
completedChunks += 1
print(f"Chunk completed successfully. Progress: {completedChunks}/{totalChunks}")
else:
print(f"Chunk failed. Progress: {completedChunks}/{totalChunks}")
finishedThreads.append(thread)
# Remove finished threads from the active list
for thread in finishedThreads:
activeThreads.remove(thread)
# Start new threads if chunks are pending and we have available slots
while len(activeThreads) < maxConcurrentThreads and chunksToProcess:
startFrame, endFrame = chunksToProcess.pop(0)
thread = PassesThread(doc, fps, startFrame, endFrame)
activeThreads.append(thread)
print(f"Started new thread for chunk {startFrame}-{endFrame}. Active threads: {len(activeThreads)}")
# Wait a bit before checking again
time.sleep(1.0)
# Update final status
c4d.gui.StatusSetText(f"Scene processing completed ({totalFrames} Frames) ({totalChunks}/{totalChunks})")
print(f"All chunks processed. Total: {completedChunks}/{totalChunks}")
if __name__ == '__main__':
run()
[4] Performance (Document Cloning) (I stripped many superfluous prints ;))
Total chunks to process: 35
TIMEIT: Ran 'Main()' in 4.73206 sec
TIMEIT: Ran 'Main()' in 4.75516 sec
TIMEIT: Ran 'Main()' in 5.0718 sec
TIMEIT: Ran 'Main()' in 5.36385 sec
TIMEIT: Ran 'Main()' in 5.32922 sec
TIMEIT: Ran 'Main()' in 5.61483 sec
TIMEIT: Ran 'Main()' in 6.64787 sec
TIMEIT: Ran 'Main()' in 6.50201 sec
TIMEIT: Ran 'Main()' in 6.47022 sec
TIMEIT: Ran 'Main()' in 6.47917 sec
TIMEIT: Ran 'Main()' in 4.06786 sec
TIMEIT: Ran 'Main()' in 4.09047 sec
TIMEIT: Ran 'Main()' in 4.2562 sec
TIMEIT: Ran 'Main()' in 3.88825 sec
TIMEIT: Ran 'Main()' in 4.46758 sec
TIMEIT: Ran 'Main()' in 4.5405 sec
TIMEIT: Ran 'Main()' in 4.29954 sec
TIMEIT: Ran 'Main()' in 4.25477 sec
TIMEIT: Ran 'Main()' in 4.371 sec
TIMEIT: Ran 'Main()' in 4.40597 sec
TIMEIT: Ran 'Main()' in 4.34581 sec
TIMEIT: Ran 'Main()' in 4.07137 sec
TIMEIT: Ran 'Main()' in 5.32505 sec
TIMEIT: Ran 'Main()' in 4.6659 sec
TIMEIT: Ran 'Main()' in 5.9585 sec
TIMEIT: Ran 'Main()' in 4.23031 sec
TIMEIT: Ran 'Main()' in 4.31894 sec
TIMEIT: Ran 'Main()' in 6.03186 sec
TIMEIT: Ran 'Main()' in 5.28033 sec
TIMEIT: Ran 'Main()' in 5.33497 sec
TIMEIT: Ran 'Main()' in 3.66333 sec
TIMEIT: Ran 'Main()' in 3.74228 sec
TIMEIT: Ran 'Main()' in 1.93532 sec
TIMEIT: Ran 'Main()' in 2.65944 sec
TIMEIT: Ran 'Main()' in 2.6315 sec
All chunks processed. Total: 35/35
--------------------------------------------------------------------------------
TIMEIT: Ran 'run()' in 20.56978 sec