Thinking Particles in ParallelFor
-
Hello PluginCafe!
I am currently working on an Object Modifier plugin and wanted to integrate Thinking Particles alongside the deformation.
The deformation of the points is done in a ParallelFor loop (which works fine and is needed), but even just trying to allocate a particle inside of the worker leads to access violations and a Cinema 4D crash.
However, if I loop over the point count again outside/after the initial deformation (after the ParallelFor), everything works as expected. So my assumption is that Thinking Particles does not really like ParallelFor/multiple threads?
Is this a general limitation of Thinking Particles, or is there some threading issue that I have missed in the documentation/am not aware of? If possible, I would like to move all the TP allocation stuff into the ParallelFor, as I otherwise have to (in the context of this plugin) do most calculations twice.
The current target version I am developing this plugin for is R20 (and all versions after). Thanks in advance!
-Florian
-
Hello @neon,
thank you for reaching out to us. I assume for your case that:
- You are inside ObjectData::ModifyObject() or ::ModifyParticles() (the latter is only for standard particles and has no special meaning for TP particles).
- 'Trying to allocate a particle' means calling TP_MasterSystem::AllocParticle() or ::AllocParticles().
Furthermore, a few facts:
- Thinking Particles is, unlike the standard particle system, inherently single-threaded, but that does not necessarily mean that you cannot access its data in a multi-threaded fashion.
- Threading restrictions apply to all threads in Cinema 4D that are not the main thread. In most ObjectData methods, e.g., ::ModifyObject, you are not on the main thread. These restrictions all revolve around not allocating new elements in a scene, as this can nullify assumptions made by other threads and then cause Cinema 4D to crash. The common mistake here is, for example, adding an object to the document inside ObjectData::GetVirtualObjects. Note that violating this rule might not crash every time on each scene and each machine, but it is still not allowed. This also applies to allocating TP particles, but is likely not the cause of the crashes you experience.
- Resizing a collection or other data structures asynchronously without guards such as semaphores or locks is a bad idea in general, i.e., what you are doing when allocating particles in your worker loop (see the sketch below).
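To illustrate that last point, a minimal sketch of what synchronizing such access would look like; lock, system, and count are made-up placeholder names. Note that this only fixes the data race between the workers, it does not lift the main-thread restriction, and it serializes exactly the part you wanted to run in parallel:

maxon::Spinlock lock;
auto guardedWorker = [&lock, &system](maxon::Int i)
{
    // Expensive per-point work can run in parallel here ...
    // ... but every access to the shared system must be serialized.
    lock.Lock();
    // e.g., system->AllocParticle(); (still not allowed off the main thread)
    lock.Unlock();
};
maxon::ParallelFor::Dynamic(0, count, guardedWorker);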
About your specific problem:
You unfortunately did not provide any code, which forces us to guess. Allocating the particles inside a ParallelFor is not something I would try, as it is likely impossible; or when it is possible, it will slow down the loop, as the workers then have to wait for access to the shared data structure. Allocating the n particles you want to add beforehand and then setting their state in a ParallelFor could work, but I have not tried it myself. But this still leaves you with the fact that adding particles to a TP system outside of the main thread is unsafe, regardless of whether it happens inside a ParallelFor or not. When it is not trivial to determine the number of particles which must be added, e.g., due to some kind of branching, then you must divide your task into subtasks where for each sequentially executed subtask the number of new particles is known. Alternatively, you can allocate more particles than you can possibly require and then deallocate the overhead at the end.
But I would point out again that allocating particles is not something which should be done in an ObjectData plugin in general. You can find places such as Message where it can be done safely, as that method usually does run on the main thread (but you still have to check), but it will still go against the purpose of the type.
If this answer does not solve your problem, I would ask you to provide executable code for follow-up questions, as it is otherwise hard to give good answers.
Cheers,
Ferdinand
-
Hello @ferdinand,
thank you for your reply!
I am sorry, I should have been more specific.
- I am currently doing everything in ObjectData::ModifyObject() and was (am) using TP_MasterSystem::AllocParticle().
- The (maximum) number of particles I would allocate is always bound by the point count of the object I am modifying.
The simplified version of my ParallelFor routine in ModifyObject is like this:
Matrix m;
m = (~mod_mg) * op_mg;
auto worker = [.../*There would normally be more here*/](maxon::Int i)
{
    Vector p;
    // holds information about the sampled result
    sample_t result;
    p = m * m_padr[i];
    // these calculations are expensive, that's why the ParallelFor in the first place
    result = sampler->doSomeCalculations(p.x, p.z);
    p += result.deformVector;
    m_padr[i] = ~m * p;
};
maxon::ParallelFor::Dynamic(0, m_pcnt, worker);
where this sample_t result would also hold the information (for that point) whether it should spawn a particle; its lifetime/velocity etc. are derived from that as well.
And currently, right after this ParallelFor, I have this (also simplified):
Vector p;
sample_t result;
Float32 particleValue;
for (int i = 0; i < m_pcnt; i++)
{
    p = m * m_padr[i];
    result = sampler->doSomeCalculations(p.x, p.z);
    particleValue = 1 - result.particleValue;
    if (particleValue > 0) // should a particle even be spawned
        if (i % m_particleReduction == 0) // just some simple reduction for viewport speed
        {
            if (m_masterSystem)
            {
                Int32 particle = m_masterSystem->AllocParticle();
                if (particle != NOTOK)
                {
                    m_masterSystem->SetLife(particle, ...);
                    m_masterSystem->SetColor(particle, ...);
                    m_masterSystem->SetPosition(particle, op_mg * m_padr[i]);
                    m_masterSystem->SetVelocity(particle, ...);
                    if (m_ioParticleGroup)
                        m_masterSystem->SetGroup(particle, m_ioParticleGroup);
                }
            }
        }
}
Since doing TP allocation in ObjectData/ModifyObject is not something I should do, where else, apart from maybe Message, would be a place for that? The only information I would need for pre-allocation is the point count of the object to modify.
Or should I scrap the idea of generating the particles on my own altogether? The plugin itself should later on be used in a more dynamic (mesh subdivisions etc.) context; that's why I initially did not want to simply set vertex weights, as that would not quite work for the use case of this plugin.
Hope I could give more useful information on what it is I am trying to achieve. Thanks again for your quick and detailed reply!
Best Regards,
Florian
-
Hey @neon,
thank you for the update and the simplified code. I assume from the context that these code snippets represent a state which does not crash anymore, since the line Int32 particle = m_masterSystem->AllocParticle(); is not inside the worker loop anymore. And the major problem is now basically that in the second snippet you have to repeat your expensive call result = sampler->doSomeCalculations(p.x, p.z); which you have already done in the first async snippet for the vertices of the object (or whatever m_padr is for). If this code is still crashing, then the main-thread thing is the cause, and you can jump to the paragraph Main Thread Problem.
As lined out in the previous posting, my first idea would be to invert what you are doing, so that you can also modify existing particles inside your worker lambda. But the problem seems to be, at least judging by the mock code, that you really cannot predict the number of particles you will need, as the existence of each particle is tied to the outcome of the expensive method doSomeCalculations(). So, the only route would then be to allocate the maximum number of particles which could be required, which seems to be m_pcnt in your case. As a mock algorithm (see the sketch after this list):
- Allocate m_pcnt TP particles and store their particle ids in an array particleCollection.
- Launch your async lambda worker and include particleCollection and the TP master system in its capture.
  a. Invoke result = sampler->doSomeCalculations() in the lambda as you do now.
  b. The argument i passed to your lambda should be equivalent to a particle index in particleCollection, i.e., you can also index particleCollection with i.
  c. Determine with result if a particle is required or not, as you already do in your second snippet.
  d. For particles which should not exist, i.e., which should be deallocated, set the life attribute to a negative value, as lined out as the preferred method of TP particle deallocation in TP_MasterSystem::FreeParticle().
  e. For all other particles, carry out what you did in your second snippet for valid particles (set life, color, position, etc.).
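A minimal sketch of that algorithm, assuming it is invoked at a point where allocation is safe (see Main Thread Problem below); SpawnParticles is a made-up name for a member function of your plugin, while m, op_mg, m_pcnt, m_padr, sampler, sample_t, and m_masterSystem are the entities from your snippets:

maxon::Result<void> SpawnParticles(const Matrix& m, const Matrix& op_mg)
{
    iferr_scope;

    // 1. Allocate the maximum number of particles which could be required
    // and remember their ids.
    maxon::BaseArray<Int32> particleCollection;
    for (Int32 i = 0; i < m_pcnt; ++i)
        particleCollection.Append(m_masterSystem->AllocParticle()) iferr_return;

    // 2. Set the particle states asynchronously; no allocations happen here.
    auto worker = [this, &particleCollection, &m, &op_mg](maxon::Int i)
    {
        const Int32 pid = particleCollection[i];
        if (pid == NOTOK)
            return;

        const Vector p = m * m_padr[i];
        const sample_t result = sampler->doSomeCalculations(p.x, p.z);

        if (1.0 - result.particleValue > 0.0)
        {
            m_masterSystem->SetPosition(pid, op_mg * m_padr[i]);
            // ... SetLife, SetColor, SetVelocity as in your second snippet ...
        }
        else
        {
            // 'Deallocate' the overhead by setting a negative lifetime, the
            // method preferred over calling TP_MasterSystem::FreeParticle().
            m_masterSystem->SetLife(pid, BaseTime(-1.0));
        }
    };
    maxon::ParallelFor::Dynamic(0, m_pcnt, worker);

    return maxon::OK;
}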
This would call doSomeCalculations() only once, at the cost of some memory overhead, as you will generate a particle overhead in some or most cases, depending on the likelihood of 1 - sampler->doSomeCalculations().particleValue > 0.

Main Thread Problem
About the secondary problem of not adding scene elements outside of the main thread: There is no easy fix for that. ObjectData and the specific method you are using are not intended for doing what you want to do, allocating particles. Over the years, developers have worked around these problems, but these workarounds often entail a lot of work.
The principal logic is always (see the sketch after this list):
- I want to add elements to a scene in some NodeData method which is not on the main thread.
- Collect all data required for that action and store it as an object task. In your case, task could be a dictionary of the particle data which should be added.
- The next time NodeData::Message is called and is on the main thread (which will likely be the case, but you must still test for it with GeIsMainThread()), carry out these changes with the help of task, e.g., allocate and set the particles.
- Remove task.
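As a rough sketch of that pattern, with MyObjectData, ParticleTask, _tasks, _taskLock, and _masterSystem as made-up names; the lock is needed because ModifyObject() (which fills _tasks) and Message() can run on different threads:

Bool MyObjectData::Message(GeListNode* node, Int32 type, void* data)
{
    // Only carry out pending tasks when we actually are on the main thread;
    // Message() usually is, but this must be tested.
    if (GeIsMainThread() && _tasks.GetCount() > 0)
    {
        _taskLock.Lock();
        for (const ParticleTask& task : _tasks)
        {
            const Int32 pid = _masterSystem->AllocParticle();
            if (pid != NOTOK)
            {
                _masterSystem->SetPosition(pid, task.position);
                // ... set life, color, velocity, and group from the task data ...
            }
        }
        _tasks.Reset(); // remove the tasks once they have been carried out
        _taskLock.Unlock();
    }
    return ObjectData::Message(node, type, data);
}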
The problem is that there is no dedicated mechanism or message id for that approach. So, you just check on every message if there is a task and then carry it out. This not only slows down the execution of Message, which is not good when the slowdown is substantial, as everything else is waiting for you, but also comes with no guarantee regarding the time of execution. Message will inevitably be called at some point after you created a task, but it could take seconds before the next message comes in. This mainly happens when a node (object, tag, material, etc.) is not active (selected), as GUI/parameter stuff is the most frequent reason for messages.
For adding a bunch of materials to a scene from some ObjectData this is all manageable, but for particles the time of allocation matters. I do not see an effective way out of this dilemma. You could allocate the particles ahead of time, i.e., make sure there are always m_pcnt invisible particles available for the next call to ModifyObject. But this then gets all complicated. There is also the problem that tying particle generation to ModifyObject() is in itself problematic, since there is nothing which prevents this method from being called multiple times per frame.
For completeness: There is maxon::ExecuteOnMainThread(), which allows you to push a lambda to the main thread, which you could invoke in your MyObjectData::ModifyObject() and sort of solve all these problems. But it effectively just inverts the problem, as ::ModifyObject() will then wait for the ExecuteOnMainThread() job to be executed on the main thread, i.e., you will bind a method which is for performance reasons not on the main thread, ModifyObject, to the main thread (see the sketch below).
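For illustration, a minimal sketch of that approach, assuming the default wait behaviour of maxon::ExecuteOnMainThread() (the calling thread blocks until the lambda has run, which is exactly the inversion described above); m_masterSystem as in your snippets:

// Inside MyObjectData::ModifyObject():
maxon::ExecuteOnMainThread([this]()
{
    const Int32 pid = m_masterSystem->AllocParticle();
    if (pid != NOTOK)
    {
        // ... set the particle state here ...
    }
});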
Cheers,
Ferdinand
-
Hello @ferdinand,
thank you again for this very detailed and helpful reply! I tried both methods: allocating beforehand, and, like the previous setup, just having the allocation after the ParallelFor altogether.
Both worked, although allocating beforehand introduced some weird flickering in the particles and was (at least it feels that way) more unstable.
Which is why I opted for the second solution, as with some extensive testing it didn't seem to cause any issues.
For anyone maybe reading this in the future and having the same issues:
Do not allocate the particle group inside the plugin, but use a link field and have the user handle the particle group creation inside Cinema 4D; that solved a lot of weird behaviours and seemed to work better/more stable (at least in my not so intended use case).
I will mark this thread as solved, as it works now and I can fix the main-thread issue as well without too much trouble.
So thank you very much for your quick and very detailed replies!
Best Regards,
Florian