    Calculate fast polygon global pos.. [SOLVED]

    SDK Help

      On 30/04/2015 at 05:44, xxxxxxxx wrote:

      Hi Eclektrik,

      Possible optimizations:

      1. Call op->GetPointR() only once, ideally already outside of poly_center()
      2. Call op->GetMg() only once, ideally already outside of poly_center()
      3. Don't use containers at all; use a single Vector called "sum" to which you add the
      polygon's respective three or four points.

      Possible pitfalls in your code:

      1. You don't check if a point index is in the point range, which could lead to an access violation
      2. You use the standard library containers, which rely on exceptions; this is unsafe in the Cinema
      4D plugin environment, where exception handling is disabled by default
      3. You should use the matrix as the left operand in the multiplication. The other way round is
      not mathematically valid and has been deprecated since R15.

      The following code does not multiply by the object's global matrix. Since you should prefer
      to call op->GetMg() only once, you can simply write mg * poly_center(...) instead of poly_center(..., mg);
      see the call-site sketch after the function.

      Vector poly_center(Int32 pcnt, const Vector* points, const CPolygon& poly)
      {
        Vector sum;
        // Only add point indices that are inside the valid range [0, pcnt).
        if (poly.a >= 0 && poly.a < pcnt)
          sum += points[poly.a];
        if (poly.b >= 0 && poly.b < pcnt)
          sum += points[poly.b];
        if (poly.c >= 0 && poly.c < pcnt)
          sum += points[poly.c];
        // A quad has a distinct fourth index; for a triangle, c == d.
        if (poly.c != poly.d && poly.d >= 0 && poly.d < pcnt) {
          sum += points[poly.d];
          return sum / 4;
        }
        return sum / 3;
      }
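
      For example, a call site might look like this (just a sketch, assuming op is a PolygonObject*; the data is fetched once, outside the per-polygon loop):

      const Matrix mg = op->GetMg();
      const Vector* points = op->GetPointR();
      const CPolygon* polys = op->GetPolygonR();
      const Int32 pcnt = op->GetPointCount();
      const Int32 polyCount = op->GetPolygonCount();

      for (Int32 i = 0; i < polyCount; ++i)
      {
        // World-space centre: global matrix times the object-space centre.
        const Vector center = mg * poly_center(pcnt, points, polys[i]);
        // ... use center ...
      }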
      

      Best,
      Niklas


        On 30/04/2015 at 07:31, xxxxxxxx wrote:

        Hi,

        thanks Niklas for the nice solution (small typo in the second if statement, I think).
        I just want to add two things:
        - CPolygon also has a [] operator, so if you prefer, you could write the above with a loop as well (see the sketch below), although Niklas' code is probably faster
        - more important: we strongly recommend using BaseArray instead of std::vector. Reasons are given by Wilfred, for example, in this thread: Faster Vector list...
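
        A loop version might look roughly like this (a sketch; poly_center_loop is an illustrative name, and the polygon is taken by value so the [] operator mentioned above can be used directly):

        Vector poly_center_loop(Int32 pcnt, const Vector* points, CPolygon poly)
        {
          Vector sum;
          const Int32 count = (poly.c == poly.d) ? 3 : 4;  // triangle if c == d, otherwise quad
          for (Int32 i = 0; i < count; ++i)
          {
            const Int32 index = poly[i];  // i-th point index via CPolygon's [] operator
            if (index >= 0 && index < pcnt)
              sum += points[index];
          }
          return sum / Float(count);
        }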


          On 30/04/2015 at 07:46, xxxxxxxx wrote:

           @Andreas, I expected a for loop might be faster ("it may get optimized with SSE by the compiler"), but if both generate the same code unoptimized, then the for loop will be slower.


            On 30/04/2015 at 09:10, xxxxxxxx wrote:

            @Andreas: You're right, I'll fix it 🙂


              On 01/05/2015 at 07:40, xxxxxxxx wrote:

               I knew I could rely on you guys 🙂  Thanks particularly Niklas for taking the time to go over this.

               Yeah, I'm looking at converting everything over to BaseArray as it does seem to be the way to go: flexible, fast and obviously compatible.  Every fraction of a ms I can shave off at this point is a bonus, so I'll be going in pretty hard and hopefully learning how to do things better and more efficiently in general.
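
               For what it's worth, a minimal sketch of what that conversion could look like, inside whatever function builds the list (assuming the SDK's BaseArray header is included, the mesh data is fetched once as in the earlier snippets, and pre-R20-style Bool error checking; all names are illustrative):

               maxon::BaseArray<Vector> centers;
               if (!centers.Resize(polyCount))   // allocate once up front instead of growing a std::vector
                 return false;                   // allocation failed
               for (Int32 i = 0; i < polyCount; ++i)
                 centers[i] = mg * poly_center(pcnt, points, polys[i]);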

               I've also implemented multithreading via C4D's MPThreadPool, but have achieved probably less than a 10% improvement in a fairly short-run (but intensive) deformer plugin.  This is on a hex-core (12-thread) Intel, so I'm going to look into what the holdbacks are, i.e. thread prep/setup times maybe, writing more optimised code, etc.

              Thanks again.


                On 01/05/2015 at 10:06, xxxxxxxx wrote:

                 Thanks again Niklas, just tested and it is a LOT faster.

                 Also tried to reduce calls to ->GetMg() etc. to a single call and reuse the result.  Next step is optimising the rest of the code further, and also learning the nuts and bolts of pointers and references, as I feel these would also save time and memory; at the moment I'm almost sort of guessing how to use them as I pass values into functions and again into more functions.


                  On 01/05/2015 at 11:13, xxxxxxxx wrote:

                   @Eclectrik multi-threading will give you performance, but it may be memory-bound in some cases; here I get much more performance if I split memcpy calls across threads "where possible".
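
                   Roughly, splitting a copy across threads means each thread copies only its own contiguous chunk; a minimal sketch (plain memcpy; threadIndex and threadCount would come from however the threads are set up):

                   #include <cstring>  // memcpy (the SDK headers for Int32/Vector are assumed)

                   // Each thread copies only its own contiguous range of the buffer,
                   // so no two threads write to the same destination memory.
                   void copy_chunk(const Vector* src, Vector* dst, Int32 total,
                                   Int32 threadIndex, Int32 threadCount)
                   {
                     const Int32 chunk = (total + threadCount - 1) / threadCount;  // ceil(total / threadCount)
                     const Int32 start = threadIndex * chunk;
                     if (start >= total)
                       return;
                     const Int32 length = (start + chunk > total) ? (total - start) : chunk;
                     memcpy(dst + start, src + start, length * sizeof(Vector));
                   }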


                    On 01/05/2015 at 12:53, xxxxxxxx wrote:

                    Originally posted by xxxxxxxx

                     @Eclectrik multi-threading will give you performance, but it may be memory-bound in some cases; here I get much more performance if I split memcpy calls across threads "where possible".

                    Cheers Mohamed - what would that involve exactly?

                     At the moment, what I have is, for example:

                     The base control thread takes the PolygonObject and assigns each thread a selection of its polygons to work on.  Then each thread starts and runs the above poly_center function on each polygon in its share.

                    So basically -

                      const Matrix matr = op->GetMg();        // GetMg() returns by value; copy it once (a non-const reference to the temporary is invalid)
                      const Int32 pcnt = op->GetPointCount(); // point count, used for the range checks inside poly_center()
                      const Vector* pnts = op->GetPointR();
                      const CPolygon* pPoly = op->GetPolygonR();

                      for (Int32 i = pIndexStart; i < (pLength + pIndexStart); i++)
                      {
                        Vector polyPos = matr * poly_center(pcnt, pnts, pPoly[i]); // poly_center() takes the polygon by reference, not by pointer
                      }
                    

                     Something like that, with some other stuff thrown in.  Where pIndexStart and pLength feed it with which share of polygon indexes the thread gets.

                     From what you are saying, would it be more efficient, for example, to copy each share of polygons into its own CPolygon buffer, so that each thread is accessing a separate chunk of memory, i.e. not all reading through the same pointer from op->GetPolygonR()?  Or am I looking at this the wrong way?

                    PS - It's fun throwing a dense mesh at this, and watching CPU usage on all cores hit 100%, not something you often see in the viewport 🙂


                      On 01/05/2015 at 14:18, xxxxxxxx wrote:

                       As this function is called a lot, in your case I think you can optimize poly_center() further, with these 2 things:

                       1- avoid division, so instead of sum / 3, make it sum * 0.333333
                       2- you may try to use some SSE intrinsics; it "may" improve performance, depending on how bad the compiler is, with a good compiler it won't add much.


                        On 18/05/2015 at 05:38, xxxxxxxx wrote:

                        Small addition to 1:
                        It's probably better to write * (1.0 / 3.0). In this way the compiler can do the division at compile time and you end up with the same multiplication as Mohamed suggested, while preserving as much precision as possible.
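
                         Applied to the poly_center() function above, the return statements would then read (a sketch):

                         // 1.0/4.0 and 1.0/3.0 are folded at compile time, leaving only a multiplication at runtime.
                         if (poly.c != poly.d && poly.d >= 0 && poly.d < pcnt) {
                           sum += points[poly.d];
                           return sum * (1.0 / 4.0);  // quad
                         }
                         return sum * (1.0 / 3.0);    // triangle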
