What is Vector.GetLengthSquared() really meant for?

Cairyn

It's probably irrelevant, but I just spent half an hour writing an explanation why you may want to calculate the squared length of a vector, which goes roughly like this:

lengths are calculated by the Theorem of Pythagoras, which involves three squares and one root operation
for length comparisons, you do not need the exact length but can work with the sum of the squares, because a²<b² --> a<b for positive a,b.
so you may want to go without the expensive root operation in the length calculation and compare the squares instead.

That sounded all nice and logical, but when I did a test with actual timestamps, GetLengthSquared did not save me any time in comparison with GetLength. On the average, the function was actually slower! That makes very little sense, as it should do less.

Can someone tell me what GetLengthSquared actually does internally that needs more time than GetLength? And if it does, why does it exist in the API? What issue is it meant to solve?

Or is this just a Python thing? Interpreter needs more time to find the function... whatever...

ferdinand

Hi,

I cannot confirm this. At least not as an absolute. As [1] shows, it basically depends on the mood of Python's virtual machine which one is faster.

Some attempt at an explanation: While the square root was an actual hinderance for CG in the early days and led to things like the infamous fast inverse square root hack, modern chipsets have specialised instructions for the square root which more or less completely nullify the problem (it's still slower than doing nothing of course). So the computer science aversion against calculating the square root is actually more of a cultivated habit than an actual problem these days.

So the actual difference between taking the square root and not taking it, is quite small these days, because of modern chipset instructions and modern compilers making use of them. And that difference can not only be drowned but also inverted by the relative randomness of timings in the tenth or hundreds of nanoseconds range by code that runs on a virtual machine.

Cheers,
zipit

[1]

import c4d
import random
import time


def timeit(data, function):
    """A poors mans function timing on a collection of objects.
    """
    timing = 0
    for item in data:
        t0 = time.time()
        getattr(item, function)()
        timing += time.time() - t0
    return timing * (1./len(data))


def main():
    """Entry point.
    """
    # 1 million psuedo random vectors.
    count = int(1E6)
    vectors = tuple(c4d.Vector(random.uniform(-1, 1),
                               random.uniform(-1, 1),
                               random.uniform(-1, 1)) for _ in range(count))

    # Ten rounds of calculating the length and the squared length for them.
    for i in range(10):
        msg = "Round: {}, Function: {}, Time (sec): {}"
        for f in ("GetLength", "GetLengthSquared"):
            print msg.format(i, f, timeit(vectors, f))

if __name__ == "__main__":
    main()

Round: 0, Function: GetLength, Time (sec): 2.47002124786e-07
Round: 0, Function: GetLengthSquared, Time (sec): 2.60000228882e-07
Round: 1, Function: GetLength, Time (sec): 2.35999584198e-07
Round: 1, Function: GetLengthSquared, Time (sec): 2.6100063324e-07
Round: 2, Function: GetLength, Time (sec): 3.4800028801e-07
Round: 2, Function: GetLengthSquared, Time (sec): 2.62000322342e-07
Round: 3, Function: GetLength, Time (sec): 2.61000871658e-07
Round: 3, Function: GetLengthSquared, Time (sec): 2.63000011444e-07
Round: 4, Function: GetLength, Time (sec): 2.63998985291e-07
Round: 4, Function: GetLengthSquared, Time (sec): 2.51001596451e-07
Round: 5, Function: GetLength, Time (sec): 2.68000841141e-07
Round: 5, Function: GetLengthSquared, Time (sec): 2.65999794006e-07
Round: 6, Function: GetLength, Time (sec): 2.46999025345e-07
Round: 6, Function: GetLengthSquared, Time (sec): 2.59999036789e-07
Round: 7, Function: GetLength, Time (sec): 2.48000144958e-07
Round: 7, Function: GetLengthSquared, Time (sec): 2.37002134323e-07
Round: 8, Function: GetLength, Time (sec): 2.5899887085e-07
Round: 8, Function: GetLengthSquared, Time (sec): 2.67003059387e-07
Round: 9, Function: GetLength, Time (sec): 2.48999834061e-07
Round: 9, Function: GetLengthSquared, Time (sec): 2.57003307343e-07
>>>

Cairyn

@zipit Thanks for answering! Yes, that's about what I thought... I said "average", not absolute, and your numbers actually support mine, with 6 out of 10 cases taking GetLengthSquared longer. The one notable exception seems to be round 2, but here the measurement for GetLength is so far off the other rounds that I suspect the machine just had to process another heavy load at that moment.

It's still irritating that GetLengthSquared should take longer at all, because it's just one root operation less (that would not even benefit from internal parallelization of operations), but with the numbers so close together, I may assume a statistical noise (from the Python interpreter?) that drowns out any significant timesave.

It might be interesting to check whether the same thing happens in C++, with the same methods behind the API but no Python interpreter to muck things up, or whether a more direct call would reveal actually shorter runtimes. I may do that when I'm at C++ stuff again...

ferdinand

Hi,

I admittedly did miss your "on average" in your first posting, but my code also takes the average (arithmetic mean) over 1E6 computations each round. It is probably also noteworthy that it is somewhat futile to try to measure execution timings in the nanosecond range in Python in the first place.

And I also would have said without hesitation that taking the squared length is much better, i.e. the fact that getting the squared length can be slower is very surprising.

My major point was that there are certain programming assumptions (multiplication is better than division, never take the square root if avoidable, cubic complexity is uncomputable, etc.) that should be taken with a grain of salt due to the fact that they rely on a certain "state" of hardware. I.e. all these three do not really hold true anymore to the extent they once did.

One argument that one could make for still calculating the squared length is that it forces you to think precisely about the (geometric) problem and makes your code more to the point by that. Which is certainly more important than shaving off a few milliseconds over the course of a few million computations.

Cheers,
zipit

Cairyn

@zipit said in What is Vector.GetLengthSquared() really meant for?:

My major point was that there are certain programming assumptions (multiplication is better than division, never take the square root if avoidable, cubic complexity is uncomputable, etc.) that should be taken with a grain of salt due to the fact that they rely on a certain "state" of hardware. I.e. all these three do not really hold true anymore to the extent they once did. (...)

That goes without saying... ultimately, any effort towards optimization needs to be checked for effectivity.

Nevertheless, I was curious and abused a plugin of mine to execute some timer checks in C++, just for comparison (code excerpt only):

			using namespace std::chrono;
			Vector v(100.0, 100.0, 100.0);
			float f;
			milliseconds ms1, ms2, diff;
				
			ms1 = duration_cast<milliseconds>(
				system_clock::now().time_since_epoch()
				);
			for (int i = 0; i < 100000000; i++)
			{
				f = 0; // v.GetLength();
			}
			ms2 = duration_cast<milliseconds>(
				system_clock::now().time_since_epoch()
				);
			diff = ms2 - ms1;
			GePrint(maxon::String("Time counter Empty:") + maxon::String::IntToString(diff.count()));

			ms1 = duration_cast<milliseconds>(
				system_clock::now().time_since_epoch()
				);
			for (int i = 0; i < 100000000; i++)
			{
				f = v.GetLength();
			}
			ms2 = duration_cast<milliseconds>(
				system_clock::now().time_since_epoch()
				);
			diff = ms2 - ms1;
			GePrint(maxon::String("Time counter GetLength:") + maxon::String::IntToString(diff.count()));

			ms1 = duration_cast<milliseconds>(
				system_clock::now().time_since_epoch()
				);
			for (int i = 0; i < 100000000; i++)
			{
				f = v.GetSquaredLength();
			}
			ms2 = duration_cast<milliseconds>(
				system_clock::now().time_since_epoch()
				);
			diff = ms2 - ms1;
			GePrint(maxon::String("Time counter GetSquaredLength:") + maxon::String::IntToString(diff.count()));

Switching off all optimizations, I get (for multiple button presses):

Time counter Empty:185
Time counter GetLength:921
Time counter GetSquaredLength:228
Time counter Empty:184
Time counter GetLength:922
Time counter GetSquaredLength:228
Time counter Empty:183
Time counter GetLength:921
Time counter GetSquaredLength:228
Time counter Empty:183
Time counter GetLength:922
Time counter GetSquaredLength:228
Time counter Empty:185
Time counter GetLength:921
Time counter GetSquaredLength:228
Time counter Empty:183
Time counter GetLength:921
Time counter GetSquaredLength:227

That is far more like what I expected (double so if you consider that the loop with a constant assignment already takes 185ms).

Considering that I had to up the loop count to a hundred million to get measurable results, it is practically guaranteed that any difference between GetLength and GetLengthSquared in the Python sample is drowned in the Python interpreter's overhead, and any result from my initial tests must be attributed to sheer randomness.