developing a notational comma popularity metric

cmloegcmluin
Posts: 993
Joined: Tue Feb 11, 2020 3:10 pm
Location: San Francisco, California, USA
Real Name: Douglas Blumeyer
Contact:

Re: developing a notational comma popularity metric

Dave Keenan wrote: Sun Aug 09, 2020 10:10 am You don't need a 20% speed up, you need a 100-times speedup. Can't you rewrite it in something that runs on the bare metal, like C++, instead of running on a virtual machine?
Might be a bit late for that, unfortunately. I lamented over a month ago that I should have used something like Python. I don't know C++ yet.
But while you're in virtual-machine-land, do you have any code profiling tools so you can find out where it's spending most of its time? And just speed that up.
Hadn't considered that, but that's a cool idea -- I'm looking into it now.
I'm guessing a lot of time is spent computing the base-2 logs of prime numbers. How about just computing them once, and sticking them in a sparse array, indexed by the prime number.
That certainly couldn't hurt. I'll let you know how much that helps.
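
For concreteness, Dave's sparse-array idea might look something like this sketch (the names here are hypothetical, not from the actual codebase):

```typescript
// Hypothetical memo table: base-2 logs of primes, computed once each and
// stored in a sparse array indexed by the prime itself (so indices 0, 1,
// 4, 6, ... simply stay empty).
const log2PrimeCache: number[] = [];

function log2OfPrime(prime: number): number {
    if (log2PrimeCache[prime] === undefined) {
        log2PrimeCache[prime] = Math.log2(prime);
    }
    return log2PrimeCache[prime];
}
```

On subsequent calls the log is a single array lookup rather than a transcendental-function call.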

------

Okay, I've been spending my free time today trying to get the recursive metric-perfecting layer to work. I've been able to remove the timing-out mechanisms entirely, which is a massive reduction in the complexity of the code, with a comparable runtime; instead I use a millionth of a parameter value difference as the threshold for deciding whether to continue searching further local minima. Now I'm just solving problems like making sure that if a 1-chunk parameter is spread across all submetrics on the condition that its value is the same across them all, it remains locked down in the perfecting stage. It's one of those never-ending pits of "todo" statements where I can't seem to resolve one of them without coming across two other things I need to mark to address next.

Dave Keenan
Posts: 1300
Joined: Tue Sep 01, 2015 2:59 pm
Location: Brisbane, Queensland, Australia
Contact:

Re: developing a notational comma popularity metric

cmloegcmluin wrote: Sun Aug 09, 2020 11:31 am Might be a bit late for that,
Fair enough.
unfortunately. I lamented over a month ago that I should have used something like Python. I don't know C++ yet.
No point going to Python, as far as I can see. Like JavaScript, Python compiles to virtual-machine code. Sure, there are just-in-time native-code compilers for JavaScript and Python, but because these languages allow dynamic typing, they still can't compete for speed with native-code compilers for statically-typed languages like C++, even if you use static typing in JavaScript or Python.

But yes, it would take way too much work to translate your JavaScript to C++ if you're not already familiar with C++. So better to pursue profiling and ways of speeding up the critical code.
I'm guessing a lot of time is spent computing the base-2 logs of prime numbers. How about just computing them once, and sticking them in a sparse array, indexed by the prime number.
That certainly couldn't hurt. I'll let you know how much that helps.
I assume you have a similar array where you cache the ap part of any soapfr or soapfar outside the loop that runs through the 80 ratios. And just load it with the primes themselves for any sopfr or sopfar. And only recalculate it when a parameter changes that affects the ap.
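
In sketch form (the names and the prime list here are illustrative, not from the actual code), the pattern is: rebuild the table of p^a values only when a changes, and let the inner loop over the 80 ratios just index into it. With a = 1 the table is simply the primes themselves, i.e. plain sopfr:

```typescript
const PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23];

// Cached p^a per prime, rebuilt only when the parameter a changes.
let apCache: number[] = [];
let cachedA: number | undefined = undefined;

function apForPrimeIndex(i: number, a: number): number {
    if (a !== cachedA) {
        apCache = PRIMES.map((p) => Math.pow(p, a));
        cachedA = a;
    }
    return apCache[i];
}

// A soapfr-style sum over a monzo (vector of prime exponents) now does
// only lookups and multiply-adds in the hot path.
function soapfrLike(monzo: number[], a: number): number {
    return monzo.reduce(
        (sum, exponent, i) => sum + Math.abs(exponent) * apForPrimeIndex(i, a),
        0,
    );
}
```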

The fourth slide here lists some standard techniques for speeding up critical code.
http://media.tojicode.com/sfjs-vectors/#1

cmloegcmluin

Dave Keenan wrote: Sun Aug 09, 2020 3:58 pm Sure there are just-in-time native-code-compilers for JavaScript and Python, but because these languages allow dynamic typing, they still can't compete for speed with native-code compilers for statically-typed languages like C++, even if you use static-typing.
It would appear you know a lot more about this domain than I do. I'll look into these concepts to further my development as a software engineer. Doesn't come up much on the job as a web developer!
I assume you have a similar array where you cache the ap part of any soapfr or soapfar outside the loop that runs through the 80 ratios. And just load it with the primes themselves for any sopfr or sopfar. And only recalculate it when a parameter changes that affects the ap.
Well, I didn't even have the log operations cached (I prefer to call it "memoized" in this context), so I certainly don't have anything else memoized.

You may be surprised to learn that while we've been talking about speed a lot, I have put hardly any energy into optimizations. The timeout I had in the code for a while was less about improving performance (it actually hurt performance!) and more about getting the darned thing to be able to finish *at all* without getting stuck in blue threads of death (and that problem has now been solved in an alternative way: the threshold on improvement to SoS at each recursive step). So my point is, there may be a ton of low-hanging fruit!
The fourth slide here lists some standard techniques for speeding up critical code.
http://media.tojicode.com/sfjs-vectors/#1
Cool beans, thanks!
• Functional API, not OO: So I've written it functionally.
• Cache anything we lookup more than once: I can cache per above.
• Inline code rather than reuse functions: Interesting! Per this link I may be able to improve performance by inlining code in two different places that get executed very frequently. I'll let you know how that goes.
• Unroll all the loops!: I understand what loop unrolling is, but I doubt I will be able to accomplish this for this code.
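
For what it's worth, here's a toy illustration of what unrolling means (not project code): summing four elements per iteration to cut loop overhead. Modern JIT compilers often do this themselves, which is part of why I doubt it would pay off here.

```typescript
// Toy loop unrolling: process four array elements per iteration, then
// mop up the remainder with an ordinary loop.
function sumUnrolled(values: number[]): number {
    let sum = 0;
    let i = 0;
    const limit = values.length - (values.length % 4);
    for (; i < limit; i += 4) {
        sum += values[i] + values[i + 1] + values[i + 2] + values[i + 3];
    }
    for (; i < values.length; i++) {
        sum += values[i];
    }
    return sum;
}
```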
------

Well, so I woke up this morning to results for the chunk count 4 run. I won't keep you in suspense much longer. I spent most of the day working out some complications which aren't worth going into. I was optimizing for potentially kicking off a chunk count 5 run ASAP, but it turned out I won't be doing that until I've made some more performance improvements:

Based on the results from chunk count 4, I figured I knew enough to eliminate some parameters and submetrics from consideration. I also knew enough to tighten the scopes on several of the parameters we're keeping around (e.g. why search -3 to 3 for y when we've never actually seen a y less than 0.86 or greater than 0.98 in a metric which was the best metric of its type and beat SoPF>3?). Applying those improvements, I saw an ~7x performance improvement. That'd get 4 chunks from 57 hours down to more like 8, making it an overnight thing. Unfortunately 5 chunks is still not tractable... I still estimate it would take months to finish. So we do need to do something drastic to get it to ever finish.

By the way, I was wrong about not hitting the timeout during the non-recursive run; about 9000 scopes hit it for chunk count 4, or about 4% of them. The timeout code has been removed, but I think that will still be a net speedup, because the timeout code itself slowed stuff down, and also because the changes described in the previous paragraph will almost certainly prevent any scopes from having a spike in sample count that would have put them above the timeout. Sorry this paragraph is a bit of a mess... don't worry about it. It's more for posterity, for myself, if need be.

So my next steps will be code profiling to identify the bottlenecks, memoizing repeated calculations, and inlining functions.

------

And I still haven't given you the results for chunk count 3, so I'll describe those here in just a moment!

cmloegcmluin

The absolute best possible metric for chunk count 3 is one we've already found: wb, with SoS 0.007345361.

Here's the shitty thing though: my code didn't find it. What a frustrating thing this is! The part of the code that computes SoS for a given metric definitely still confirms the best wb parameter values you found on 7/28 (as it did before). Here's why the searcher missed it: the "maximum unit" of parameter value in the first phase of the search is 0.1, which means that for a given parameter's range of possible values, it tests every 0.1 point (e.g. from 1 to 2 it would test 1, 1.1, 1.2, 1.3 ... 1.9, 2.0). This threshold has more effect on how fast the searcher runs than anything else, so I like it where it is. But then it hands over whatever it finds to the perfecter, and the perfecter just assumes that this 0.1 resolution was good enough to find the vicinity of the absolute best metric. In the case of wb (and probably many, many others, then...) this turns out not to be true. The best wb my code found was with w = -1.5633082 and b = -1.8220967, while yours has w = -1.645808649 and b = -2.043765116. In other words, at the resolution of 0.1, somewhere around 1.8 must have looked better for b than 2.0, even though the actual best possible SoS was just a bit deeper down the canyon near 2.0 (the searcher may not have even identified 2.0 as a local minimum at that resolution!). So I wouldn't change my code in response to this. I think we just have to accept that its results may be a bit rough, despite all the work it does...
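
To make the two-phase structure concrete, here's a one-dimensional sketch (a simplification, not the actual searcher): a coarse scan at the maximum-unit resolution picks a starting point, then a greedy "perfecter" descends from it at ever finer steps. If the true minimum sits in a canyon narrower than the coarse step, the scan hands the perfecter the wrong basin, which is exactly this failure mode.

```typescript
// Phase 1: coarse scan at resolution `coarseStep` (the "maximum unit").
// Phase 2: greedy local descent, halving the step down to `fineStep`.
function coarseThenRefine(
    f: (x: number) => number,
    lo: number,
    hi: number,
    coarseStep: number = 0.1,
    fineStep: number = 1e-6,
): number {
    let best = lo;
    for (let x = lo; x <= hi; x += coarseStep) {
        if (f(x) < f(best)) best = x;
    }
    for (let step = coarseStep / 2; step >= fineStep; step /= 2) {
        while (f(best - step) < f(best)) best -= step;
        while (f(best + step) < f(best)) best += step;
    }
    return best;
}
```

A smooth single-minimum landscape is found fine; the trouble starts when there are narrow canyons between the 0.1-spaced sample points.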

Notable close contenders:
• A new one, and kind of a weird one. It's lb, but then not only is there the log base 2, but also each prime is raised to a power. That power is very close to two, 2.0717828, not that I'm suggesting they nearly cancel out or anything... it's just kinda weird. Also this metric has k = 0.7981481. That gets us SoS 0.007593747. Let's call this lb with the extra a as a power "alb". Since we've not been including the "lb" in the names of our metrics, I guess this one would be called "ak" for short.
• A kj where both k and j are powers: 1.4690207 and 1.3673258, respectively. This gives SoS 0.007969499. We can call this one "kj". It's not actually an lb; it's a sopfr.
The absolute best possible metric for chunk count 4 is one we've already found: wyb, with SoS 0.006057649.

Notable close contenders:
• SoS in the vicinity of 0.0063 to 0.0066 found by a variety of combinations of lb and gpf with different weighting parameters.
• SoS 0.006802205 found by alb with a = 2.0791902, w = -0.2209039, b = -1.9497175. I guess this one would be "wab" then.
• SoS 0.006815232 found by alb with a = 1.9713224, x = -0.4582444, u = -1.5128479 (u is the override for x on the denominator). So this one would be "aux" then, I guess.
If you want to try to read and interpret the full results yourself, you can find them here: https://github.com/Sagittal/sagittal-ca ... ic/results

For reference, here's (I think) the most recent table:
viewtopic.php?p=2121#p2121

cmloegcmluin

Whoa, this is so cool. I had no idea I could profile my Node.js code like this.

Using the instructions here, I've learned some interesting stuff:
• About 30% of the time is spent managing the overhead of doing work asynchronously. Now that we're not timing out scope searches, the code being asynchronous is not motivated by interrupt-ability; it is only in place to allow the thing to run at all without crashing due to creating objects that are too large. Currently I solve this by switching back and forth between two tracks of work which push and pop from the same stack of things to do. However, if memory serves, we only hit this upper bound upon trying to run the thing for chunk counts of 8. Maybe 7, or 6. In any case, it's becoming clear that we're never going to be able to use this thing for anything higher than 5 chunks. So perhaps I could go through and rip out all the asynchronous stuff. A bit sad... that stuff was so hard to build.
• Most of the rest of the stuff is hard to interpret, because the functions are all nested inside each other yet are also all listed at the top level, and the work listed for each one seems to include the nested stuff (I gather the profiler is reporting "inclusive" time rather than "self" time), so it seems like it must be double-triple-hugetuply counting stuff for the lower ones in the final percentage or something... but it seems otherwise like a mostly unsurprising collection of the functions which compute antivotes, sums of squares, and ranks.
• monzoFromInteger is taking up 5% of the time, combineMonzos 4%, and checkParameterValues 2%. I could take a closer look at these and see what I could shave off.
• There's also a bunch of index.ts files in there, which is odd... it suggests that a technique I use to help enforce good module architecture by maintaining limited interfaces is actually incurring a performance cost, despite my assumption that it should make no difference at runtime.
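
For context on that 5%: a monzoFromInteger presumably factors an integer into its vector of prime exponents. A hypothetical reconstruction (this is not the actual Sagittal code) shows why each call does real work, and why a memo table keyed on the integer could help:

```typescript
// Hypothetical trial-division factorization into a monzo, e.g.
// 45 = 3^2 * 5 -> [0, 2, 1]. Only handles numbers within this prime limit.
const PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37];

function monzoFromInteger(n: number): number[] {
    const monzo: number[] = [];
    let remaining = n;
    for (const p of PRIMES) {
        if (remaining === 1) break;
        let exponent = 0;
        while (remaining % p === 0) {
            remaining /= p;
            exponent++;
        }
        monzo.push(exponent);
    }
    return monzo;
}

// Memoized wrapper: each integer is factored at most once.
const monzoMemo = new Map<number, number[]>();
function memoizedMonzoFromInteger(n: number): number[] {
    let monzo = monzoMemo.get(n);
    if (monzo === undefined) {
        monzo = monzoFromInteger(n);
        monzoMemo.set(n, monzo);
    }
    return monzo;
}
```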
The truth is, though, that even if we completely eliminated all of this cost (let's even say we eliminated half the runtime), we'd still have a 5-chunk-count run that would take weeks or months.

(By the way, the handful of spots where I thought it was possible to replace anonymous functions with inline functions were false positives. They remain as they are.)

cmloegcmluin

I'm honestly not sure what other calculations I could reliably memoize besides that log base 2 of each prime.

I experimented with it and it somehow didn't improve performance at all.

I think the only real choice is to drastically coarsen that maximum unit from 0.1 to something like 0.5, or 0.333. So we'll only learn, in extremely rough ballpark terms, which metrics might be worth looking into.
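
Rough arithmetic on why coarsening the maximum unit helps so much (the parameter range of 2 and the 5 free parameters below are illustrative assumptions, not the real search's numbers): the coarse phase costs on the order of (range / step) raised to the number of parameters, so the step size compounds exponentially:

```typescript
// Back-of-envelope cost model for the coarse phase: samples per parameter
// raised to the number of free parameters being searched jointly.
function coarseSamples(range: number, step: number, numParameters: number): number {
    return Math.pow(range / step, numParameters);
}

const atTenth = coarseSamples(2, 0.1, 5);   // 20^5, about 3.2 million points
const atThird = coarseSamples(2, 1 / 3, 5); // 6^5, only 7,776 points
const ratio = atTenth / atThird;            // roughly a 400x reduction
```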

Dave Keenan

Have you memoized your ap calcs, or are you repeating them for each of the 80 ratios?

cmloegcmluin

I know you suggested that before, but there are so many different ap possibilities that I'm not sure how I'd go about caching them, let alone in a way where I'd be convinced that looking up the cached value was cheaper than just doing the arithmetic...

cmloegcmluin

I changed the maximum unit of parameter value to 1/3 instead of 1/10th. It looks like a chunk count 5 run might actually finish overnight at that resolution.
