Dave Keenan wrote: ↑Sun Aug 09, 2020 3:58 pm
Sure there are just-in-time native-code-compilers for JavaScript and Python, but because these languages
allow dynamic typing, they still can't compete for speed with native-code compilers for statically-typed languages like C++, even if you use static-typing.
It would appear you know a lot more about this domain than I do. I'll look into these concepts to further my development as a software engineer. Doesn't come up much on the job as a web developer!
I assume you have a similar array where you cache the ap part of any soapfr or soapfar outside the loop that runs through the 80 ratios. And just load it with the primes themselves for any sopfr or sopfar. And only recalculate it when a parameter changes that affects the ap.
Well, I didn't even have the log operations cached (I prefer to call it "memoized" in this context), so I certainly don't have anything else memoized.
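For reference, here's roughly the shape of what you're describing, as a minimal TypeScript sketch; the monzo representation and the names adjustedPrimes, computeAdjustedPrimes, and a are mine for illustration, not the actual code:

    // Sketch: memoize the per-prime "ap" values once per parameter change,
    // rather than recomputing them inside the loop over the 80 ratios.
    const PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29];

    let adjustedPrimes: number[] = [];

    // Recalculate only when a parameter that affects the ap changes;
    // for plain sopfr/sopfar, just load the primes themselves.
    const computeAdjustedPrimes = (a: number, useLog: boolean): void => {
        adjustedPrimes = PRIMES.map(prime =>
            useLog ? Math.log(prime) / Math.log(a) : prime
        );
    };

    // Inside the loop over the 80 ratios, only look values up.
    const soapfr = (monzo: number[]): number =>
        monzo.reduce(
            (total, exponent, index) => total + Math.abs(exponent) * adjustedPrimes[index],
            0,
        );

That way computeAdjustedPrimes would only run when a relevant parameter changes, and soapfr could be called for all 80 ratios without ever touching Math.log.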
You may be surprised to learn that, while we've been talking about speed a lot, I have hardly put any energy into optimizations. The timeout I had in the code for a while was less about improving performance (it hurt performance!) and more about getting the darned thing to be able to finish *at all* without getting stuck in blue threads of death (and that problem has now been solved in a different way: the threshold on improvement to SoS at each recursive step). So my point is, there may be a ton of low-hanging fruit!
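In case it helps to picture that SoS-improvement threshold, here's a rough TypeScript sketch of the shape of it; Scope, computeSumOfSquares, narrowScopes, and the threshold value are all stand-ins, not the real code:

    // Stand-in types/helpers, just to show the shape of the cutoff.
    type Scope = Record<string, [number, number]>;
    declare const computeSumOfSquares: (scope: Scope) => number;
    declare const narrowScopes: (scope: Scope) => Scope[];

    const SOS_IMPROVEMENT_THRESHOLD = 0.001; // made-up value

    const recursiveSearch = (scope: Scope, previousSos: number): void => {
        const sos = computeSumOfSquares(scope);

        // If this step didn't improve the SoS by at least the threshold,
        // stop recursing down this branch instead of relying on a timeout.
        if (previousSos - sos < SOS_IMPROVEMENT_THRESHOLD) return;

        narrowScopes(scope).forEach(narrowedScope => recursiveSearch(narrowedScope, sos));
    };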
Cool beans, thanks!
- Functional API, not OO: So I've written it functionally.
- Cache anything we lookup more than once: I can cache per above.
- Inline code rather than reuse functions: Interesting! Per this link I may be able to improve performance by inlining code in two different places that get executed very frequently (see the sketch after this list). I'll let you know how that goes.
- Unroll all the loops!: I understand what loop unrolling is, but I doubt I will be able to accomplish this for this code.
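To illustrate what I mean by inlining in those hot spots, here's a contrived TypeScript sketch; computeAntivotes, the monzo, and the weight are made up for the example:

    const monzo = [4, -4, 1]; // 80/81 as a prime-exponent vector, just for the example
    const weight = 0.5;       // made-up weight parameter

    // Before: a tiny helper called inside a hot loop
    const computeAntivotes = (exponent: number, w: number): number =>
        Math.abs(exponent) * w;

    let totalBefore = 0;
    for (const exponent of monzo) {
        totalBefore += computeAntivotes(exponent, weight);
    }

    // After: the helper's body inlined, saving the call overhead in the hot path
    let totalAfter = 0;
    for (const exponent of monzo) {
        totalAfter += Math.abs(exponent) * weight;
    }

Whether this actually helps depends on whether the JIT was already inlining the helper, so I'll profile before and after.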
------
Well, so I woke up this morning to results for the chunk count 4 run. I won't keep you in suspense much longer. I spent most of the day working out some complications which aren't worth going into. I was optimizing for potentially kicking off a chunk count 5 run ASAP, but it turned out I won't be doing that until I've made some more performance improvements:
Based on the results from chunk count 4, I figured I knew enough to eliminate some parameters and submetrics from consideration. I also knew enough to tighten the scopes on several of the parameters we're keeping around (e.g. why search -3 to 3 for y when we've never seen a y less than 0.86 or greater than 0.98 in a metric that was the best of its type and beat SoPF>3?). Applying those improvements, I saw a ~7x performance improvement. That'd get 4 chunks from 57 hours down to more like 8, making it an overnight thing. Unfortunately 5 chunks is still not tractable... I still estimate it would take months to finish. So we do need to do something drastic to get it to ever finish.
By the way, I was wrong about not hitting the timeout during the non-recursive run; about 9000 scopes hit it for chunk count 4, or about 4% of them. The timeout code has since been removed, but I think removing it will still be a net speedup, because the timeout code itself slowed things down, and because the changes described in the previous paragraph will almost certainly prevent any scope from spiking in sample count enough to have hit the timeout anyway. Sorry, this paragraph is a bit of a mess... don't worry about it. It's more for posterity, for myself, if need be.
So my next steps will be code profiling to identify the bottlenecks, memoizing repeated calculations, and inlining functions.
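As a first pass at the profiling, I'll probably just bracket the suspected hot calls with console.time before reaching for a real profiler; something like this, where computeSumOfSquares stands in for whichever call I suspect is hot:

    // Quick-and-dirty timing to find the bottlenecks; computeSumOfSquares here
    // is a placeholder for the suspected hot call, not the real function.
    declare const computeSumOfSquares: (scope: unknown) => number;
    declare const scope: unknown;

    console.time("computeSumOfSquares");
    computeSumOfSquares(scope);
    console.timeEnd("computeSumOfSquares"); // logs e.g. "computeSumOfSquares: 123.456ms"

If that's not enough resolution, Node's built-in --prof / --prof-process flags give a full V8 profile.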
------
And I still haven't given you the results for chunk count 3, so I'll describe those here in just a moment!