developing a notational comma popularity metric

Post by **cmloegcmluin** » Wed Aug 05, 2020 5:59 am

Dave Keenan wrote: ↑Tue Aug 04, 2020 1:25 pm I don't count the true constants because, if they have any real existence at all, they are simply hidden inside the box. Whether they are actually there inside the box, is an implementation detail that shouldn't matter.

For example, if we look inside an x² box, we may find there is no number 2 in there at all. It may simply be doing x*x. My resolution of this is to never count the constant, so it doesn't matter if it's really there or not. But your resolution seems to be to claim that it is always possible to view a unary (one-argument) function as a binary (two-argument) function with a constant for one argument. And then I am unclear whether you are counting the function or the constant, since you seem to think they must always go together. But arctan has no constant.

Interesting. I'll need to reconcile this bit with the later stuff about there not being a natural exponent.

So you count arctan as 1 chunk. How is arctan not a function standing alone (in your terms)? There is no parameter or constant associated with it. It just takes its true-variable input and gives its output. e.g. as part of a soapfar where ap = arctan(p) or ar = arctan(r). I can't see how you could be counting anything but the function in this case.

I completely sympathize with your frustration over this particular matter. I've done a bad job with this one, so I apologize. I'm trying to figure out the least confusing way to dig myself out of it and achieve shared clarity with you here...

Let's start with this: there was an important but unspoken condition on my claim that functions shouldn't stand alone. When I've been tossing around the word "function" the last couple of days, I haven't meant function in the general textbook sense, which would include things like arctan. I've implicitly been using that word to refer to "within our current domain, the things which are helpful to be referred to as functions". That is, I needed a handy way to refer to logarithms, exponents (and roots), and + - × ÷ that I thought would help us speak about the problem space. By my incorrect understanding at that time (read: before I arrived at a deeper level of insight about the nature of the logarithm function) all of those functions had 1 argument which was not optional. And that's all I meant by that they couldn't stand alone.

This suggests that when you count log₂ as 1 chunk you're counting the function, not the constant, as I do. So it would then appear that we agree on counting higher functions and not counting constant inputs to them. But you tell me otherwise below, so I can find no solution in terms of chunks per function, chunks per true constant and chunks per parameter, that satisfies the "simultaneous equations" of your various statements about your chunk counting.

No, as I've tried to say many times now, I did not think of things in terms of counting the function and number separately. I did not think they could stand alone. I did not break log₂ down. I considered function and constant/argument/parameter/base/number/2/whatever-you-want-to-call-that-other-thing to be two subatomic particles of an atomic unit which was worth 1 chunk, because the argument was, by my incorrect understanding then, required. And I kept emphasizing this atomic structure with 2 subatomic parts view because as of yet I could not believe that any function we cared about in the domain was anything other than a function with one required argument.

I now see that logarithm is special in the sense of "being a primitive" like arctan. If we were to for some reason throw arctan into the mix here, I would count it as 1 chunk. So, faced with the primitiveness of logarithm, I ultimately will revise my perspective on atoms and subatomic particles to this: it is the function which I count as a chunk, and for those functions with required arguments, I could be said either to give them the arguments for free, or even to consider them part of the function. That I've been emphasizing the numbers first and then their applications via some function has just been an unfortunate result of my certainty that we were only dealing with functions of one type: those with one required numeric argument.

I thought you had functions 0, parameters 1, constants 1, where I have higher-functions 1, parameters 1, constants 0

But then you count arctan as 1 chunk. I'm mystified.

Per the above, I have functions 1, optional parameters 1, constants 1.

Is this correct, where 2 is a true constant while `a` and `k` are model parameters?

| Function of p | Chunks (DB) | Harlows (DK) |
|---|---|---|
| arctan(p) | 1 | 1 |
| ln(p) | 1 | 1 |
| log₂(p) | 1 | 1 |
| log_a(p) | 1 | 2 |
| k×ln(p) | 2 | 2 |

It would have been, before I attained the insight that the argument to a logarithm is optional on some crazy deep mathematical level. Now that I understand that, my chunks column completely agrees with yours for this particular set of entries.

My chunk count will count parameters as 1 chunk always, where a parameter chunk is comprised of some value and some function application of that value, leading to log₂(x) as 1 chunk and log_a(x), a = 2.017 as 1 chunk.
It seems to me, you would have to count arctan(x) as 0 chunks to be consistent with that.

I agree. But I think it should be clear from above that had I known we were working with some functions that were not types which took a single numeric argument, I would have written: "My chunk count will count parameters as 1 chunk always, where a parameter chunk is comprised of some function and any of its required arguments, leading to log₂(x) as 1 chunk and log_a(x), a = 2.017 as 1 chunk." Although logarithm is a bad example now, because it was the function which I came to see broke from that function type. So in the end, my bullet will look almost exactly like yours: "My chunk count will count functions in-and-of-themselves, and then count their arguments as a chunk when they are parameters but only when they are optional, leading to log₂(x) as 1 chunk and log_a(x), a = 2.017 as 2 chunks." Literally the only difference is that you say "not when they are true constants" and I say "only when they are optional".

This will be our second of two points of disagreement over the implementation of chunks: you will give + - × ÷ for free, while I still count them.
I haven't noticed you doing that so far. Is this a new thing?

No, it's not new. Consider `k` applied as a coefficient. This is a classic one we've been using since nearly the beginning. It's the function × which has one required argument: what to multiply by. So the function × counts as one chunk. The value multiplied by is either considered to be part of the function (in that sense ×3 is a different function from ×4) or it's considered to be given for free.

Or to be more specific (and to prevent confusion with respect to the differences in how we see things via the other point of disagreement addressed above), you would count +1 as 0 chunks, because + is free, and 1 is a constant. While I would count +1 as 1 chunk, because it is an application of a value as a function where that value is 1 and the function is addition. And you would count +w, w=-1.88 as 1 chunk, because + is free, and w is a parameter. While I would count +w as 1 chunk, because it is an application of a value as a function where that value is -1.88 and the function is addition.
Although I think I know what you mean, I think "application of a value as a function" is an abuse of terminology. But how is arctan the application of a value as a function?

I expect I've gone over this enough in this post already, but it may be helpful to respond in detail anyway.

I agree that "application of a value as a function" is an abuse of terminology, especially as soon as we expand the scope of our discussion beyond the domain wherein we're only dealing with functions with one required numeric argument (which I hadn't realized we already had, by including logarithm).

Arctan is definitely not the application of a value as a function.

I would rewrite my above paragraph this way now: "you would count +1 as 0 chunks, because + is free, and 1 is a constant. While I would count +1 as 1 chunk, because it is a function with one required argument which is therefore in some sense part of the function, and that function is addition and the value of the required argument is 1. And you would count +w, w=-1.88 as 1 chunk, because + is free, and w is a parameter. While I would count +w as 1 chunk, because it is a function with one required argument and the function is addition and the required argument in this case is -1.88."

To me, the distinction that matters for fitability is "the application of a function" (e.g. arctan(x) or log₂(x)) versus "the application of a parameterised function" (e.g. log_a(x) or k×log₂(x)). The last two examples are exactly equally fitable, as they will give exactly the same rankings for all ratios, when trained on the same data. i.e. they will give the same metric.

Awesome. I recognize that you've tried to say this thought at least a couple times, but this is the first time it clicked for me. Perhaps I could speak it back to you to confirm my understanding. You attempt to measure in harlows the fitability of a metric in the sense of its ability to change the SoS. I think that's a really cool idea. In fact, I think it's better than attempting to measure the complexity of explaining the metric, which is how I've been approaching it. This is the "model validity" vs "model simplicity" dichotomy again, but now you're getting even more precise about what you mean by validity, upgrading it to the word "fitability".
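To make the equivalence of the last two quoted examples concrete, here is a minimal JavaScript sketch. The function names and the sample primes are assumptions made for illustration and are not taken from the project's code. It shows that log_a(x) is just log₂(x) scaled by k = 1/log₂(a), so fitting a and fitting k explore the same one-parameter family.

```javascript
// Hypothetical helpers, named here only for illustration.
// A logarithm with a parameterised base a:
const logBase = (a, x) => Math.log(x) / Math.log(a);
// A fixed-base logarithm with a parameterised coefficient k:
const scaledLog2 = (k, x) => k * Math.log2(x);

const a = 2.017;             // example base value mentioned in the thread
const k = 1 / Math.log2(a);  // the coefficient that reproduces that base

// Sample inputs (an arbitrary handful of primes, assumed for the demo).
const primes = [5, 7, 11, 13, 17, 19, 23];

// Largest disagreement between the two forms over the sample inputs.
// It is zero up to floating-point rounding, so any metric built from one
// form can be reproduced exactly by the other.
const maxDifference = Math.max(
    ...primes.map((p) => Math.abs(logBase(a, p) - scaledLog2(k, p)))
);

console.log(maxDifference < 1e-12);
```

Because every value of a > 1 corresponds to exactly one k > 0 and vice versa, an optimiser searching over either one reaches the same set of metrics.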

Dave Keenan wrote: ↑Tue Aug 04, 2020 9:05 pm Thanks for finding that. It looks pretty damning. What was I thinking!?

I tried to disclaim that whether or not we ever claimed a parameter as a constant was moot in my previous post, but I might have undermined that by adding this evidence on top of the disclaimer. Sorry!

It seems to me, now that you've confirmed my understanding of the logarithm function, that 2 in some sense is not even a constant. It's more like: if you want to use logarithms in some project, you should pick one base and stick with it, and it doesn't matter what that base is (other than that you should pick something conventional, like 2, e, or 10). Whatever your base is, it's more like which logarithmic world you want to live in for now.

So I don't think it matters how we arrived at 2, whether it occurred to us or the code helped us. No need to weasel or anything. Because we could literally pick anything. Maybe that's what you're trying to say about the parameters being "not independent of each other" (I still don't exactly get that point).

The boundary notation makes that point even more strongly.

I should definitely have immersed myself in it before continuing on this project. Sorry.

In either case, I'm sorry for leading you to believe that I grokked logarithms on as deep a level as you do. I guess I had always thought of them as basically the opposite in some essential way as exponentiations. But I cannot find any evidence of a "natural exponent" anywhere. I mean of course you could raise something to the e'th power, but a cursory examination doesn't turn that up as being a common practice or being of much use.
Yes. Good observation. There is of course the natural-exponential function, the inverse of the natural-log, sometimes written exp(x) instead of e^x. But I understand you are saying, correctly, that there is no such thing as a "natural-power" function, or a "natural-root" function, from which any other root (or power) could be obtained by simply multiplying the result (or the input) by a constant.

Okay, great, I'm glad I got that part right.
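The asymmetry agreed on here can be written out as a short worked identity (the symbols a, b, c are chosen for this illustration). Every logarithm is a constant multiple of the natural logarithm, while no power function is a constant rescaling of another power function, whether the constant is applied to the result or to the input:

```latex
\log_a x = \frac{\ln x}{\ln a} = c \,\ln x, \qquad c = \frac{1}{\ln a}

x^{a} = c \, x^{b} \ \text{for all } x > 0 \iff a = b \ \text{and} \ c = 1

(c\,x)^{b} = c^{b} x^{b} \neq x^{a} \ \text{in general when } a \neq b
```

So a change of logarithm base can be absorbed into a coefficient, but a change of exponent cannot.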

So then how could there be a constant when used in an exponent or root?

I might agree that in the sum-of-squares formula, the exponent 2 is a constant, because it's part of Pythagoras's theorem about hypotenuses and distances and such. That makes sense to me.

So if we had some math/phil/psych/musical justification for picking a particular exponent in a particular context, then perhaps it could count as a constant?

As far as I'm aware, we don't have any such constants going on for us right now.

You say we would need to have put them in there "from the start". I think what bothers me about this idea may be a bit superficial. I would excuse us if we had simply failed to recall some important math/phil/psych/musical concept involving a constant until the code reminded us of it, and then, once it had, started using that constant without counting it against the fitability of the metric. How do you feel about that? (It may not even come up, so maybe you don't need to answer.)

Therefore by your "harlows" a logarithm could be one harlow, but if you ever needed to use an exponent or raise something to a power, it'd have to be two harlows, because there is no such thing as a constant exponent (unless there is a difference between a constant and a primitive to you, which I don't understand yet).
I have no idea why you say there is no such thing as a constant exponent. Don't x² and x³ and x^(1/2) have constant exponents? I would only count them as 1 harlow (assuming these constants were put in from the start, not obtained as optimised parameters).

Right, so what I was flailing about in trying to get across there was the difference between the e in log_e and the e in x^e. I was trying to capture the same exact idea that you just confirmed for me above: that there is no equivalent of a natural exponent or natural root in the way that there is a natural logarithm. I suppose the distinction is not super important to you: if it's a constant as it is in x^e, it counts for 0 harlows. If it's not even there in some sense since ln is a "primitive" or whatever you want to call it and the `e` doesn't even exist, then it still counts for 0 harlows.

-------

Alright, so I think that while we still disagree conceptually a bit, our chunks and harlows almost completely agree now with respect to the searching my code is doing. The fact is that the only true constant we've really looked at is this base to the logarithm, and as I've mentioned, my code treats that situation how you'd want it to. We don't have any true constants that are being applied with + - × ÷, which would be 0 harlows but 1 chunk. The one place I can think of where we'd disagree is when raising things to parameters which are exponents, since you'd count that as 2 harlows where I'd count it as 1 chunk. But we can deal with those metrics when they come up.
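One way to read the two counting rules as they stand at the end of this post is sketched below in JavaScript. The term representation (`fn`, `arg`, `argRequired`) is a hypothetical encoding invented for this illustration. It is an interpretation of the discussion, not the actual solver code.

```javascript
// Hypothetical encoding of a metric as a list of function applications:
//   fn:          name of the function applied
//   arg:         'none' | 'constant' | 'parameter' (the extra argument, if any)
//   argRequired: true when the function cannot be written without that argument
const ARITHMETIC = new Set(['+', '-', '×', '÷']);

// Harlows (DK): + - × ÷ are free, any other function costs 1,
// a parameter costs 1, and a true constant costs 0.
const harlows = (terms) => terms.reduce(
    (total, { fn, arg }) =>
        total + (ARITHMETIC.has(fn) ? 0 : 1) + (arg === 'parameter' ? 1 : 0),
    0
);

// Chunks (DB, revised): every function costs 1, required arguments ride along
// for free, and only an optional argument that is a parameter costs 1 more.
const chunks = (terms) => terms.reduce(
    (total, { arg, argRequired }) =>
        total + 1 + (arg === 'parameter' && !argRequired ? 1 : 0),
    0
);

// Examples taken from the cases discussed in the thread.
const log2OfP = [{ fn: 'log', arg: 'constant', argRequired: false }];
const logAOfP = [{ fn: 'log', arg: 'parameter', argRequired: false }];
const kTimesLn = [
    { fn: 'ln', arg: 'none', argRequired: false },
    { fn: '×', arg: 'parameter', argRequired: true },
];
const plusOne = [{ fn: '+', arg: 'constant', argRequired: true }];
const parameterExponent = [{ fn: '^', arg: 'parameter', argRequired: true }];

console.log(harlows(plusOne), chunks(plusOne));
console.log(harlows(parameterExponent), chunks(parameterExponent));
```

Under this reading the two counts differ only for constants applied with + - × ÷ and for parameters used as exponents, which are the two cases named above.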

Post by **cmloegcmluin** » Wed Aug 05, 2020 9:55 am

Alright, we're still on track. It's been running for 18 hours now and is right at 25%.

Post by **Dave Keenan** » Wed Aug 05, 2020 1:02 pm

Great news. Thanks.

Post by **cmloegcmluin** » Wed Aug 05, 2020 4:06 pm

Don't get toooo excited. It's just the run for 4 chunks. It probably won't tell us too much we don't already know. And we'll still have a lot of thinking to do before kicking off a run for 5 chunks if we ever want to get helpful results.

What I'm most excited to learn, I think, is the results of which parameters make themselves useful at the 4-chunk count, so we can make sufficiently informed decisions about which ones to continue including at all.

For now I'll take a moment to point out that of the 18 possible parameters we've proposed thus far which could be used in a 2-chunk metric*, only 8 of them can improve SoS at all when paired with any of the 6 submetrics we've proposed thus far. If the others can't improve the fitability of the metric on their own, it makes me skeptical that they have a rightful place in the final metric at all. And none of the parameters which fail to beat SoPF>3's SoS at 2 chunks show up in metrics which are among the best for 3 chunks. That makes me suspect that, for the most part, the 3-chunk metrics they're part of are just worse versions of better 2-chunk metrics: 2-chunk metrics that were good enough to survive a bit of a ding from a 3rd chunk which hurt rather than helped.

I had not considered until now the idea of building another layer on the solver where it automatically finds these situations and removes them from consideration. But perhaps the problem will not be complex enough that we can't filter these out manually.

I recognize the possibility that some parameters will seem useless or counterproductive alone but when combined together work wonders. Perhaps we'll start to see those effects more strongly with 4-chunk metrics. Perhaps we've already found some such combos in the 6- 7- and 8- chunks we've found manually.

At 3 chunks, the parameters which have shown themselves to be the most promising are: k as a coefficient, k as an exponent, j as an exponent, a as a base (=2), and a as an exponent.

*Of the 25 total possible parameters, 7 of them have no existence at 2 chunks. 4 of those are the 4 different ways of weighting a submetric, since with 1 chunk for a submetric, any weight on it would be a monotonic change and thus unhelpful. The other three are the denominator-only alternative values for parameters (v for y, b for w, and u for x) since as I described earlier I decided they shouldn't exist if their non-denominator-only counterpart doesn't already exist, so with 1 chunk for a submetric and 1 for a parameter there are none left for one of these types.

Post by **cmloegcmluin** » Thu Aug 06, 2020 12:46 am

A couple things occurred to me overnight:

1. If for chunk count n, the best SoS with a given parameter in it cannot beat the best SoS period for chunk count n-1, then that parameter is probably less than useless at that chunk count n. In other words, because among the metrics tried out at chunk count n one should find the best metric for chunk count n-1 with the single difference of it including this parameter, then apparently it could only hurt things around that vicinity. Now to be clear, it’s still possible that this parameter is capable of improving SoS against other chunk count n-1 metrics, but if those metrics aren’t the ones that were already on track to be among the best ones, then we don’t really care about those cases. (by the way, eliminating parameters which aren't helping could potentially lead to ballparks of 20x speed... which still might not be enough to make the 5-chunk run tractable on my personal computer, because 1/20th of a year is still a pretty long time...)

2. Perhaps we should look into possibilities for renting space on a supercomputer in the cloud, so we don’t have to do more work and make sacrifices and leave rocks unturned. It might be pretty cheap these days after all. I’m not sure but I’ll look into it!
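The pruning rule in point 1 could be sketched roughly as follows. The result shape (`chunkCount`, `parameters`, `sumOfSquares`), the function names, and the sample SoS numbers are all assumptions made up for this illustration, not actual solver output.

```javascript
// Best (lowest) SoS at a chunk count, optionally restricted to metrics
// that include a given parameter. Returns Infinity when nothing matches.
const bestSumOfSquares = (results, chunkCount, parameter = undefined) =>
    results
        .filter((r) => r.chunkCount === chunkCount)
        .filter((r) => parameter === undefined || r.parameters.includes(parameter))
        .reduce((best, r) => Math.min(best, r.sumOfSquares), Infinity);

// A parameter is a pruning candidate at chunk count n when the best metric
// containing it fails to beat the best metric overall at chunk count n - 1.
// Parameters that were never tried at n are kept, since there is no evidence.
const parametersToPrune = (results, chunkCount, parameters) => {
    const bar = bestSumOfSquares(results, chunkCount - 1);
    return parameters.filter((parameter) => {
        const best = bestSumOfSquares(results, chunkCount, parameter);
        return Number.isFinite(best) && best >= bar;
    });
};

// Made-up numbers purely for demonstration.
const exampleResults = [
    { chunkCount: 2, parameters: ['k'], sumOfSquares: 0.0090 },
    { chunkCount: 2, parameters: ['w'], sumOfSquares: 0.0110 },
    { chunkCount: 3, parameters: ['k', 'j'], sumOfSquares: 0.0075 },
    { chunkCount: 3, parameters: ['k', 'w'], sumOfSquares: 0.0095 },
    { chunkCount: 3, parameters: ['w', 'y'], sumOfSquares: 0.0120 },
];

const pruned = parametersToPrune(exampleResults, 3, ['k', 'j', 'a', 'w', 'y']);
console.log(pruned);
```

In this toy data `w` and `y` never appear in a 3-chunk metric that beats the best 2-chunk SoS, so they would be dropped before a 4-chunk run, while `a` is kept because it was never tried.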

Post by **Dave Keenan** » Thu Aug 06, 2020 8:50 pm

I like reading your realisations of how you can prune the search. But I'm afraid paying for supercomputer time to make decisions of so little import (the ratio definitions of tina diacritics) seems ludicrous to me. Particularly when Excel can optimise a metric with 20 parameters in 5 minutes.

You should tell your code to go onto the next metric as soon as it finds a low enough SoS to mark a metric as "worthy of further examination in Excel". Then post the metric here.

-- Dave

Post by **cmloegcmluin** » Fri Aug 07, 2020 2:05 am

Dave Keenan wrote: ↑Thu Aug 06, 2020 8:50 pm I like reading your realisations of how you can prune the search. But I'm afraid paying for supercomputer time to make decisions of so little import (the ratio definitions of tina diacritics) seems ludicrous to me.

Fair enough. It might be cheaper than you think. And it might just be fun! But I hear you, just on principle.

Particularly when Excel can optimise a metric with 20 parameters in 5 minutes.

I have no doubt that the mathematicians and computer scientists working on Excel's solver have produced something much better at optimizing than what I've built with web technologies unsuited for this type of computational intensity.

That said, my code does a lot more than just optimize metrics. Otherwise you would have already found our best metric by now yourself!

You should tell your code to go onto the next metric as soon as it finds a low enough SoS to mark a metric as "worthy of further examination in Excel". Then post the metric here.

-- Dave

It... does. Well I mean, it can tell within a few milliseconds whether the metric is worthy. The problem is that it's checking millions of combinations of them, and for each combination, sometimes millions of sample points within the reasonable ranges for each parameter.

------

Well it has been an eventful morning! I woke up and flipped my monitor on to see that the run for chunk count 4 was not at about 80% like I'd expect but actually at 100%! Amazingly, my timing was such that I got to witness the final second and a half of its run and its printing to the screen of "FINAL STATUS..."

When the good metrics came out, I was shocked to see that the top 100 or so were all better than anything we'd seen thus far. By a lot. In the 0.0002 ballpark. I grabbed the very best one it found and plugged it into my script for checking sums-of-squares on a one-off basis, which also prints out the comparison for each ratio, the ranking and the antivotes and our approximate ranking. My heart sank when I saw that most of the antivotes values had come out null. I am so frustrated because it's not like that hadn't happened before, very early on in the development on this thing, and I friggin' fixed that problem! Apparently I didn't fix it hard enough, or rather have a comprehensive enough test in place to prevent it from arising again.

So, that metric was crap. Oh well. I went to go check the next one, not with very high hopes, but you never know. But I was shocked and terrified to see that the results were wiped out! I had failed to remember that all of these scripts output to the same array of files. And my IDE's local history feature didn't save me either because the file had been too large or because it hadn't existed in that state long enough for it to get snapshotted. And I hadn't checked it into version control. And I hadn't copied it to my paste buffer. I had quite comprehensively fucked up and lost all the work.

...Fortunately, the terminal I had run the command in still had some results. Not all of them (99% of them had been scrolled off the top of its history) but fortunately since I had programmed the thing to sort results so that the best ones came last, I still had the most important bits. Phew.

I manually checked a few others and there are indeed a ton of nonsense results mixed in there. So this is just a manual, eye-balling check among them for now, so there might be a better one in there, but the best one I found that looked reasonable was wyb. Yes, wyb, but as a 4-chunk metric. It has made me realize that wyb is actually 4 chunks, not 5! As I demonstrated earlier, I can barely read big sigma notation without a ton of concentration, so I counted your wyb metric so that soapfar counted for a chunk and the lb counted for a chunk, but actually it's just an lb, because you proved "lb(n) = soapfr(n), where ap = p → lb(p)" (here).

So my next step will be fixing this bug where null antivotes make it past the point where they should have been caught. Probably there were some such bad apples in the lesser chunk count runs that I didn't notice, too. My point is that I will definitely be able to convince myself this won't happen again before sending it off again on another 57 hour 10 minute journey.

And to be clear, I would have had to re-run it again whether or not I obliterated the run's output... it was just rotten output. (though I should also fix the problem of overlapping logging targets which caused the output obliteration...)
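A possible way to avoid overlapping output targets is to derive a distinct file path for each script and each run. The helper below is a hypothetical sketch: the `results/` directory, the naming scheme, and the function name are all assumptions, not the project's actual layout.

```javascript
// Build an output path that differs per script and per run start time,
// so a one-off check can never overwrite the results of a long solver run.
const outputPathFor = (scriptName, startedAt = new Date()) => {
    // ISO timestamps contain ':' and '.', which are awkward in file names.
    const stamp = startedAt.toISOString().replace(/[:.]/g, '-');
    return `results/${scriptName}-${stamp}.txt`;
};

// Usage with a fixed date so the result is reproducible.
const examplePath = outputPathFor('solver', new Date(Date.UTC(2020, 7, 7, 9, 5, 0)));
console.log(examplePath);
```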

I really was hoping to get some more feedback about which parameters are actually helping and which are probably only hurting us so I can remove those from the search space. It had also occurred to me that I could look over the results and see, for each parameter, what the max and min values that resulted in an SoS better than SoPF>3 were, so that I could tighten up the search space in that regard too. So maybe the next run could be a bit faster if I at least apply those learnings from chunk counts 1, 2, and 3.
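The range-tightening idea could look something like the sketch below. The result shape (`parameterValues`, `sumOfSquares`), the function name, and every number in the example are assumptions for illustration only. In particular, the baseline is a placeholder, not the real SoPF>3 sum-of-squares.

```javascript
// For each parameter, find the min and max values seen among results whose
// SoS beats the baseline. Those bounds can then replace the wider initial
// search range for the next run.
const tightenedRanges = (results, baselineSumOfSquares) => {
    const ranges = {};
    for (const { parameterValues, sumOfSquares } of results) {
        if (!(sumOfSquares < baselineSumOfSquares)) continue; // also skips NaN
        for (const [name, value] of Object.entries(parameterValues)) {
            const range = ranges[name] || { min: Infinity, max: -Infinity };
            ranges[name] = {
                min: Math.min(range.min, value),
                max: Math.max(range.max, value),
            };
        }
    }
    return ranges;
};

// Made-up sample data and a placeholder baseline.
const sampleResults = [
    { parameterValues: { k: 0.8, a: 2.0 }, sumOfSquares: 0.008 },
    { parameterValues: { k: 1.4, a: 1.5 }, sumOfSquares: 0.013 },
    { parameterValues: { k: 0.6, a: 2.5 }, sumOfSquares: 0.009 },
];
const ranges = tightenedRanges(sampleResults, 0.010);
console.log(ranges);
```

Here the second result does not beat the placeholder baseline, so its values of k and a do not widen the ranges.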

But now I've got to head off to work for the day.

Post by **cmloegcmluin** » Fri Aug 07, 2020 6:02 am

Had a few minutes to look at things during my lunch break.

JavaScript has a primitive value called NaN, which stands for "Not-a-Number". I was using the type check "typeof candidate === 'number'" to confirm that results of the antivotes calculation were all numbers, as opposed to nulls, undefineds, or NaNs. Hilariously, in JavaScript, "typeof NaN" returns "number". So that was a problem. Thanks, JavaScript.
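A minimal sketch of the pitfall and one possible replacement check follows. The function names and sample values are illustrative, and whether infinite antivotes should also be rejected is an assumption here rather than something stated above.

```javascript
// The check described above: it accepts NaN, because typeof NaN is 'number'.
const naiveCheck = (candidate) => typeof candidate === 'number';

// Number.isFinite does no type coercion and rejects NaN, Infinity,
// -Infinity, null, undefined, and non-number values such as strings.
const strictCheck = (candidate) => Number.isFinite(candidate);

const samples = [3.5, NaN, null, undefined, Infinity];
const naiveResults = samples.map(naiveCheck);
const strictResults = samples.map(strictCheck);

console.log(naiveResults);  // [ true, true, false, false, true ]
console.log(strictResults); // [ true, false, false, false, false ]
```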

Post by **volleo6144** » Fri Aug 07, 2020 6:30 am

cmloegcmluin wrote: ↑Fri Aug 07, 2020 6:02 am JavaScript has a primitive value called NaN, which stands for "Not-a-Number". I was using the type check "typeof candidate === 'number'" to confirm that results of the antivotes calculation were all numbers, as opposed to nulls, undefineds, or NaNs. Hilariously, in JavaScript, "typeof NaN" returns "number". So that was a problem. Thanks, JavaScript.

Wait, why would you be getting NaNs from antivotes calculations anyway?

...Anyway, NaN is just another number of the type JS uses, but it isn't equal to itself and mostly propagates through any mathematical operations you do on it (the one exception I know of is NaN⁰ = 1).
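These properties are easy to confirm directly. The snippet below is only a demonstration of standard JavaScript behaviour, with variable names chosen for illustration.

```javascript
// NaN is the only JavaScript value that is not equal to itself.
const notEqualToItself = NaN !== NaN;

// Arithmetic on NaN yields NaN, so one bad value contaminates a whole sum.
const propagates = Number.isNaN((NaN + 1) * 2);

// The exception mentioned above: anything to the zeroth power is 1.
const zerothPower = NaN ** 0;

// Every ordered comparison with NaN is false, in both directions.
const lessThan = NaN < 0.01;
const greaterThan = NaN > 0.01;

console.log(notEqualToItself, propagates, zerothPower, lessThan, greaterThan);
// true true 1 false false
```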

Post by **Dave Keenan** » Fri Aug 07, 2020 7:50 am

cmloegcmluin wrote: ↑Fri Aug 07, 2020 2:05 am Particularly when Excel can optimise a metric with 20 parameters in 5 minutes.

... my code does a lot more than just optimize metrics. Otherwise you would have already found our best metric by now yourself!

I'm well aware that it does a lot more than optimise metrics. That's why I was making the point that it would run a lot faster if you can make it concentrate on those things, and not do so much of what Excel can do quickly. This might mean that your test is not exhaustive, but only probabilistic.

But as Tom West says in The Soul of a New Machine, "Not everything worth doing is worth doing well". Quick'n'dirty is good if it works, and gets you to market before the competition.

And BTW. I may have already found the best metric, in wb or wyb.

But that's beside the point.

You should tell your code to go onto the next metric as soon as it finds a low enough SoS to mark a metric as "worthy of further examination in Excel". Then post the metric here.
It... does. Well I mean, it can tell within a few milliseconds whether the metric is worthy. The problem is that it's checking millions of combinations of them, and for each combination, sometimes millions of sample points within the reasonable ranges for each parameter.

I didn't realise how many 4-chunk metrics you were examining. I figure it can't be more than 25×24×23×22 = 303600. And 57 h 10 min is 205800 s. So that's an average of 678 ms per metric, which is faster than Excel. Well done.
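The estimate can be checked with a few lines of JavaScript (variable names are illustrative). Note that 57 h 10 min works out to 205800 s, and the average still rounds to 678 ms.

```javascript
// Upper bound on ordered choices of 4 distinct parameters out of 25.
const metricCount = 25 * 24 * 23 * 22;

// Run time of 57 hours 10 minutes, in seconds.
const runSeconds = 57 * 3600 + 10 * 60;

// Average time per metric, in milliseconds.
const msPerMetric = (runSeconds / metricCount) * 1000;

console.log(metricCount, runSeconds, Math.round(msPerMetric));
// 303600 205800 678
```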

But the vast majority of those metrics must be complete rubbish. There ought to be some way to reject them very quickly. Is that what you're saying you do, "in a few milliseconds"? Do the others then go on to take many seconds?

Sorry to hear of the bug. Happy to hear of your save. Also happy to hear that you agree re the simplicity of wyb.

Does that mean that wb should have come out of your 3-chunk search, along with laj (which I call kl)?

The Sagittal forum
