
### Re: developing a notational comma popularity metric

Posted: Thu Jul 30, 2020 1:29 am
^^ This is awesome. This is essentially exactly what I was hoping to find somewhere on the internet, though I briefly tried and failed. And it's pretty too! I think I already see now why only x^c and log_c(x) are necessary, but I'll let you finish your thoughts before I say anything more.

I'm still struggling to achieve the "pull the plug after a few seconds" effect in code. It has nothing to do with any of the ideas we're discussing here; it's strictly a limitation of my own capabilities as a software engineer. I'm getting really frustrated and down on myself about it, but I should be able to overcome it soon.

Update: I finally got it to work! However, despite pulling the plug on individual scopes which are taking too long to search, it overall runs slower. Twice as slow, in fact. In other words, the infrastructure required just to enable interrupting the different threads gums things up so much that it more than cancels out the time saved. And that's just with 2 chunks; I expect the slowdown gets worse as chunk count increases. So that's kinda depressing. There may well be another way to write it so it doesn't have that problem, but I'm struggling to make it happen. In any case, it should theoretically allow me to actually finish running the thing for 3+ chunks, which I suppose is better than nothing.
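For what it's worth, one way to get the "pull the plug" effect without any per-thread interruption machinery is a cooperative deadline check inside the search loop itself. A minimal sketch in Python (the names `search_scope` and `evaluate` are mine, not from the actual codebase):

```python
import time

def search_scope(candidates, evaluate, time_budget_seconds=5.0):
    """Exhaust a scope of candidate metrics, but "pull the plug" once the
    time budget runs out, returning the best candidate found so far.

    A cooperative deadline check like this avoids the thread-interruption
    infrastructure that can gum up the works."""
    deadline = time.monotonic() + time_budget_seconds
    best, best_score = None, float("inf")
    for candidate in candidates:
        if time.monotonic() > deadline:
            break  # plug pulled: give up on the rest of this scope
        score = evaluate(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score
```

Because the check is just one clock comparison per candidate, it adds almost no overhead; the trade-off is that the deadline is only noticed between evaluations, never in the middle of one.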

### Re: developing a notational comma popularity metric

Posted: Fri Jul 31, 2020 12:32 am
Good to hear you're making progress.

It turns out I can only eliminate 2 of those 6 functions from consideration. Feel free to try the rest.

    c^x —— x^c
      \  /
    c^(1/x) ——   —— lb(c)/lb(x)
      \  /  \  /
    x^(1/c)   lb(x)/lb(c)


x^(1/c) is the same as x^c because 1/c is just a different c. So there are really only 5, for our purposes.

lb(x)/lb(c) can be rewritten as c × lb(x) because 1/lb(c) is just a different c.

Similarly lb(c)/lb(x) can be rewritten as c × 1/lb(x).

We know that any metric that matches the Scala archive stats must be an increasing function of the primes and their repeat counts. So in the case of c × 1/lb(x), c must be negative. And in the case of c^(1/x), c must be between zero and one. And in the case of c^x, c must be greater than 1.

But we've also seen that it must be a compressive function (have a slope that decreases with the primes and their repeat counts), so that eliminates c^x.
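The increasing-and-compressive requirement is easy to spot-check numerically. A small sketch (the helper is my own, and simply samples each function at the first few primes):

```python
import math

def is_increasing_and_compressive(f, xs=(2, 3, 5, 7, 11, 13)):
    """True if f is increasing with decreasing slope over the sample
    points, i.e. monotonically increasing and compressive."""
    ys = [f(x) for x in xs]
    slopes = [(y1 - y0) / (x1 - x0)
              for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:]))]
    increasing = all(s > 0 for s in slopes)
    compressive = all(s1 < s0 for s0, s1 in zip(slopes, slopes[1:]))
    return increasing and compressive

print(is_increasing_and_compressive(lambda x: 1.5 ** x))          # False: c^x with c > 1 accelerates
print(is_increasing_and_compressive(lambda x: 0.5 ** (1 / x)))    # True: c^(1/x) with 0 < c < 1
print(is_increasing_and_compressive(lambda x: 2 * math.log2(x)))  # True: c * lb(x) with c > 0
```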

### Re: developing a notational comma popularity metric

Posted: Fri Jul 31, 2020 1:54 am
Again, that's super cool Dave, and I'll reference this when responding to your email about these operations for sure. However, I think I should probably cut myself off from further work on this task. I've done my best to estimate how long it would take for my code to exhaustively search 3 chunks, and it's still less than a day; but to then exhaustively search the 4-chunk space could take weeks. And I still have so many tweaks to make to the code w/r/t parameters, submetrics, distributions, etc. that I'm not exactly looking forward to.

I think we should instead focus on the non-brute-force techniques we'd been using earlier to guess at some good possibilities, or just select from the ones we've found so far, and for now go with something that is a pretty good upgrade from SoPF>3 but not necessarily the end-all-be-all notational comma popularity metric. Perhaps someday down the line a subject matter expert in mathematical optimization will come along and be able to engineer something where I could not.

I'm sorry, but my day job is draining enough right now that I'm just not able to muster the willpower and brainpower necessary to get this over the line in the free hours I allocate to Sagittal.

### Re: developing a notational comma popularity metric

Posted: Fri Jul 31, 2020 7:15 am
I say all that and yet I'm still attacking the problem. I think if I reduce the resolution of its searching to tenths of parameter values rather than hundredths, we can hope the thing will at least catch a whiff of the best metrics per chunk count, and we can take it from there. And that will make it run many times faster.

Giving up makes me angry!
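Coarsening the resolution pays off fast, since the number of sample points multiplies across parameters. A rough back-of-envelope in Python (the 0-to-2 parameter range is just an illustrative assumption, not the solver's actual range):

```python
def grid_size(lo, hi, step):
    """Number of sample points when sweeping [lo, hi] at a given step."""
    return int(round((hi - lo) / step)) + 1

# Two parameters swept over an illustrative 0..2 range:
fine = grid_size(0.0, 2.0, 0.01) ** 2    # hundredths: 201 x 201 points
coarse = grid_size(0.0, 2.0, 0.1) ** 2   # tenths: 21 x 21 points
print(fine // coarse)  # 91, i.e. roughly the 100x speedup you'd expect
```

With three or four free parameters the factor compounds to roughly 1000x or 10000x, which is what makes the coarse pass plausible for higher chunk counts.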

### Re: developing a notational comma popularity metric

Posted: Fri Jul 31, 2020 10:22 am
Also, I was wrong about it slowing the thing down by 2x. I realized that it wasn't a perfectly controlled experiment. I had forgotten that during the process of adding the ability to pull the plug on long-running searches, I had discovered an oversight I had made when changing the solver to base the resolution of its searches on parameter value rather than count of samples, which was causing it to search far fewer things than I thought it should at the time. When I went back and made that change in the world before plug-pulling, I expected to see it slowed down tremendously (to where it *should* have been before, that is). However, I then found that it couldn't even run to completion, not even for 2 chunks! My suspicion is that my oversight had been closely akin to disabling blue threads of death, and now they were reactivated.

But this is all helping me to sort my intentions out a bit better. For this solver, I don't think I want it to be running in recursive mode. Recursive mode is for when you know basically what you're looking for and want to find the exact ideal values. When brute-forcing as the solver does, it shouldn't recurse at all! Sorry if you're not following. I think I just need to frantically yell what I'm thinking at the forum to keep my motivation up, haha... (I'm not drinking, by the way, unless you count this afternoon cup of coffee.)

### Re: developing a notational comma popularity metric

Posted: Fri Jul 31, 2020 10:34 am
We also have the change-of-base identities, which make the logarithm-based ones equivalent to $$\log_c x$$ and $$\log_x c = \frac{1}{\log_c x}$$, with different c's. And if we're allowing c<1 in the latter, we also ought to allow c<1 in $$c^x$$, which has a similar effect of only allowing the penalty to be so harsh, even for ridiculously high values of x (except, this time, 5-10-15 maps to 20-10-5 [with some scaling] instead of the other way around), which means we still have five possibilities to work with.

### Re: developing a notational comma popularity metric

Posted: Fri Jul 31, 2020 12:09 pm
Okay, I finally got it to give me an answer for 3 chunks! Well, I did get it to give a 3-chunk answer a while back on page 14, but that one didn't make much sense and maybe we later disproved it too? Can't recall.

The SoS is 0.007281501, which is lower than any metric we've found thus far with less than 5 chunks (according to this latest table).

I'm not going to present it all nice and pretty yet since I'm still in the middle of a ton of stuff, and I have not yet fed this scope into my recursive "metric perfecter" command. But basically it's sopf + j * soapfr(n) + k * soapfr(d) where a_p = log_{4/3}(p), j = 24/19 and k = 1. And also I still haven't made some important changes to the code I want to make before giving final answers per chunk count...

What's really exciting here is that it only took my code 52 minutes to find this result. So there's hope yet for finding 4- and 5- chunk metrics with my code.

### Re: developing a notational comma popularity metric

Posted: Fri Jul 31, 2020 4:37 pm
Good to have you back in this thread, @volleo6144.
volleo6144 wrote:
Fri Jul 31, 2020 10:34 am
We also have the change-of-base identities, which make the logarithm-based ones equivalent to $$\log_c x$$ and $$\log_x c = \frac{1}{\log_c x}$$, with different c's.
You may have missed where I started with the functions in that form, on the previous page.
And if we're allowing c<1 in the latter, we also ought to allow c<1 in $$c^x$$, which has a similar effect of only allowing the penalty to be so harsh, even for ridiculously high values of x (except, this time, 5-10-15 maps to 20-10-5 [with some scaling] instead of the other way around), which means we still have five possibilities to work with.
As your example shows, c^x with c<1 decreases with increasing x, so it didn't seem like it would be of any use for estimating ratio unpopularity, with x being either the prime p or the repeat-count r. But I suppose it could be, if some other component offset its decrease, to give a net monotonically-increasing convex-upward function.

### Re: developing a notational comma popularity metric

Posted: Fri Jul 31, 2020 6:45 pm
cmloegcmluin wrote:
Fri Jul 31, 2020 12:09 pm
Okay, I finally got it to give me an answer for 3 chunks! ... The SoS is 0.007281501, which is lower than any metric we've found thus far with less than 5 chunks ... basically it's sopf + j * soapfr(n) + k * soapfr(d) where a_p = log_{4/3}(p), j = 24/19 and k = 1.
That's what I like. A new metric to try out. Thanks.

But I can't reproduce that at all. I get SoS = 0.0257 with your parameter values, and the best Excel can find is SoS = 0.0116 with j = 1.623058836, k = 0.603053515.

You say it's only 3 chunks, so I had to assume it uses n ≥ d rather than soapfr(n) ≥ soapfr(d), although in desperation I tried the latter, but there was no significant difference. In desperation I also tried sopfr(nd) instead of sopf(nd), also to no avail.

Incidentally, I used a trick to calculate sopf in Excel. Since I was already set up to calculate sopfar where a_r = r^y, I calculated sopf by setting y = 10^-16. 0^0 gives an error, but 0^(10^-16) = 0, and for 1 ≤ r ≤ 10, r^(10^-16) = 1, in Excel.
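The same trick can be sanity-checked outside Excel, e.g. in Python (the function name is my own), where the tiny exponent likewise collapses every positive repeat count to 1:

```python
def sopfar(factors, ar):
    """Sum over primes p of p * ar(r_p), where factors maps each prime
    to its repeat count r_p."""
    return sum(p * ar(r) for p, r in factors.items())

# With y = 1e-16, r**y rounds to exactly 1.0 for small positive r, so
# sopfar degenerates into sopf. (Python defines 0**0 as 1 rather than
# erroring like Excel, so the tiny-but-nonzero exponent matters here too.)
y = 1e-16
factors_of_90 = {2: 1, 3: 2, 5: 1}  # 90 = 2 * 3^2 * 5
print(sopfar(factors_of_90, lambda r: r ** y))  # 10.0, i.e. sopf(90) = 2 + 3 + 5
```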

BTW, what is the point of using a log base of 4/3? I thought we agreed to standardise on base 2 logs, in which case multiplying j and k by $$\frac1{\operatorname{lb}(4/3)}$$ ≈ 2.40942084 gives the same result.

### Re: developing a notational comma popularity metric

Posted: Sat Aug 01, 2020 2:01 am
Yes, welcome back, @volleo6144! Although I feel like you've been here the whole time, offering your silent support.
Dave Keenan wrote:
Fri Jul 31, 2020 6:45 pm
That's what I like. A new metric to try out. Thanks.
You're welcome! I'll need to add a higher threshold on my despair valve so I expose y'all to less of my self-doubt.
But I can't reproduce that at all. I get SoS = 0.0257 with your parameter values, and the best Excel can find is SoS = 0.0116 with j = 1.623058836, k = 0.603053515.
Well, that's easy to explain! It's because I misread my output and told you the wrong thing! It's me, not you.
You say it's only 3 chunks, so I had to assume it uses n ≥ d rather than soapfr(n) ≥ soapfr(d), although in desperation I tried the latter, but there was no significant difference.
I think we should assume n ≥ d unless stated otherwise, since that's how the Scala stats we're working from are.
In desperation I also tried sopfr(nd) instead of sopf(nd), also to no avail.
Had you tried gpf(nd) you may've availed. Could you try:

$$\text{laj}(n,d) = \operatorname{gpf}(nd) + \sum_{p=5}^{p_{max}} \left( j \cdot (\log_{a}{p}) \, n_p + (\log_{a}{p}) \, d_p \right), \text{ where } n \geq d$$
$$a = 4/3,\ j = 13/10 \text{ gives } \text{SoS} = 0.007099822$$

As you can see, I have now had the time to run it through the recursive perfecter and gotten the SoS a tad lower by tweaking j.

And I chose "l" for "largest" to represent gpf.
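For anyone wanting to try laj at home, here is a sketch of a direct reading of the formula in Python (the helper names are mine, not from the actual codebase; n_p and d_p are the repeat counts of prime p in n and d):

```python
from math import log

def prime_factorization(n):
    """Map each prime factor of n to its repeat count."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def laj(n, d, a=4/3, j=13/10):
    """gpf(nd) plus log-weighted repeat counts of the primes >= 5, with
    the numerator side weighted by j. Assumes n >= d, as in the Scala
    stats."""
    assert n >= d
    total = max(prime_factorization(n * d))  # gpf(nd), the largest prime
    for side, weight in ((n, j), (d, 1)):
        for p, count in prime_factorization(side).items():
            if p >= 5:
                total += weight * log(p, a) * count
    return total

print(laj(3, 2))  # 3: just gpf(6), since no primes >= 5 contribute
```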
BTW, what is the point of using a log base of 4/3? I thought we agreed to standardise on base 2 logs, in which case multiplying j and k by $$\frac1{\operatorname{lb}(4/3)}$$ ≈ 2.40942084 gives the same result.
Interesting. Yes, I recall agreeing that lb was the way to go in the end. But I never revised the code to lock that in. I suppose I thought that whatever weird logarithmic base it ever found, we could clean it up ourselves after the fact to be in lb form, and that I wouldn't want to artificially constrain my code to be capable of only base-2 logarithms. But I see now that it should be pretty logical and straightforward to lock those in only when running my solver (as opposed to the helpful commands I still have around for calculating the antivotes of a given ratio for a given submetric, or the sum of squares for a given submetric across all (usu. top 80 anyway) ratios). And I think that will eliminate a lot of wasted (redundant) effort by the solver too.

We could convert log(4/3) to lb by multiplying j and k. But then k will be non-1, and thus we'll have an extra chunk. We could adjust j and k, maintaining the proportion, until k is 1 again, but then the total value from the sopfr will be different, and we'd need a weight on it (or the other term, the gpf) and thus we'd still have the extra chunk. So it feels a bit weird, but I suppose we could think of it this way: using a non-2 base allows this metric to take a 3-chunk rather than 4-chunk form. And so if we're set on lb, then we should probably find a different best 3-chunker. What do you think?
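The base-change arithmetic itself is easy to verify, using the numbers quoted above:

```python
from math import log2

# log_{4/3}(p) = lb(p) / lb(4/3), so moving to base-2 logs multiplies
# both coefficients by 1 / lb(4/3).
j, k = 13 / 10, 1.0
scale = 1 / log2(4 / 3)
print(round(scale, 8))  # 2.40942084, matching the factor quoted earlier

j_lb, k_lb = j * scale, k * scale  # k_lb is no longer 1: an extra chunk
# Rescaling so k is 1 again preserves the j:k proportion, but shrinks the
# whole sum by lb(4/3) relative to the gpf term, so the chunk comes back
# as a weight on one term or the other.
print(round(j_lb / k_lb, 12))  # 1.3 again
```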