developing a notational comma popularity metric

User avatar
cmloegcmluin
Site Admin
Posts: 768
Joined: Tue Feb 11, 2020 3:10 pm
Location: San Francisco, California, USA
Real Name: Douglas Blumeyer
Contact:

Re: developing a notational comma popularity metric

Post by cmloegcmluin »

Dave Keenan wrote:
Thu Jul 02, 2020 3:53 pm
Good idea. Feel free to ignore my ramblings. :)
Hardly ramblings! But sometimes it does feel like we're making up a language to talk about a language we're making up to talk about problems I only barely understand in the first place :)
Dave Keenan wrote:
Thu Jul 02, 2020 8:54 pm
A random observation: The ratio 11/7, by any reasonable metric, is way more popular than it has any right to be. It has rank 12, between 11/5 and 17/1. It ought to be up near 13/5, with rank 15 to 17.
Perhaps this is another of those examples @volleo6144 has put out where popularity may be influenced not merely by numbers but by harmonic function? I'm not sure if an undecimal subminor sixth is disproportionately more useful than an undecimal neutral second or a septendecimal semitone, though...

------

The only update on my front is that my attempts to brute-force some insight have been butting up against my JS engine's maximum heap size. I experimented a bit with transpiling it all to Python (which perhaps I should have written this whole module in from the start...) and got 90% of the way there, but it's not working out.

Oh, and what I said earlier about adding constants to the repeat count: I think it only makes sense to do when the monzo term is nonzero. Otherwise you get all sorts of nonsense.


Post by cmloegcmluin »

A stray observation of mine:

I'm finding w is often coming out a little bit below -2, which is suspiciously close to -log2(5) ≈ -2.322.

A logarithmic identity is that logb(x) - logb(y) = logb(x/y).

So in other words, log2(p) + w with w = -log2(5) is equivalent to log2(p/5), which feels right. It fits what we were saying earlier about five being the "first" prime, and what we were trying to achieve with w: getting the intercept correct. I'm not sure whether this undermines the "right" feeling I get from the base-2 logarithm at all.

Your w's are maybe close to -log2(3) ≈ -1.585, which for similar reasons might also make sense.
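For concreteness, here's a quick Python sanity check of the identity at work (just illustrative, not part of the metric code):

```python
import math

# log2(p) + w with w = -log2(5) is the same as log2(p/5), and likewise
# w = -log2(3) gives log2(p/3), by the identity logb(x) - logb(y) = logb(x/y).
for p in [5, 7, 11, 13, 17]:
    assert math.isclose(math.log2(p) - math.log2(5), math.log2(p / 5))
    assert math.isclose(math.log2(p) - math.log2(3), math.log2(p / 3))
```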

User avatar
Dave Keenan
Site Admin
Posts: 1068
Joined: Tue Sep 01, 2015 2:59 pm
Location: Brisbane, Queensland, Australia
Contact:


Post by Dave Keenan »

Let me know if you think this would be a feasible and worthwhile experiment for you to conduct with your javascript or python code?

I suggest that, for a while, you should entertain von Neumann's pachyderm, and make him not only wiggle his trunk but dance, turn every colour of the rainbow, and grow stripes.

Thanks for that image. I had to go find the video: https://www.youtube.com/watch?v=RoysQe-2HS4

You would do this, not by having 5 or 6 parameters, but 23 parameters. 2 of them would be the repeat-count-compression exponents, y for the numerator (big side) and v for the denominator (small side). 15 of them would be separate weighting (multiplying) factors for each (compressed) term in the monzo for the numerator, for the primes from 5 to 59. Call them µ5 thru µ59. The remaining 6 parameters would be separate weighting factors for each (compressed) term in the monzo for the denominator, for the primes from 7 to 23. Call them δ7 thru δ23. I note that δ5 would not be a parameter, but would be fixed at 0.5.

So the metric, whose sum-of-squared errors in reciprocal-ranks is to be minimised, would be

metric = [µ5 … µ59]·[n5^y … n59^y] + [δ5 … δ23]·[d5^v … d23^v]

where np and dp are the monzo terms for the numerator and denominator respectively.

To have enough data to fit such high primes you would use the first 106 2,3-reduced ratios, excluding those with ranks 91 and 101 (which have primes 67 and 211).

I suggest initial values y = 0.87, v = 0.74, µp = log2(p/3), δp = (p-2)/6.

Then when you've found values for those 23 parameters, that minimise the SoS, you should plot µp and δp against p, and post the graph(s) here.
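A sketch of how this experiment might be coded, in Python. The function name and the {prime: repeat-count} monzo representation are my own choices, not from the post; only the parameter roles and initial values are Dave's:

```python
import math

# 23-parameter metric: y, v are the repeat-count-compression exponents;
# mu[p] weights each numerator monzo term (primes 5..59); delta[p] weights
# each denominator term (primes 5..23), with delta[5] fixed at 0.5.
NUM_PRIMES = [5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59]
DEN_PRIMES = [5, 7, 11, 13, 17, 19, 23]

def metric(num_monzo, den_monzo, y, v, mu, delta):
    # monzos given as {prime: repeat_count} dicts for the 2,3-reduced ratio
    num_part = sum(mu[p] * num_monzo.get(p, 0) ** y for p in NUM_PRIMES)
    den_part = sum(delta[p] * den_monzo.get(p, 0) ** v for p in DEN_PRIMES)
    return num_part + den_part

# Suggested initial values from the post.
y0, v0 = 0.87, 0.74
mu0 = {p: math.log2(p / 3) for p in NUM_PRIMES}
delta0 = {p: (p - 2) / 6 for p in DEN_PRIMES}
delta0[5] = 0.5  # fixed, not a free parameter (and (5-2)/6 = 0.5 anyway)

# e.g. 35/11 -> numerator monzo {5:1, 7:1}, denominator monzo {11:1}
score = metric({5: 1, 7: 1}, {11: 1}, y0, v0, mu0, delta0)
```

From there it would just be a matter of handing the 23 parameters to the optimiser and plotting the fitted µp and δp against p.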


Post by cmloegcmluin »

How's about this. I found one that gets really close to the smallest SoS we've seen, but using one submetric only (soapfar). I think by your definition this uses 6 parameters though.

sum-of-squares: 0.004643131 by a soapfar where:
ap = p → logα(p + x) + w
ar = r → r^y + t
k = 0.5920238095238095
α = 2.0107142857142857
w = -2.341928094887362
x = 3.069642857142857
y = 1.6476190476190475
t = 1.658452380952381
and the numinosity is determined post-soapfar.
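Here's a Python sketch of this soapfar, under my assumption about how the pieces combine (each nonzero monzo term contributes ap(p) · ar(|r|), and the denominator's sum is weighted by k):

```python
import math

# soapfar submetric: ap = p -> log_alpha(p + x) + w, ar = r -> r^y + t,
# where t applies only to nonzero monzo terms.
def a_p(p, alpha, x, w):
    return math.log(p + x, alpha) + w

def a_r(r, y, t):
    return r ** y + t

def soapfar(monzo, alpha, x, w, y, t):
    # monzo as {prime: repeat_count}; zero terms contribute nothing
    return sum(a_p(p, alpha, x, w) * a_r(abs(r), y, t)
               for p, r in monzo.items() if r != 0)

def antivotes(num_monzo, den_monzo, k, alpha, x, w, y, t):
    # assumed combination: numerator sum plus k times denominator sum
    return (soapfar(num_monzo, alpha, x, w, y, t)
            + k * soapfar(den_monzo, alpha, x, w, y, t))

# The fitted parameters from the post:
params = dict(alpha=2.0107142857142857, x=3.069642857142857,
              w=-2.341928094887362, y=1.6476190476190475,
              t=1.658452380952381)
score = antivotes({5: 1, 7: 1}, {11: 1}, k=0.5920238095238095, **params)
```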

What's disturbing, however, is how sensitive this situation is. You might think you could get away with substituting all these simple numbers:
k = 3/5
a = 2
w = -log2(5) ≈ -2.32192809489
x = 3
y = 5/3
t = 5/3
but that knocks it to 0.0080 something.

It's interesting, I think, that y is > 1. When I was using the copfr function along with soapfar, y was < 1. I can see it going either way.

Interestingly, we can get really close to that, a SoS of 0.004715651, with a soapfar where
k = 0.5970238095238095
a = 2.0125
w = -2.334428094887362
x = 3.069642857142857
y = 1.6226190476190474
t = 1.618452380952381
In which y and t are unmistakably close to the golden ratio, φ. Which doesn't feel psychologically motivated. But interesting perhaps, especially when you realize that 2^φ ≈ 3.06956450765, which is just absurdly close to that x. Also k is not far off from 1/φ. But again, you budge any of these just a wee bit and the SoS changes dramatically.

The sensitivity of these numbers makes me think that the formula is overspecified in a way that makes it really brittle. In other words, even though it's summed over 80 data points, there might be a bit of a "luck" element in there.

I am able to get a 0.005680348 SoS with only 5 parameters, dropping the t:
k = 0.6328571428571429
a = 1.5728571428571425
w = -3.0571428571428574
x = 1.607142857142857
y = 0.8571428571428571
but that doesn't quite make your 0.0055 cutoff. Man, it's distressing what a wide variety of parameters will get you something so close to the same level of ranking fit.

Again and again I find that prime limit does not help enough.
And I cannot find a situation where any function besides soapifar (where the prime-counting function π is applied to the prime too) ever moves the needle.
And I haven't found a situation where using the original numerator and denominator as the numinosity function (identity, as you called it; pre-soapfar, I guess, would be another way to call it) improved things for me.

------

I almost turned in for the day when I stumbled across this, almost by accident:

I can get 0.006689884017318771 SoS with a metric so simple it's almost unbelievable: w = -1. That's right: no k, no a, no y. Just subtract one from each prime as you sum them up. It's clear how this would help rate 25/7 better than 17/1. Of course it ignores the difference between 35/1 and 7/5 though... but apparently that doesn't matter? That's a reduction in SoS of about 40% from good ol' SoPF>3, which gave 0.011375524.
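The w = -1 trick is simple enough to show in a few lines of Python (my own factoring helper, just for illustration):

```python
# SoPF>3 sums a ratio's prime factors greater than 3, with repetition,
# over both numerator and denominator; the w = -1 variant subtracts 1
# from each prime before summing.
def primes_gt3(n):
    # prime factors of n greater than 3, with repetition
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            if p > 3:
                factors.append(p)
            n //= p
        p += 1
    if n > 3:
        factors.append(n)
    return factors

def sopf_gt3(num, den, w=0):
    return sum(p + w for p in primes_gt3(num) + primes_gt3(den))

# With w = -1, 25/7 now scores better (lower) than 17/1...
assert sopf_gt3(25, 7, w=-1) < sopf_gt3(17, 1, w=-1)
# ...but 35/1 and 7/5 still tie, as noted.
assert sopf_gt3(35, 1, w=-1) == sopf_gt3(7, 5, w=-1)
```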

Awwww... well, I got super excited about that until I looked closer and realized the problem: lots of the values tie for the same rank, and the way my code works is that it then essentially gives the benefit of the doubt (i.e. if something goes wrong and our metric ends up assigning the exact same number of "antivotes" to each ratio, then my code gives that run a perfect score of 0 SoS, since the rank comes out 1,2,3,4,5... perfectly matching the real ranks which are of course already sorted). So I need to confront this issue of "fractional ranks" which I had noticed earlier but figured with all the complexity to our metric I wouldn't really have to deal with.

Okay, I've fixed that problem. I'm not super nervous that other results of mine would have suffered from this bug. The actual SoS for w=-1 is 0.012420586... slightly worse than SoPF>3.
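The standard fix for ties is fractional (average) ranking, as in e.g. scipy.stats.rankdata with method='average'. A minimal self-contained sketch of it:

```python
# Fractional ranking: tied scores share the average of the 1-based ranks
# they occupy, instead of each silently getting the next sequential rank
# (which is what let an all-tied run masquerade as a perfect fit).
def fractional_ranks(scores):
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks

# A three-way tie at the bottom gets rank (1+2+3)/3 = 2 for each:
assert fractional_ranks([5, 5, 5, 9]) == [2.0, 2.0, 2.0, 4.0]
```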

------

I think I'm about ready to turn in and suggest that the one I found earlier, with only 4 parameters, is the way to go:

SoS 0.004250806
k = 0
a = 1.994
y = 0.455
w = -2.08
c = 0.577 (the weight on copfr)

It's "4" parameters, but involves 2 top-level submetrics: soapfar and copfr (and you might count k = 0 as a parameter...). I wonder if you might prefer this 4-parameter soapfar-only one I found:

SoS 0.006519249
k = 0.6
a = 3
w = -1
y = 0.8766666666666667
(all of my a's are now bases, by the way, not powers)

It's not terribly dissimilar from some of the others you've thrown out in the past couple of pages (SoS ~0.006), but with the special behavior for 5 in the mcopfr. Its SoS may not be quite as good, but it's simpler, I think.
Dave Keenan wrote:
Fri Jul 03, 2020 2:02 pm
Let me know if you think this would be a feasible and worthwhile experiment for you to conduct with your javascript or python code?
I may be about spent on this front. Sorry.
Last edited by Dave Keenan on Tue Jul 14, 2020 9:12 pm, edited 2 times in total.
Reason: Inserted "SoS 0.004250806" for 4 parameter metric found earlier. Added anchor "#kwxy".


Post by Dave Keenan »

Thanks so much for all that. I will attempt to confirm these results and understand them.


Post by Dave Keenan »

ar = r → r^y + t seems completely bizarre to me. You're giving repeat-counts to primes that don't exist. The output of the metric depends on how wide your monzos are.


Post by cmloegcmluin »

I’m pretty sure I said this at some point, buried up in all this, but the trick with that t is that it only applies to non-zero terms of the monzo. So it doesn’t depend on the width of the monzo. But it also might hide some of the complexity that copfr more straightforwardly realizes.
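In code it's a one-line guard (names here are my own, just to illustrate the point):

```python
# The repeat-count function with t applied only to nonzero monzo terms:
# a zero term contributes nothing, so the monzo's width is irrelevant.
def a_r(r, y, t):
    return r ** y + t if r != 0 else 0

def r_part(monzo, y, t):
    # monzo as a plain list of repeat counts
    return sum(a_r(abs(r), y, t) for r in monzo)

# Widening the monzo with trailing zeros changes nothing:
assert r_part([1, 2, 0, 0, 0], 1.65, 1.66) == r_part([1, 2], 1.65, 1.66)
```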


Post by Dave Keenan »

So it's really r → r^y + (r ? t : 0).

Some good news. I believe that when you use soapfar only, with the same soapfar for num and den, then you can always make a = 2 by dividing w by log2(a). That eliminates one parameter. You might confirm that this gives the same SoS (maybe tomorrow). :)


Post by Dave Keenan »

cmloegcmluin wrote:
Fri Jul 03, 2020 2:21 pm
Awwww... well, I got super excited about that until I looked closer and realized the problem: lots of the values tie for the same rank, and the way my code works is that it then essentially gives the benefit of the doubt (i.e. if something goes wrong and our metric ends up assigning the exact same number of "antivotes" to each ratio, then my code gives that run a perfect score of 0 SoS, since the rank comes out 1,2,3,4,5... perfectly matching the real ranks which are of course already sorted). So I need to confront this issue of "fractional ranks" which I had noticed earlier but figured with all the complexity to our metric I wouldn't really have to deal with.

Okay, I've fixed that problem. I'm not super nervous that other results of mine would have suffered from this bug.
Good work, finding that bug. But you should indeed be afraid that other results suffered from this bug.
I think I'm about ready to turn in and suggest that the one I found earlier, with only 4 parameters, is the way to go:

SoS 0.004250806
k = 0
a = 1.994
y = 0.455
w = -2.08
c = 0.577 (the weight on copfr)

It's "4" parameters, but involves 2 top-level submetrics: soapfar and copfr.
No matter how I try, I cannot reproduce anything like this SoS for this metric. At first I thought I was getting close to the same SoS, until I realised there was only one zero after the decimal point. I get a SoS a little more than 10 times greater!

I believe you only got that result because of the aforementioned bug.

I realise now, that this metric can't work, because it gives the same rank to 11/5 and 11/7, and it gives the same rank to 13/5, 13/7, 13/11, and the same to 13/25 and 13/35, and to 17/5, 17/7, 17/11, 17/13, etc.

[Edit: But I confirmed back here: viewtopic.php?p=1950#p1950, and again just now, that by increasing k to 0.038, effectively just a tie-breaker, I can get SoS 0.004250806. But then, that's 5 parameters.]

[Edit2: But we can set the log base, a = 2, so it's no longer a parameter. Whereupon I obtain:

SoS 0.004447424
k = 0.038
y = 0.455
w = -2.09
c = 0.579 ]
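The tie problem is structural and easy to demonstrate, assuming (as I do here) that k enters the metric as numerator part + k × denominator part:

```python
# With k = 0 the denominator term vanishes, so any two ratios sharing a
# numerator get identical scores, whatever the numerator submetric is;
# a tiny k = 0.038 acts purely as a tie-breaker.
def combine(num_score, den_score, k):
    return num_score + k * den_score

# e.g. 11/5 vs 11/7: same numerator score, different denominator scores
assert combine(11.0, 5.0, k=0) == combine(11.0, 7.0, k=0)
assert combine(11.0, 5.0, k=0.038) != combine(11.0, 7.0, k=0.038)
```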


Post by Dave Keenan »

I also found this less-fragile minimum with the same 4-parameter metric and a = 2.

SoS 0.005589449
k = 0.213895488
y = 0.642099097
w = -2.048657352
c = 0.551650547

I also want to investigate the following, and try to get the same SoS with 4 parameters, a=2.

All your base are belong to dos.

But I ran out of time tonight.
cmloegcmluin wrote:
Fri Jul 03, 2020 2:21 pm
I am able to get a 0.005680348 SoS with only 5 parameters, dropping the t:
k = 0.6328571428571429
a = 1.5728571428571425
w = -3.0571428571428574
x = 1.607142857142857
y = 0.8571428571428571
I'm thinking SoS 0.0055 may have been too low a threshold. Perhaps influenced by that bug with tied ranks?
