developing a notational comma popularity metric

cmloegcmluin
Posts: 721
Joined: Tue Feb 11, 2020 3:10 pm
Location: San Francisco, California, USA
Real Name: Douglas Blumeyer
Contact:

Re: developing a notational comma popularity metric

The simple copfr experiment was not fruitful. I couldn't find any example of a weight > 0 on it helping. Cool idea, but it doesn't look like it's our winner.

cmloegcmluin

Dave Keenan wrote:
Tue Jun 30, 2020 10:14 am
...I'm hot on the trail of a prime weighting function that may eliminate the need to include prime-limit/gpf in the mix. It looks a lot like log₃(p)-1.
Oh, interesting! So you adjust each prime by a constant as well (in this case -1)? I'll play around with that too.

I don't think it makes as much sense to try that on the term.

Dave Keenan
Posts: 1024
Joined: Tue Sep 01, 2015 2:59 pm
Location: Brisbane, Queensland, Australia
Contact:

I did a thing where I left prime-limit out of the mix (forced s=0), but I kept y, where the number of repetitions r of a given prime is replaced with r^y, where y is slightly less than one. And when summing the r^y's for each prime p, instead of weighting each according to p^a or log_α(p), I let the solver adjust the weights of all the primes separately, along with k and y, to minimise the sum of squared errors in the reciprocals of the ranks (with my usual fudge to avoid sorting).
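For readers new to the thread, here is a minimal sketch of one plausible reading of "sum of squared errors in the reciprocals of the ranks". It is an assumption, not the spreadsheet formula actually used: it sorts explicitly (so it does not reproduce the "fudge to avoid sorting"), it gives tied items the average of their positions, and the names `ranks` and `sum_of_squares` and the toy numbers are made up for illustration.

```python
def ranks(values, reverse=False):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = average_rank
        i = j + 1
    return result


def sum_of_squares(actual_counts, metric_values):
    """Assumed SoS: squared differences between reciprocal ranks."""
    actual_ranks = ranks(actual_counts, reverse=True)  # most popular gets rank 1
    metric_ranks = ranks(metric_values)                # lowest metric gets rank 1
    return sum((1 / a - 1 / m) ** 2 for a, m in zip(actual_ranks, metric_ranks))


if __name__ == "__main__":
    toy_counts = [100, 50, 10]    # made-up popularity counts
    toy_metric = [2.0, 1.0, 3.0]  # made-up metric values; swaps the top two
    print(sum_of_squares(toy_counts, toy_metric))  # prints 0.5
```

Using reciprocals of ranks means a mistake near the top of the popularity list costs far more than the same mistake further down.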

Then I plotted these prime weights against the primes and saw that it did indeed look roughly logarithmic. It turned out it was an even better fit to a function of the form log_α(p)+w.

BTW, changing that variable name from α to e is a really bad idea, Mexican food notwithstanding, because using e as a variable log base is exactly the place where it would be confused with the constant log base e ≈ 2.718.

[Edit: Oops! You suggested epsilon ε, not e. That wouldn't be so bad, except it is traditionally used to represent an infinitesimal.]

So I then weighted each r^y as (log_α(p)+w) × r^y before summing them. This was done separately for the numerator and denominator as usual, which were then summed as usual, with the smaller term first being multiplied by k. And I added the prime-limit back in (multiplied by s). Here are some approximate optima I found. The underlined numbers were held constant.

α            w (your d)    k            y            s            SoS
3.956349187  -0.619217685  0.638243216  0.883788532  0.020609268  0.006160415
3            -0.774993871  0.638278131  0.883803886  0.025836729  0.006160415
2.718281828  -0.851411926  0.638277637  0.883804124  0.028385603  0.006160415
3.018652175  -0.904768274  0.618447635  0.874496057  0            0.007488211
3            -0.909855998  0.618460475  0.874485023  0            0.007488211
3            -1            0.67017005   0.955080391  0            0.008473958

I'm keen to know how low you can get the SoS in the vicinity of these results.
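The metric described in this post could be sketched as below. This is a hedged reading, not the actual spreadsheet: the helper names (`prime_factors`, `weighted_sum`, `metric`) are invented here, "the smaller term" is assumed to mean the smaller of the two weighted sums, and prime-limit is taken to be the greatest prime factor across numerator and denominator.

```python
import math


def prime_factors(n):
    """Return {prime: repetitions} for a positive integer n (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def weighted_sum(n, alpha, w, y):
    """Sum of (log_alpha(p) + w) * r**y over the prime factorisation of n."""
    return sum((math.log(p, alpha) + w) * r ** y
               for p, r in prime_factors(n).items())


def metric(num, den, alpha, w, k, y, s):
    """Larger weighted sum + k * smaller weighted sum + s * prime limit."""
    a = weighted_sum(num, alpha, w, y)
    b = weighted_sum(den, alpha, w, y)
    larger, smaller = max(a, b), min(a, b)  # assumption: k scales the smaller sum
    prime_limit = max(list(prime_factors(num)) + list(prime_factors(den)), default=1)
    return larger + k * smaller + s * prime_limit


if __name__ == "__main__":
    # Last row of the table: alpha = 3, w = -1, s = 0.
    print(metric(7, 5, alpha=3, w=-1, k=0.67017005, y=0.955080391, s=0))
```

Calling `metric` with a row of the table above as its parameters gives the score for one ratio; the SoS column would then come from ranking many ratios by that score.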

Dave Keenan

cmloegcmluin wrote:
Tue Jun 30, 2020 11:15 am
The simple copfr experiment was not fruitful. I couldn't find any example of a weight > 0 on it helping. Cool idea, but it doesn't look like it's our winner.
No, but the fact that a negative weight on it helped, was interesting. I realised that if instead of the simple c × copfr(n/d) you split it into
w×copfr^y(n) + k×w×copfr^y(d), then when you add it to the existing
sop_{α}fr^y(n) + k×sop_{α}fr^y(d), where p_α = log_α(p), you get

sop_{αw}fr^y(n) + k×sop_{αw}fr^y(d), where p_{αw} = log_α(p)+w.

[Edit: I changed the first occurrence of "w × copfr(n/d)" above to "c × copfr(n/d)", because the first is a coefficient of copfr while those that follow are coefficients of copfr^y.]
cmloegcmluin wrote:
Tue Jun 30, 2020 11:18 am
Oh, interesting! So you adjust each prime by a constant as well (in this case -1)? I'll play around with that too.
Yes. Although I found the optimum to be around -0.8 to -0.9.
I don't think it makes as much sense to try that on the term.
Not sure what you mean by that. But I should mention that I also tried weighting the prime exponent before raising it to the power y, i.e. ((log_α(p)+w)×r)^y, but I couldn't get the SoS quite as low with that as with (log_α(p)+w)×r^y.
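As a sketch of why the count term folds into the sum term, assume n factors as the product of p_i^{r_i}, that copfr^y(n) means the sum of the r_i^y, and that sop_{α}fr^y(n) means the sum of log_α(p_i) × r_i^y. These readings are inferred from the function names rather than stated in the post. No logarithmic identity is involved, only factoring out r_i^y:

```latex
% Assumed definitions, for n = \prod_i p_i^{r_i}
\begin{align*}
\mathrm{copfr}^{y}(n) &= \sum_i r_i^{\,y}, \\
\mathrm{sop}_{\alpha}\mathrm{fr}^{y}(n) &= \sum_i \log_\alpha(p_i)\, r_i^{\,y}, \\
\mathrm{sop}_{\alpha}\mathrm{fr}^{y}(n) + w\,\mathrm{copfr}^{y}(n)
  &= \sum_i \bigl(\log_\alpha(p_i) + w\bigr)\, r_i^{\,y}
   = \mathrm{sop}_{\alpha w}\mathrm{fr}^{y}(n).
\end{align*}
```

Applying the same step to d and multiplying through by k gives the k×sop_{αw}fr^y(d) term.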

cmloegcmluin

I have to go make dinner now but I will respond to your post soon. I just wanted to share that I got SoS of 0.004250806!

This is using sopafry, with logarithmic a and exponential y, and this new constant we're adjusting the prime by (which I call w... really running out of single-letter variables here!)

k = 0.038 (I know... extremely low... which is weird, but I'm going to treat it as 0)
s = 0 (no prime limit)
a = 1.994 (so basically log₂, which is certainly psychologically motivated!!)
y = 0.455 (so basically square root)
c = 0.577 (that's copfr... no a or y in it. so it did, after all, play a part in this, apparently doing better where k made sense too)
w = -2.08 (so basically subtract 2 from each prime)

max(so{log₂p-2}f√r(num), so{log₂p-2}f√r(den)) + ⅗copfr(num/den)
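Spelled out as code, the rounded formula above might look like the sketch below. The function names are hypothetical. copfr(num/den) is taken to count the prime factors, with repetition, of numerator and denominator together. Treating k as 0 is what turns the sum of the two sides into a max.

```python
import math


def prime_factors(n):
    """Return {prime: repetitions} for a positive integer n (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def weighted_term(n):
    """Sum of (log2(p) - 2) * sqrt(r) over the prime factorisation of n."""
    return sum((math.log2(p) - 2) * math.sqrt(r)
               for p, r in prime_factors(n).items())


def copfr(n):
    """Count of prime factors of n, with repetition."""
    return sum(prime_factors(n).values())


def rounded_metric(num, den):
    """max of the two weighted terms, plus 3/5 of the total prime count."""
    return max(weighted_term(num), weighted_term(den)) + 0.6 * (copfr(num) + copfr(den))


if __name__ == "__main__":
    for num, den in [(5, 1), (7, 1), (25, 1), (7, 5), (25, 7)]:
        print(f"{num}/{den}: {rounded_metric(num, den):.4f}")
```

With k = 0 the smaller side contributes nothing through its weighted term, so the copfr part is the only way its primes affect the score.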

Dave Keenan

The name "d" is problematic, as I've used it for denominator many times above. Can we call it "w"? My earlier use of "w" for what you're now calling "c", was short lived, as I decided it was simpler to roll copfr into so{..p..}fr, whereupon your "c" was shown to be almost equivalent to your "d" (my "w"). I have gone back and edited my earlier copfr coefficients from "w" to "c".

The almostness is due to the fact that copfr was not previously split into separate calculations for numerator and denominator, and was not previously applied to prime exponents raised to the power y.

We're definitely in trunk-waggling territory now, with 5 or 6 parameters. We should try to cut that down if possible without going over SoS = .0055.

It seems fairly insensitive to the value of alpha (the log base), so we might claim that as a constant rather than a parameter. But then I thought it would be 3 where you are finding it better as 2.

I was hoping you could set c=0 without too much damage because your d (my w) should pick up the slack. But then I realised that k=0 means the denominator is irrelevant, except in the count of its primes, including repetitions. So if you set c=0 then you will need k≠0.

I will investigate this k=0, c≠0 regime.

Dave Keenan

I'm afraid this:

so{log₂p-2}f√r(num)

is really pushing our idiosyncratic function-naming scheme beyond the breaking point.

No one coming to this for the first time would ever suspect that "so{log₂p-2}f√r" was the name of a single function.

It would be read as so × {log₂p-2} × f × sqrt(r(num)).

As long as we were only using subscripts and superscripts, we could always fall back to the real name by unsubsupering them (and ungreeking them). Hence my use of sop_{αw}fr^y → sopawfry for this above. The sub and superscripting works as a mnemonic for us, but to newcomers, even that will be read as so × p_{αw} × f × r^y.

Dave Keenan

cmloegcmluin wrote:
Tue Jun 30, 2020 12:36 pm
d = -2.08 (so basically subtract 2 from each prime)
I assume you mean "so basically subtract 2 from the log of each prime".

Dave Keenan

α      w (your d)  k      y      s  c      SoS
1.994  -2.08       0.038  0.455  0  0.577  0.004250806

What do you get when you set k=0, c≠0, and when you set c=0, k≠0?

cmloegcmluin

Dave Keenan wrote:
Tue Jun 30, 2020 1:07 pm
"d" is problematic, as I've used it for denominator many times above. Can we call it "w"? My earlier use of "w" for what you're now calling "c", was short lived ... I'm happy to go back and edit my earlier copfr coefficients from "w" to "c".
You're right, and I should have noticed that. Yes, let's make it w. I can go back and edit my previous post from "d" to "w" too.
Dave Keenan wrote:
Tue Jun 30, 2020 1:32 pm
I'm afraid this ... is really pushing our idiosyncratic function-naming scheme beyond the breaking point.

...we could always fall back to the real name
Dave Keenan wrote:
Tue Jun 30, 2020 12:07 pm
BTW, changing that variable name from α to e is a really bad idea, Mexican food notwithstanding, because using e as a variable log base is exactly the place where it would be confused with the constant log base e ≈ 2.718.

[Edit: Oops! You suggested epsilon ε, not e. That wouldn't be so bad, except it is traditionally used to represent an infinitesimal.]
I'm not sure what "real" name you're referring to. As soon as we went beyond sopfr and sopf, aren't these all new things without established/"real" names? Or do you mean something else by that?

These unwieldy names are helpful during the development process. Once we reach the final step of naming these functions nicely for the outside world we can rely a lot more on the descriptions of what exactly the functions do, and reduce the name to something optimized for pronounceability, e.g. "soapfar" for "sum of adjusted prime factors (with) adjusted repetition", or something like that.

It might be cool if we added a plugin to the forum for LaTeX or MathJax. That might help get these formulas across in a less disgusting and/or intimidating way.
Dave Keenan wrote:
Tue Jun 30, 2020 12:07 pm
I'm keen to know how low you can get the SoS in the vicinity of these results.
Do you still want me to check those, even though I've since found one with 0.004250806?

Actually, scratch that. I think your method will need to fulfill the role of the "home stretch": finding the exact values down to the millionths place or whatnot. The way I'm doing things, it's not really tractable to look deeper than the thousandths place. So if you're already working with SoS-billionths, I'm not going to be able to help you get any more precise.

I could at least double-check them, though, if you want.
Dave Keenan wrote:
Tue Jun 30, 2020 12:31 pm
cmloegcmluin wrote:
Tue Jun 30, 2020 11:15 am
The simple copfr experiment was not fruitful. I couldn't find any example of a weight > 0 on it helping. Cool idea, but it doesn't look like it's our winner.
No, but the fact that a negative weight on it helped, was interesting.
I actually didn't check it with negative values, FYI. I should have tried that, though, and I should have been more clear.
Dave Keenan wrote:
Tue Jun 30, 2020 12:31 pm
I realised that if instead of the simple c × copfr(n/d) you split it into
w×copfr^y(n) + k×w×copfr^y(d), then when you add it to the existing
sop_{α}fr^y(n) + k×sop_{α}fr^y(d), where p_α = log_α(p), you get

sop_{αw}fr^y(n) + k×sop_{αw}fr^y(d), where p_{αw} = log_α(p)+w.
it was simpler to roll copfr into so{..p..}fr, whereupon your "c" was shown to be almost equivalent to your "d". I'm happy to go back and edit my earlier copfr coefficients from "w" to "c".

The almostness is due to the fact that copfr was not previously split into separate calculations for numerator and denominator, and was not previously applied to prime exponents raised to the power y.
I can't figure out how that works. Is it using a logarithmic identity I'm not familiar with? I'm interested, certainly, since it seems like you found a way to consolidate the count of primes into their sum.
Dave Keenan wrote:
Tue Jun 30, 2020 12:31 pm
I don't think it makes as much sense to try that on the term.
Not sure what you mean by that. But I should mention that I also tried weighting the prime exponent before raising it to the power y, i.e. ((log_α(p)+w)×r)^y, but I couldn't get the SoS quite as low with that as with (log_α(p)+w)×r^y.
Dang, I was afraid that might not be clear enough, but I was in a rush. What I meant was that while it feels right to adjust the primes by some constant – if only because we're starting with 5, just kind of floating out there in Fiveland – adjusting the terms of the monzos (AKA the repetitions, or "r") by a constant doesn't make as much sense (whereas I do think adjusting them by a power or log does make enough sense).
Dave Keenan wrote:
Tue Jun 30, 2020 1:44 pm
cmloegcmluin wrote:
Tue Jun 30, 2020 12:36 pm
d = -2.08 (so basically subtract 2 from each prime)
Do you mean "so basically subtract 2 from the log of each prime"?
Yes, what I meant was subtracting 2 outside the quotes, like log_α(p) + w. I did not mean log_α(p + w). In my defense, I'm pretty sure logarithms come before subtraction in order of operations, so my "log₂p-2" was accurate (and I think the only reason I left off the parens was because I was trying to have it double as part of something like a name... the problem which we've already covered above).
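To make the distinction concrete, here is a small sketch comparing the two readings for a few primes, using the rounded values α = 2 and w = -2 from earlier in the thread. The function names are made up for illustration.

```python
import math


def shift_outside_log(p, alpha=2.0, w=-2.0):
    """log_alpha(p) + w: the constant is added after taking the log."""
    return math.log(p, alpha) + w


def shift_inside_log(p, alpha=2.0, w=-2.0):
    """log_alpha(p + w): the constant is added to the prime before the log."""
    return math.log(p + w, alpha)


if __name__ == "__main__":
    for p in (5, 7, 11, 13):
        print(p, round(shift_outside_log(p), 3), round(shift_inside_log(p), 3))
```

The two readings give different weights for every prime shown, so it matters which one a solver is actually fitting.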
Dave Keenan wrote:
Tue Jun 30, 2020 12:31 pm
I should mention that I also tried weighting the prime exponent before raising it to the power y, i.e. ((log_α(p)+w)×r)^y, but I couldn't get the SoS quite as low with that as with (log_α(p)+w)×r^y.
Ah. I'm glad for your "i.e." clause because I would have interpreted what precedes it differently otherwise. I thought by "weighting the prime exponent before raising it to the power" you meant log_α(p + w). I thought that because we're still entertaining both sublinear exponents and logarithms (though logarithms seem to be winning now) and so when you said "before raising it to the power" I thought that was standing in for either raising to a power or putting to a logarithmic base. I had not even yet considered the prospect of raising the whole thing to some power (or to some base), even after the repetition count had been raised to some power (or put to some base). All of these things are real and distinct possibilities though! We could have:

((log_α(p + v) + w) × r^x)^y or log_y(((p + v)^α + w) × log_x(r))

or any other combination of logs and exps, where v is just some other constant and x is just some other exponent...

Yeah... let's not go down that path...
Dave Keenan wrote: We're definitely in trunk-waggling territory now, with 5 or 6 parameters. We should try to cut that down if possible without going over SoS = .0055.
I agree with the sentiment here.

Honestly, I think that while "max(so{log₂p-2}f√r(num), so{log₂p-2}f√r(den)) + ⅗copfr(num/den)" looks kind of gross, it wouldn't actually be that bad once you got it into LaTeX. Especially if you have devised a way to fold in copfr. And I think the log base 2 on the prime is kind of pretty, even, in its recognition of human base-2 pitch perception. The square root bit of r makes less immediate musical sense, but it offends me less than the ⅗ on the copfr, insofar as I sense there might be an underlying truth to something like a square root...
Dave Keenan wrote: I was hoping you could set c=0 without too much damage because d (or w?) should pick up the slack. But then I realised that k=0 means the denominator is irrelevant, except in the count of its primes, including repetitions. So if you set c=0 then you will need k≠0.

I will investigate this k=0, c≠0 regime.
I look forward to your results. I'm not really sure what more I can produce tonight until I hear back from you in more detail about your consolidation of copafry into sopafry, so that I can understand it well enough to implement it myself.
Dave Keenan wrote:
Tue Jun 30, 2020 2:59 pm
What do you get when you set k=0, c≠0, and what when you set c=0, k≠0?
Welp, you snuck one more in before I could get my post out.

with k = 0, I can get 0.004609100.
that's with c = 0.723, a = 1.753, y = 0.473, w = -2.620
those parameters are disconcertingly different from the c = 0.577, a = 1.994, y = 0.455, w = -2.08 which got us 0.004250806 with k = 0.038. If you just take those parameters and set k = 0 then you get 0.004749566 which is also quite close.

with c = 0, I can only get 0.006251296.
that's with k = 0.635, a = 1.430, y = 0.850, w = -2.770

So it would seem that the right path forward is to obliterate the smaller of the num and den, and use the count of primes to account for the presence of harmonic information on the other side.