
Re: developing a notational comma popularity metric

Posted: Fri Jun 26, 2020 11:24 pm
by volleo6144
cmloegcmluin wrote: Fri Jun 26, 2020 4:02 pm I'm not sure I understand exactly what you mean:

1:343 abs(n - d) = 343 - 1 = 342
1:341 abs(n - d) = 341 - 1 = 340

1:343 abs(sopfr(n) - sopfr(d)) = 7 + 7 + 7 = 21
1:341 abs(sopfr(n) - sopfr(d)) = 11 + 31 = 42

It looks like abs(sopfr(n) - sopfr(d)) maybe does a better job, but I don't see a "heavy [penalty]" for abs(n - d).
If you extend this to the extreme, cases like the 49:9765625n (for the half-tina) have ... issues:

abs(9765625 - 49) = 9765476
abs(sopfr(9765625) - sopfr(49)) = abs(50 - 14) = 36, on par with 5:41
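
For reference, a minimal sketch of the sopfr() used in these figures (this is SoPF>3, so factors of 2 and 3 are excluded; the code is illustrative, not anything from the thread), which reproduces the numbers above:

def prime_factors(n: int) -> list[int]:
    """Factor n by trial division, returning primes with repetition."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def sopfr(n: int) -> int:
    """Sum of prime factors greater than 3, with repetition."""
    return sum(p for p in prime_factors(n) if p > 3)

assert sopfr(343) == 21                        # 7 + 7 + 7
assert sopfr(341) == 42                        # 11 + 31
assert abs(sopfr(9765625) - sopfr(49)) == 36   # 50 - 14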

Maybe I didn't emphasize the "heavy penalty" I was talking about enough with the examples. (Although the 49:9765625n's high apotome slope and 3-exponent might be a bad thing anyway...)

Re: developing a notational comma popularity metric

Posted: Fri Jun 26, 2020 11:53 pm
by Dave Keenan
Yes, changing a in sopafr() does change the ranking.

I swapped n and d as required to ensure sopfr(n) ≥ sopfr(d), then I plotted sopafr(n) + k*sopafr(d) against scala_rank(n/d) and adjusted spinners for k and a, to maximise the monotonicity of the first 29 ratios by eye. This allowed me to avoid the sorting step. I settled on k = 0.68 and a = 1.13. You can play with the spinners yourself in the attached spreadsheet. One way to improve the monotonicity further would be to adjust the weightings of the primes independently, as both of you guys have suggested.
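
A sketch of that procedure in code (assuming sopafr() means each prime factor above 3 is raised to the power a before summing, which is my reading of the name; prime_factors() and sopfr() are as in the earlier sketch):

def sopafr(n: int, a: float) -> float:
    """Like sopfr(), but each prime factor > 3 is raised to the power a."""
    return sum(p ** a for p in prime_factors(n) if p > 3)

def rating(n: int, d: int, k: float = 0.68, a: float = 1.13) -> float:
    """sopafr(n) + k*sopafr(d), after swapping n and d as described above.
    (A later post notes the swap should really compare sopafr, not sopfr.)"""
    if sopfr(n) < sopfr(d):
        n, d = d, n
    return sopafr(n, a) + k * sopafr(d, a)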

Re: developing a notational comma popularity metric

Posted: Sat Jun 27, 2020 2:31 am
by cmloegcmluin
volleo6144 wrote: Fri Jun 26, 2020 11:57 am
cmloegcmluin wrote: Ah ha! I see now. The usage of . as a multiplication symbol is not intuitive to me.
That thing has an SSL certificate that's expired by about a week...
If you're not comfortable visiting that link, volleo6144, it was an interesting article Dave had shared with me recently that laments a distortion in common usage of the word "intuitive"; it suggests that perhaps most of the time what the person using the word "intuitive" really means is just "familiar". So that's what I was communicating to Dave — there's nothing wrong with him using . for multiplication in general (I'm familiar with that practice from the prime factored forms of comma names in Sagittal such as the 5.13:7.11n, though I thought it was only used in that context because they needed to be clean/compact or something).

------

I have just discovered that phpBB does not seem to let you nest quotes 4-deep. So here's what should be the initial, deepest nested quote:
cmloegcmluin wrote: Fri Jun 26, 2020 6:26 am I do not currently have access to a powerful hunk of math software such as MATLAB or Mathematica. WolframAlpha's online regression analysis tools seem to be somewhat limited; specifically, they only ever work (for me, anyway) with data in three dimensions or fewer. If we could find a way to do a regression analysis on the prime exponent vectors (monzos, or kets) of these notational ratios along with their popularities, we could find a big 'ol polynomial w/ different coefficients for each prime. Dunno if quadratic would work or if we'd have to go cubic or quartic, but that might do the trick.
And then here's the rest of the replies:
Dave Keenan wrote: Fri Jun 26, 2020 5:58 pm
cmloegcmluin wrote: Fri Jun 26, 2020 9:06 am
Dave Keenan wrote: Fri Jun 26, 2020 8:40 am I use Excel's Solver when I want to do that kind of thing. But independent weights for each prime still won't give you different ranks for 7/5 and 35/1.
Wouldn't it? In the case of 7/5, the 5-term of the monzo is negative, while in 35/1 it's positive. Couldn't that affect the outcome?
Wouldn't it be the treating of positive exponents differently from negative exponents that made the difference?
Yes, it would be the treating of positive exponents differently from negative exponents that made the difference. What else could you mean by "independent weights for each prime" in the context of a polynomial regression analysis?

------
Dave Keenan wrote: Fri Jun 26, 2020 5:58 pm The Euclidean distance is where you take all the differences, square them, sum the squares, then take the square root. It's a generalisation to n dimensions of Pythagoras' theorem for finding the hypotenuse. Finding the square of the Euclidean distance simply has the effect of undoing that last step, where you took the square root. So I totally agree, it is a confusing term. Better to just not do that step in the first place, and so call it the "sum of squared errors" or "sum of squared differences", often abbreviated to just "sum of squares".
Omigosh. I was actually even more confused than I thought. I would be able to articulate the nature of my confusion, but I don't think it makes a particularly good story, so I'll keep it to myself. Thanks for further explaining.
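
In code, the distinction is just whether you take that final square root (a trivial sketch):

import math

def sum_of_squared_errors(xs, ys):
    """Sum of squared differences between paired values."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys))

def euclidean_distance(xs, ys):
    """Pythagoras in n dimensions: the square root of the sum of squares."""
    return math.sqrt(sum_of_squared_errors(xs, ys))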

------
Dave Keenan wrote: Fri Jun 26, 2020 5:58 pm
cmloegcmluin wrote: Fri Jun 26, 2020 9:06 am I can also try abs(n - d) where n/d is the 5-rough ratio, unless you have some reason to henceforth prefer abs(sopfr(n) - sopfr(d)).
It's that thing I mentioned earlier. Sopfr() is a kind of logarithm. It feels wrong to add numbers to their logarithms. They feel like incommensurate things, like adding pascals (sound pressure) to decibels (log of sound pressure).
I get it. Because in the actual ratios, the primes are multiplied, but now we're summing them. So it's gearing down one level of hyperoperations.
volleo6144 wrote: Fri Jun 26, 2020 11:24 pm
cmloegcmluin wrote: Fri Jun 26, 2020 4:02 pm It looks like abs(sopfr(n) - sopfr(d)) maybe does a better job, but I don't see a "heavy [penalty]" for abs(n - d).
If you extend this to the extreme, cases like the 49:9765625n (for the half-tina) have ... issues:

abs(9765625 - 49) = 9765476
abs(sopfr(9765625) - sopfr(49)) = abs(50 - 14) = 36, on par with 1:299

Maybe I didn't emphasize the "heavy penalty" I was talking about enough with the examples.
Alright, let's strike abs(n - d) from the conversation then (and be forgiving of each other if we resurface it... a lot to keep track of here...)

-----
Dave Keenan wrote: Fri Jun 26, 2020 6:16 pm
cmloegcmluin wrote: Fri Jun 26, 2020 4:02 pm But one issue is that it results in ρ coming out extremely close to 1 in every case, so it's hard to tell whether our metric is truly an improvement. SoPF>3 already has ρ = 0.9999999998222343! That said, k = 1.5 maximizes ρ = 0.9999999998823996 (it's some number near 1.5; I don't know the exact range within which ρ = 0.9999999998823996, but for a decent slice of k around 1.5, the ranks aren't sorting any better or worse).
I wouldn't bother calculating ρ. I'd just look at the sum of squared errors. But I'm curious how you're getting from the sum of squared errors in rank^(-1.37) to ρ. I wouldn't have a clue how to normalise that.
Um... it looks like normalize means to make it so that some important value in a system is equal to 1. Which makes sense because in one of my personal projects that's part of what I ended up naming the type for scalars that ranged from 0 to 1.

Well, does it not seem pretty clear from those numbers all being extremely close to 1 that the simplified Spearman's formula we're using is still normalized? I have no idea where the "6" comes from that's in the numerator of the thing subtracted from 1... ohhhhhhh, well, of course subtracting from 1 is what keeps us super close to 1. Okay. So you're saying that if we change to using rank^(-1.37) then we can't use "6" anymore, or something like that?

I guess that could make sense, since after raising the ranks to the -1.37 power they are no longer sequential integers, and it does say in that Wikipedia article that the simplified formula only works when they are sequential integers. So maybe we have to figure out how to use the more complex form of the formula.
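
(For reference: the simplified formula is ρ = 1 − 6·Σdᵢ² / (n(n² − 1)), where dᵢ is the difference between the two ranks of item i and n is the number of items. The 6 falls out of the algebra only when both sets of ranks are the integers 1 through n, which is exactly the assumption we'd be breaking.)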

------
Dave Keenan wrote: Fri Jun 26, 2020 6:16 pm I hope you swapped numerators and denominators where required to ensure sopfr(n) ≥ sopfr(d). For example, 25/11 would need to become 11/25 because sopfr(11) = 11 and sopfr(25) = 5+5 = 10. That's what lets us avoid taking absolute values, and lets us use the simplification sopfr(n) + k*sopfr(d).
I did not! That's a very good point to call out. I had certainly recognized that all of the ratios from your popularities spreadsheet were oriented such that n ≥ d, but it did not occur to me that I'd need to account for situations where sopfr(n) could turn out < sopfr(d). So I'll need to add that layer to my code.

------
Dave Keenan wrote: Fri Jun 26, 2020 11:53 pm Yes, changing a in sopafr() does change the ranking.

I swapped n and d as required to ensure sopfr(n) ≥ sopfr(d), then I plotted sopafr(n) + k*sopafr(d) against scala_rank(n/d) and adjusted spinners for k and a, to maximise the monotonicity of the first 29 ratios by eye. This allowed me to avoid the sorting step. I settled on k = 0.68 and a = 1.13. You can play with the spinners yourself in the attached spreadsheet.
In your sheet, it looks like you're not raising the ranks to the -1.37 power, unless I'm missing something. But I guess that's fine if you were just eyeballing things for now.

Interesting that your k is really close to 2/3, the inverse of what I found. Maybe once I fix my code I'll also come up with something close to 2/3. I want it to be related to 3/2 because something feels so magically cool about this value relating the two primes which we have otherwise stricken from the ratios :)

In any case I think it's a good idea if you do your spreadsheet thing and I do my code thing and we check each other's findings.
Dave Keenan wrote: Fri Jun 26, 2020 11:53 pm One way to improve the monotonicity further would be to adjust the weightings of the primes independently, as both of you guys have suggested.
If this formula works:

sopafr(n) + k⋅sopafr(d)

I would much prefer that to some unwieldy polynomial. Besides, a polynomial would by nature be limited by how many primes we calculated it up through. Whereas this formula generalizes up to any prime limit.

I've got to get to work now, but I'll find time soon to fix my code. And I should be able to code something up that will zero in on the combination of k and a that minimizes the sum of squares.
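
Roughly what I have in mind is a brute-force grid search (a sketch only, reusing rating() from the earlier sketch; ratios here is a hypothetical list of 5-rough (n, d) pairs already sorted by Scala popularity, and the rank^(-1.37) weighting is left out for brevity):

def sum_of_squares_for(ratios: list[tuple[int, int]], k: float, a: float) -> float:
    """Sum of squared differences between metric rank and Scala rank."""
    by_metric = sorted(range(len(ratios)),
                       key=lambda i: rating(*ratios[i], k=k, a=a))
    return sum((metric_rank - scala_rank) ** 2
               for metric_rank, scala_rank in enumerate(by_metric))

def best_k_and_a(ratios: list[tuple[int, int]]) -> tuple[float, float]:
    """Scan k in [0, 2] and a in [0.5, 2] in steps of 0.01."""
    candidates = ((k / 100, a / 100)
                  for k in range(201) for a in range(50, 201))
    return min(candidates, key=lambda ka: sum_of_squares_for(ratios, *ka))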

Re: developing a notational comma popularity metric

Posted: Sat Jun 27, 2020 4:11 am
by volleo6144
cmloegcmluin wrote: Sat Jun 27, 2020 2:31 am
volleo6144 wrote: Fri Jun 26, 2020 11:57 am That thing has an SSL certificate that's expired by about a week...
If you're not comfortable visiting that link, volleo6144...
No, there wasn't really anything I was worried about; it's just that ... the site's admins probably haven't cared in a week, or they'd have ... done something.

------
cmloegcmluin wrote: Sat Jun 27, 2020 2:31 am I have just discovered that phpBB does not seem to let you nest quotes 4-deep. So here's what should be the initial, deepest nested quote:
Really?
1 wrote:
2 wrote:
3 wrote: 3
2
1
Yeah, you're right. Strangely, it also deletes the fourth quote when you click Preview.
cmloegcmluin wrote: Sat Jun 27, 2020 2:31 am Yes, it would be the treating of positive exponents differently from negative exponents that made the difference. What else could you mean by "independent weights for each prime" in the context of a polynomial regression analysis?
I was thinking it meant, like, weighting the actual primes as something other than the prime itself, to correct for the unique properties that 41 (as a comma that's ridiculously close to another: 41C = 5C - 205n, and 205n < 5831n), 31, 47, and 97 (these three are right next to important 2,3-numbers: 32, 48, and 96, respectively) have.

------
cmloegcmluin wrote: Sat Jun 27, 2020 2:31 am
Dave Keenan wrote: Fri Jun 26, 2020 5:58 pm It's that thing I mentioned earlier. Sopfr() is a kind of logarithm. It feels wrong to add numbers to their logarithms. They feel like incommensurate things, like adding pascals (sound pressure) to decibels (log of sound pressure).
I get it. Because in the actual ratios, the primes are multiplied, but now we're summing them. So it's gearing down one level of hyperoperations.
Yeah, pretty much.

------
cmloegcmluin wrote: Sat Jun 27, 2020 2:31 am
volleo6144 wrote: Fri Jun 26, 2020 11:24 pm
cmloegcmluin wrote: Fri Jun 26, 2020 4:02 pm It looks like abs(sopfr(n) - sopfr(d)) maybe does a better job, but I don't see a "heavy [penalty]" for abs(n - d).
If you extend this to the extreme, cases like the 49:9765625n (for the half-tina) have ... issues:

abs(9765625 - 49) = 9765476
abs(sopfr(9765625) - sopfr(49)) = abs(50 - 14) = 36, on par with 288:299 or 40:41

Maybe I didn't emphasize the "heavy penalty" I was talking about enough with the examples.
Alright, let's strike abs(n - d) from the conversation then (and be forgiving of each other if we resurface it... a lot to keep track of here...)
Yeah, I understand both of those points.

Re: developing a notational comma popularity metric

Posted: Sat Jun 27, 2020 5:56 am
by cmloegcmluin
volleo6144 wrote: Sat Jun 27, 2020 4:11 am Strangely, it also deletes the fourth quote when you click Preview.
That's what bothered me the most about it: without announcing it at all, it just threw away part of my post. Which was hard to catch in such a big post! So...
I've
just
updated
the
default
setting
to
allow
unlimited
quote
depth.
We can rebuild the forum; we have the technology.

----
volleo6144 wrote: Sat Jun 27, 2020 4:11 am
cmloegcmluin wrote: Sat Jun 27, 2020 2:31 am Yes, it would be the treating of positive exponents differently from negative exponents that made the difference. What else could you mean by "independent weights for each prime" in the context of a polynomial regression analysis?
I was thinking it meant, like, weighting the actual primes as something other than the prime itself, to correct for the unique properties that 41 (as a comma that's ridiculously close to another: 41C = 5C - 205n, and 205n < 5831n), 31, 47, and 97 (these three are right next to important 2,3-numbers: 32, 48, and 96, respectively) have.
I think my imagination is just failing me here. I guess I thought that would all be part of such a polynomial best fit. It seemed like Dave was concerned about one aspect of such an approximating polynomial equation, but it wasn't a big deal because another related component of said equation – the sign of the exponents – was the component that would actually affect this aspect. I realize I'm being a bit hand-wavy here. I should probably just cut myself off on this front since I have no idea what I'm doing, hehe...

Re: developing a notational comma popularity metric

Posted: Sat Jun 27, 2020 8:50 am
by Dave Keenan
You'll actually need to ensure sopafr(n) ≥ sopafr(d), not sopfr(n) ≥ sopfr(d). I failed to do that in my spreadsheet, but it didn't matter for those shown on the chart.

Re: developing a notational comma popularity metric

Posted: Sat Jun 27, 2020 9:38 am
by Dave Keenan
cmloegcmluin wrote: Sat Jun 27, 2020 5:56 am I think my imagination is just failing me here. I guess I thought that would all be part of such a polynomial best fit. It seemed like Dave was concerned about one aspect of such an approximating polynomial equation, but it wasn't a big deal because another related component of said equation – the sign of the exponents – was the component that would actually affect this aspect. I realize I'm being a bit hand-wavy here. I should probably just cut myself off on this front since I have no idea what I'm doing, hehe...
I think I understand now why we've been talking past each other on this. I failed to imagine that your proposed way of treating positive and negative exponents differently was simply to not do what we'd always been doing up 'til then, which was to take their absolute value. Not taking their absolute value would certainly give different results for 7/5 versus 35/1. I failed to imagine you were proposing that, because I thought it was obvious that it would be too much differentiation. In terms of the formula currently under consideration, namely

sopafr(n) + k × sopafr(d), where sopafr(n) ≥ sopafr(d),

it would be equivalent to having k = -1.
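
(Concretely, with a = 1: 7/5 would rate 7 − 5 = 2, while 35/1 would rate 7 + 5 = 12.)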

But different things are obvious to different people, which is why it's so good to have us all working on this.

Re: developing a notational comma popularity metric

Posted: Sat Jun 27, 2020 10:24 am
by Dave Keenan
The fact that 211/11 can garner 13 "votes" (211 is an unremarkable prime) while 175/1 = 5×5×7 only gets 7 votes, says to me that our formula shouldn't be influenced by anything with 13 votes or less. And of course 14, 15 or 16 votes isn't much more convincing. There's a sudden drop from 19 votes to 16 votes between rank 80 and rank 81, so that seems like a convenient place to cut off. So I think we should only "train" our formula on at most the 80 most popular 2,3-reduced ratios.

In the first 80, there is no prime above 53 in the numerator and none above 17 in the denominator.

Re: developing a notational comma popularity metric

Posted: Sat Jun 27, 2020 10:59 am
by cmloegcmluin
Dave Keenan wrote: Sat Jun 27, 2020 8:50 am You'll actually need to ensure sopafr(n) ≥ sopafr(d), not sopfr(n) ≥ sopfr(d). I failed to do that in my spreadsheet, but it didn't matter for those shown on the chart.
Ah ha! Something I actually got right on my own :)
Thanks for surfacing that requirement explicitly.

------
Dave Keenan wrote: Sat Jun 27, 2020 9:38 am I think I understand now why we've been talking past each other on this.
Haha... ummm... it's more than us not being on the same page; I don't think we're even in the same book!

Here's what I was trying to say:

There's a completely different strategy we could take, where we use a tool to automatically find a best-fit line to the Scala popularity stats, where the coordinates of the occurrence counts are not 2D (the numerator and denominator of the 5-rough ratios AKA notational commas) but multidimensional (the prime exponent vectors AKA monzos of these notational commas). Because I've found that the free online regression analysis calculator offered by WolframAlpha caps out at 3D data, I am only capable of fitting a line to the 7-limit ratios: the occurrence counts are part of the data's coordinates, so they take up one dimension, and then the other two dimensions are for the 5-term and the 7-term of the monzos. Theoretically, however, if we could get mathematical software capable of handling, like, 10-dimensional data, then we could plug in stuff up to the 37-limit, which would probably be plenty to come up with a reasonably good polynomial trendline.

In this polynomial trendline's algebraic formulation, we'd have the 5-term be represented by a variable (let's say a₁), the 7-term by another (let's say a₂), and so on. And let's assume that we can get a pretty good fit with a cubic polynomial. So going this route, our candidate metric for improving upon SoPF>3 would look something like:

c₁a₁³ + c₂a₁² + c₃a₁ +
c₁a₂³ + c₂a₂² + c₃a₂ +
c₁a₃³ + c₂a₃² + c₃a₃ +
...

where all those c's are coefficients/constants that the tool comes up with for us.

And my point was that this would be able to account for the difference between 35:1 and 7:5 just fine.

But I think this approach sucks relative to what we've got going otherwise, and I think we can drop it.

------
Dave Keenan wrote: Sat Jun 27, 2020 10:24 am I think we should only "train" our formula on at most the 80 most popular 2,3-reduced ratios.
Works for me.

I'm keeping the rank^(-1.37) weighting, though. That still feels right.

------

My code has found that the optimal values for k and a are right about 0.6 and 1, respectively. This makes sense. SoPF>3 rates 7/5 as 12 and 11/1 as 11, but 7/5 is actually more popular, with 1318 votes to 11/1's 1002 votes. Setting k to 0.6, or 3/5, transforms the 5 in 7/5 to a 3, improving its rating to 7 + 3 = 10, so that it beats 11/1's 11 as it should.

However, at a rating of 10, 7/5 now ties with 25/1's rating of 10. Which reminds me that prime limit should certainly be in the running as part of our candidate metric. Because that'll push 7/5 (7-limit) below 25/1 (5-limit) in the ranking. Including prime limit might also help push 49/1 to a worse rank than 125/1, and 13/1 to a worse rank than 49/1, as they should be.

Sum of unique primes might also improve our sum-of-squares. I'll try those next.
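
For concreteness, sketches of those two candidate terms (names are mine; prime_factors() as in the earlier sketch):

def prime_limit(n: int, d: int) -> int:
    """The largest prime factor appearing on either side of the ratio."""
    return max(prime_factors(n) + prime_factors(d))

def sum_of_unique_primes(n: int, d: int) -> int:
    """Like SoPF>3, but each distinct prime counts only once."""
    return sum({p for p in prime_factors(n) + prime_factors(d) if p > 3})

assert prime_limit(7, 5) > prime_limit(25, 1)   # breaks the 7/5 vs 25/1 tie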

Re: developing a notational comma popularity metric

Posted: Sat Jun 27, 2020 11:40 am
by Dave Keenan
Thanks for explaining. So the problem was that I pretty much ignored the word "polynomial" in what you wrote. Sorry.

Of course the full-blown thing you described, with e.g. 3n independent coefficients, where n is the number of primes, suffers from von Neumann's elephant: "With four parameters I can fit [a curve to] an elephant, with five I can make him wiggle his trunk". In other words, it would end up "over-trained" to the specific Scala stats, and so would not be useful as a model of the human psychology of pitch ratio selection: it would be unlikely to predict results not directly captured by the Scala stats. I know you understood that.

But if we only go to second degree, and if we have a simple formula that generates the coefficients of the linear terms (e.g. the prime itself, as in sopfr) and another simple formula that generates the coefficients of the squared terms, then this is definitely worth looking at. Thanks. As @volleo6144 mentioned, squaring is much like absolute value, but has the advantage of being continuously differentiable through zero.
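
A sketch of the shape that could take (the coefficient-generating rules here are placeholders of my own, not proposals from the thread):

def degree2_metric(monzo: dict[int, int], c2: float = 0.5) -> float:
    """Second-degree polynomial in the signed 2,3-free monzo exponents.
    Linear coefficients come from one simple rule (the prime itself, as in
    sopfr); squared-term coefficients from another (here c2 * p, purely
    illustrative). The squared term stands in for the absolute value while
    staying continuously differentiable through zero."""
    return sum(p * e + c2 * p * e * e for p, e in monzo.items())

# 35/1 has monzo {5: 1, 7: 1} and 7/5 has monzo {5: -1, 7: 1};
# the signed linear terms give them different ratings.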

Thanks for those figures of k = 0.6, a = 1. I agree we should still be using differences in rank^(-1.37), unless you get a different generalised-Zipf's-law exponent when you limit to the first 80 ratios.