volleo6144 wrote: ↑Fri Jun 26, 2020 11:57 am
cmloegcmluin wrote: ↑Sun Apr 23, 2271752 11:33 pm
Ah ha! I see now. The usage of . as a multiplication symbol is not intuitive to me.
That thing has an SSL certificate that's expired by about a week...
If you're not comfortable visiting that link, volleo6144, it was an interesting article Dave had shared with me recently that laments a distortion in common usage of the word "intuitive"; it suggests that perhaps most of the time what the person using the word "intuitive" really means is just "familiar". So that's what I was communicating to Dave — there's nothing wrong with him using . for multiplication in general (I'm familiar with that practice from the prime factored forms of comma names in Sagittal such as the 5.13:7.11n, though I thought it was only used in that context because they needed to be clean/compact or something).
------
I have just discovered that phpBB does not seem to let you nest quotes 4-deep. So here's what should be the initial, deepest nested quote:
cmloegcmluin wrote: ↑Fri Jun 26, 2020 6:26 am
I do not currently have access to a powerful hunk of math software such as MATLAB or Mathematica. WolframAlpha's online regression analysis tools seem to be somewhat limited; specifically, they only ever work (for me, anyway) with data in three dimensions or fewer. If we could find a way to do a regression analysis on the prime exponent vectors (monzos, or kets) of these notational ratios along with their popularities, we could find a big ol' polynomial w/ different coefficients for each prime. Dunno if quadratic would work or if we'd have to go cubic or quartic, but that might do the trick.
And then here's the rest of the replies:
Dave Keenan wrote: ↑Fri Jun 26, 2020 5:58 pm
cmloegcmluin wrote: ↑Fri Jun 26, 2020 9:06 am
Dave Keenan wrote: ↑Fri Jun 26, 2020 8:40 am
I use Excel's Solver when I want to do that kind of thing. But independent weights for each prime still won't give you different ranks for 7/5 and 35/1.
Wouldn't it? In the case of 7/5, the 5-term of the monzo is negative, while in 35/1 it's positive. Couldn't that affect the outcome?
Wouldn't it be the treating of positive exponents differently from negative exponents that made the difference?
Yes, it would be the treating of positive exponents differently from negative exponents that made the difference. What else could you mean by "independent weights for each prime" in the context of a polynomial regression analysis?
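To make the 7/5 versus 35/1 point concrete, here is a minimal Python sketch. The helper names and the weights are made up purely for illustration (nobody in this thread has proposed these values); the only claim is that a per-prime weight applied to the absolute exponent cannot separate the two ratios, while treating positive and negative exponents differently can.

```python
from fractions import Fraction

def factorize(n):
    """Return {prime: exponent} for a positive integer n (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def monzo(ratio):
    """Signed prime exponents of a ratio: positive from n, negative from d."""
    exps = dict(factorize(ratio.numerator))
    for p, e in factorize(ratio.denominator).items():
        exps[p] = exps.get(p, 0) - e
    return exps

def score_ignoring_sign(ratio, weights):
    """One weight per prime, applied to the absolute value of the exponent."""
    return sum(weights[p] * abs(e) for p, e in monzo(ratio).items())

def score_sign_aware(ratio, w_num, w_den):
    """Separate weights for primes in the numerator and in the denominator."""
    return sum(w_num[p] * e if e > 0 else w_den[p] * -e
               for p, e in monzo(ratio).items())

# Hypothetical weights, chosen arbitrarily for the demonstration.
weights = {5: 1.0, 7: 1.3}
w_num, w_den = {5: 1.0, 7: 1.3}, {5: 0.7, 7: 0.9}

same = score_ignoring_sign(Fraction(7, 5), weights) == score_ignoring_sign(Fraction(35, 1), weights)
differ = score_sign_aware(Fraction(7, 5), w_num, w_den) != score_sign_aware(Fraction(35, 1), w_num, w_den)
print(same, differ)
```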
------
Dave Keenan wrote: ↑Fri Jun 26, 2020 5:58 pm
The Euclidean distance is where you take all the differences, square them, sum the squares, then take the square root. It's a generalisation to n-dimensions, of Pythagoras' theorem for finding the hypotenuse. Finding the square of the Euclidean distance, simply has the effect of undoing that last step where you took the square root. So I totally agree, it is a confusing term. Better to just not do that step in the first place, and so call it the "sum of squared errors" or "sum of squared differences", often abbreviated to just "sum of squares".
Omigosh. I was actually even more confused than I thought. I would be able to articulate the nature of my confusion, but I don't think it makes a particularly good story, so I'll keep it to myself. Thanks for further explaining.
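For anyone else following along, the relationship Dave describes can be sketched in a few lines of Python. The function names are my own, not from any library:

```python
import math

def sum_of_squares(xs, ys):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys))

def euclidean_distance(xs, ys):
    """Euclidean distance: the square root of the sum of squared differences."""
    return math.sqrt(sum_of_squares(xs, ys))

# The 3-4-5 right triangle: the squared distance is 25, the distance is 5.
print(sum_of_squares([0, 0], [3, 4]), euclidean_distance([0, 0], [3, 4]))
```

Since the square root is monotonic, minimizing one minimizes the other, which is why skipping the final step loses nothing.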
------
Dave Keenan wrote: ↑Fri Jun 26, 2020 5:58 pm
cmloegcmluin wrote: ↑Fri Jun 26, 2020 9:06 am
I can also try abs(n - d) where n/d is the 5-rough ratio, unless you have some reason to henceforth prefer abs(sopfr(n) - sopfr(d)).
It's that thing I mentioned earlier. Sopfr() is a kind of logarithm. It feels wrong to add numbers to their logarithms. They feel like incommensurate things, like adding pascals (sound pressure) to decibels (log of sound pressure).
I get it. Because in the actual ratios, the primes are multiplied, but now we're summing them. So it's gearing down one level of hyperoperations.
volleo6144 wrote: ↑Fri Jun 26, 2020 11:24 pm
cmloegcmluin wrote: ↑Fri Jun 26, 2020 4:02 pm
It looks like abs(sopfr(n) - sopfr(d)) maybe does a better job, but I don't see a "heavy [penalty]" for abs(n - d).
If you extend this to the extreme, cases like the 49:9765625n (for the half-tina) have ... issues:
abs(9765625 - 49) = 9765476
abs(sopfr(9765625) - sopfr(49)) = abs(50 - 14) = 36, on par with 1:299
Maybe I didn't emphasize the "heavy penalty" I was talking about enough with the examples.
Alright, let's strike abs(n - d) from the conversation then (and be forgiving of each other if we resurface it... a lot to keep track of here...)
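For reference, volleo6144's numbers can be reproduced with a small sopfr helper. This is a minimal sketch using plain trial division, which is fine at these sizes:

```python
def sopfr(n):
    """Sum of prime factors with repetition, e.g. sopfr(49) = 7 + 7 = 14."""
    total = 0
    p = 2
    while p * p <= n:
        while n % p == 0:
            total += p
            n //= p
        p += 1
    if n > 1:  # whatever remains is a single prime factor
        total += n
    return total

# 49:9765625n, where 9765625 = 5**10
print(abs(9765625 - 49))                # 9765576
print(abs(sopfr(9765625) - sopfr(49)))  # abs(50 - 14) = 36
print(abs(sopfr(1) - sopfr(299)))       # 299 = 13 * 23, so also 36
```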
-----
Dave Keenan wrote: ↑Fri Jun 26, 2020 6:16 pm
cmloegcmluin wrote: ↑Fri Jun 26, 2020 4:02 pm
But one issue is that it results in ρ coming out extremely close to 1 in every case, so it's hard to tell whether our metric is truly an improvement. SoPF>3 already has ρ = 0.9999999998222343! That said, k = 1.5 maximizes ρ = 0.9999999998823996 (it's some number near 1.5; I don't know the exact range within which ρ = 0.9999999998823996, but for a decent slice of k around 1.5, the ranks aren't sorting any better or worse).
I wouldn't bother calculating ρ. I'd just look at the sum of squared errors. But I'm curious how you're getting from the sum of squared errors in rank^-1.37, to ρ. I wouldn't have a clue how to normalise that.
Um... it looks like normalize means to make it so that some important value in a system is equal to 1. Which makes sense because in one of my personal projects that's part of what I ended up naming the type for scalars that ranged from 0 to 1.
Well, does it not seem pretty clear from those numbers all being extremely close to 1 that the simplified Spearman's formula we're using is still normalized? I have no idea where the "6" comes from that's in the numerator of the thing subtracted from 1 ohhhhhhh well of course subtracting from 1 is what keeps us super close to 1. Okay. So you're saying that if we change to using rank^-1.37 then we can't use "6" anymore, or something like that?
I guess that could make sense, since after raising the ranks to the -1.37 power they are no longer sequential integers, and it does say in that Wikipedia article that the simplified formula only works when they are sequential integers. So maybe we have to figure out how to use the more complex form of the formula.
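Here is a rough Python sketch of the two forms, as I understand them from that article. The simplified one is 1 - 6Σd²/(n(n² - 1)) and assumes both lists are the distinct integer ranks 1..n; the general one is just the Pearson correlation of the two lists of values. The function names are mine. Note that feeding rank^-1.37 values into the Pearson form gives a correlation of those transformed values, which I am only assuming is the quantity we would want.

```python
import math

def spearman_simplified(ranks_a, ranks_b):
    """1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only for distinct integer ranks 1..n."""
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

def pearson(xs, ys):
    """Pearson correlation coefficient; works for any real-valued inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

actual = [1, 2, 3, 4, 5]
estimated = [2, 1, 4, 3, 5]

# On sequential integer ranks the two forms agree.
print(spearman_simplified(actual, estimated), pearson(actual, estimated))

# After raising ranks to the -1.37 power only the general form still applies.
print(pearson([r ** -1.37 for r in actual], [r ** -1.37 for r in estimated]))
```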
------
Dave Keenan wrote: ↑Fri Jun 26, 2020 6:16 pm
I hope you swapped numerators and denominators where required to ensure sopfr(n) ≥ sopfr(d). For example, 25/11 would need to become 11/25 because sopfr(11) = 11 and sopfr(25) = 5+5 = 10. That's what lets us avoid taking absolute values, and lets us use the simplification sopfr(n) + k*sopfr(d).
I did not! That's a very good point to call out. I had certainly recognized that all of the ratios from your popularities spreadsheet were oriented such that n ≥ d, but it did not occur to me that I'd need to account for situations where sopfr(n) could turn out < sopfr(d). So I'll need to add that layer to my code.
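Something like the following is what I have in mind for that extra layer. It is only a sketch, and `orient` and `metric` are names I just made up:

```python
def sopfr(n):
    """Sum of prime factors with repetition."""
    total = 0
    p = 2
    while p * p <= n:
        while n % p == 0:
            total += p
            n //= p
        p += 1
    if n > 1:
        total += n
    return total

def orient(n, d):
    """Swap n and d if needed so that sopfr(n) >= sopfr(d)."""
    return (n, d) if sopfr(n) >= sopfr(d) else (d, n)

def metric(n, d, k):
    """sopfr(n) + k*sopfr(d), evaluated on the oriented ratio."""
    n, d = orient(n, d)
    return sopfr(n) + k * sopfr(d)

# Dave's example: sopfr(25) = 10 < sopfr(11) = 11, so 25/11 becomes 11/25.
print(orient(25, 11))
```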
------
Dave Keenan wrote: ↑Fri Jun 26, 2020 11:53 pm
Yes, changing a in sopafr() does change the ranking.
I swapped n and d as required to ensure sopfr(n) ≥ sopfr(d), then I plotted sopafr(n) + k*sopafr(d) against scala_rank(n/d) and adjusted sliders for k and a, to maximise the monotonicity of the first 29 ratios by eye. This allowed me to avoid the sorting step. I settled on k = 0.68 and a = 1.13. You can play with the sliders yourself in the attached spreadsheet.
In your sheet, it looks like you're not raising the ranks to -1.37, unless I'm missing something. But I guess that's fine if you were just eyeballing things for now.
Interesting that your k is really close to 2/3, the inverse of what I found. Maybe once I fix my code I'll also come up with something close to 2/3. I want it to be related to 3/2 because something feels so magically cool about this value relating the two primes which we have otherwise stricken from the ratios.
In any case I think it's a good idea if you do your spreadsheet thing and I do my code thing and we check each other's findings.
Dave Keenan wrote: ↑Fri Jun 26, 2020 11:53 pm
One way to improve the monotonicity further would be to adjust the weightings of the primes independently, as both of you guys have suggested.
If this formula works:
sopafr(n) + k⋅sopafr(d)
I would much prefer that to some unwieldy polynomial. Besides, a polynomial would by nature be limited by how many primes we calculated it up through. Whereas this formula generalizes up to any prime limit.
I've got to get to work now, but I'll find time soon to fix my code. And I should be able to code something up that will zero in on the combination of k and a that minimizes the sum of squares.
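Roughly what I'm picturing, as a Python sketch with several assumptions baked in: that sopafr means the sum of each prime factor (with repetition) raised to the power a, that the error is measured between estimated and actual ranks after raising both to -1.37, and that a plain grid search is good enough. The function names and the data layout (a list of (n, d, actual_rank) triples) are hypothetical.

```python
def prime_factors(n):
    """Prime factors of n with repetition, e.g. 25 -> [5, 5]."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def sopafr(n, a):
    """Assumed definition: sum of p**a over the prime factors of n, with repetition."""
    return sum(p ** a for p in prime_factors(n))

def metric(n, d, k, a):
    """sopafr(n) + k*sopafr(d), after orienting so that sopfr(n) >= sopfr(d)."""
    if sopafr(n, 1) < sopafr(d, 1):  # a = 1 gives plain sopfr
        n, d = d, n
    return sopafr(n, a) + k * sopafr(d, a)

def sum_of_squared_rank_errors(data, k, a, power=-1.37):
    """data: list of (n, d, actual_rank) triples, ranks starting at 1."""
    order = sorted(range(len(data)),
                   key=lambda i: metric(data[i][0], data[i][1], k, a))
    sse = 0.0
    for estimated_rank, i in enumerate(order, start=1):
        actual_rank = data[i][2]
        sse += (estimated_rank ** power - actual_rank ** power) ** 2
    return sse

def grid_search(data, ks, a_values):
    """Return (sse, k, a) for the grid point with the smallest sum of squares."""
    return min((sum_of_squared_rank_errors(data, k, a), k, a)
               for k in ks for a in a_values)
```

This would be called with the real popularity data and whatever grid of k and a values seems worth scanning; ties in the metric are broken by input order here, which may or may not be what we want.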