volleo6144 wrote: ↑Sun Jun 21, 2020 4:55 pm
I did this for the yellow tina candidates (the ones we've mostly agreed on):
I notice the relatively large value 44.155 that the 13:77n got. This appears to be the result of 0.005(13640319 - 13631488). I should have been more explicit about what I meant by n and d: I didn't mean in the actual ratio; I meant in the 2,3-reduced ratio.
volleo6144 wrote:I'm ... pretty sure they meant the full comma ratio: it doesn't exactly make that much sense for 5s to be considered more simple than 7C, and it certainly doesn't make sense for every Pythagorean ratio to be counted as more simple than every other ratio (the full ratio of 12edo3C scores 64.6, and 53edo3s scores 1020.3).
It does seem a bit weird sometimes, but I think in this context we do consider 5s simpler than 7C. Or perhaps it would be better to say we don't distinguish 5s from 5C in this context; we're specifically evaluating the complexity of prime content beyond prime 3. Perhaps Dave would have a more inspiring way of articulating this.
volleo6144 wrote:
I'm, uh, not quite sure about it yet, especially as there's nothing for tina error or anything of the sort.
Yeah, we still have some unanswered questions on the munged front, too. Not quite ready to consolidate a tina badness metric yet.
Dave Keenan wrote: ↑Sun Jun 21, 2020 8:31 pm
Are your n and d, that are used in computing Tenney height (log product complexity), the numerator and denominator of the 2,3-reduced ratio or the full comma ratio? I assume the 2,3-reduced ratio, which is also the simplest ratio that can be notated using the comma.
Yes, that's right.
I vaguely recall hinting on our call that I was interested in incorporating non-2,3-reduced ratios into the metric. But I decided that finding the
correct non-2,3-reduced ratio was going to be more work than it was worth. Multiple non-2,3-reduced ratios are possible for each 5-monzo we work with.
------
Dave Keenan wrote:
If so, I note that log product and SoPF are just differently weighted sums of the absolute values of the prime exponents. Log product weights the exponent of each prime p by a factor of log2(p) while SoPF weights them by a factor of p.
Whoa, is that true? That is true! I had to work out a couple examples to convince myself. Thank you for noticing that.
You have to work from the monzos for this fact to appear; e.g. for 35/1, Tenney height is log
2(5×7) ≈ 5.129 and SoPF>3 is 5 + 7 = 12. But to see that these are different weights on the same metric, we can't take log
2(12) ≈ 3.585; we must take log
27 + log
25 ≈ 5.129. Which is obvious when one reminds themselves of logarithmic identities (like four-dimensional geometry, it is a mathematical space which I still lack intuition for).
Dave Keenan wrote:
So your q + 0.75r is a weighted sum of the absolute values of the prime exponents, where the weights are p + 0.75×log2(p). It's conceivable that some simpler weighting function such as k√̅p̅ or kpa where 0 < a < 1, might do even better.
So: yes, I agree, we should try that. Update: I just tried it. Using a power function, the best fit is near a = 0.9, increasing R
2 to 0.9 (from 0.889 at a = 1, AKA plain ol' SoPF>3). And critically, this does just barely beat out the combination of SoPF>3 and Tenney Height, which maxes out at R
2 of only 0.898 over the same range of popularities. And of course, since it's simpler, we should prefer it. I don't know quite how to write it, though... "SoP
0.9F>3"?
------
I'm also reminded here that I did already briefly experiment with weighting primes by something other than their own value. I haven't gotten too far with it yet.
One pretty cheap approach I took was looking at the popularity of the solo primes (5/1, 7/1, 11/1, etc.), or rather the percentage of their popularity of the total data points. So since 18.27% of pitches used in Scala are 5/1, 5 gets mapped to 1/0.1827 = 5.475. The next few are 7 → 9.749, 11 → 29.345, 13 → 65.781. So you can see these get big pretty fast. But what you actually want to do is use a formula, not a giant table of popularity stats. The best fit line for this data fits really well: 461591x
-2.61 has R
2 of 0.984 through primes up to 47. Using that formula the numbers are slightly different; different enough that it fails almost immediately, considering 25/1 to be less complex than 7/1, since 5 → 4.251 and 7 → 10.229. And besides, feeding these different weights into simple SoPF>3 does
not improve fit; it looks like total noise.
In any case, if we were really going to do something like this, we might want to consider how each prime affects popularity in a more comprehensive sense, considering every ratio it appears in. And we may want to treat the second time a prime appears in the same ratio differently (the 2nd 5 in 25/1 may have a different impact).
Part of me is concerned that when we weight the primes by their empirical popularity and then try to fit some formula involving that weight to said popularity curve, we're doing a weird (in a bad way) feedback loop. But I'm not certain.
And besides, the data above suggests that the primes should be weighted in the
other direction, so that the really high ones don't impact the score as badly. To me, either way feels like it could make sense: amplifying the big primes feels right because at a certain point you might want them to have an infinite impact on the score, since who would promote a, like, 1409-limit prime for anything? Beyond that point, every prime is basically interchangeably infinitely bad. Although muffling the big primes also accomplishes making them interchangeable, just in the other way, flattening them out, and maybe if the weight they flatten out at is bad enough, then they don't have to be infinitely bad, and having a sub-linear trajectory into this zone is the better fit.
------
Dave Keenan wrote:
But I don't buy the story about it being anomalous that there's a vast difference in frequency between p1/p2 and p1×p2 for low primes. I'm more inclined to treat the lack of such a difference for high primes as anomalous.
I felt that way too, but wasn't confident enough about it to suggest it. Glad you said that.
Dave Keenan wrote:
So I think we still need to find a way to capture that. I don't understand why your "t" doesn't do it.
Superficially, of course, it doesn't do it because its coefficient is so low. But yes, "subficially", I thought it might have moved R
2 a lot better than it did.
One thing I had tried to do was fit an approximating line to the data points we have for these pairs of low primes and their differences in popularity depending on whether they are "balanced" on either side of the vinculum or "lopsided" on the same side of the vinculum. Unfortunately since this is now three-dimensional data, I can't use Google Sheets to get the trendline, and therefore I can't get R
2. With Wolfram online's regression analysis tool I can get the trendline:
-0.027904x
2 + 0.279476x - 0.0222386y
2 + 0.971861y - 5.15064
which is quadratic, where x is the lower of the two primes and y is the greater of the two primes, this gives the approximate ratio of the popularity of the balanced version to the lopsided version. It looks like it's not terribly accurate, just eyeballing it. And besides, what exactly are we supposed to do with this?
Maybe the right question to answer is: what weight on t would achieve something closest to these observed popularity changes?
One insight I had is that we have to treat the completely lopsided version as the "home base", if we want to be able to generalize this metric to commas with more than 2 primes (otherwise, how do you defined "balanced"? "Lopsided" is way easier to define).
A prerequisite for this is having an approximate mapping function between popularity and SoPF>3. I know I figured this out the other day, let me dig it up. Okay here it is, given SoPF>3 of x, the popularity should be about 8052.01e
-0.146015x.
35:1 and 7:5 each have SoPF>3 = 12. I need to find a weight on t which will affect that SoPF>3 enough that the popularity changes from 875 to 1318 between the two. Using the above equation, I get that at popularity 875, SoPF>3 should be 15.2002, and at popularity 1318, SoPF>3 should be 12.3947. So we want a change to our SoPF>3 of about 15.2002 - 12.3947 = 2.8055. In this case, t(35/1) = abs(SoPF>3(35) - SoPF>3(1)) = 12 and t(7/5) = abs(SoPF>3(7) - SoPF>3(5)) = 2, so the difference in t is 12 - 2 = 10. So we want our coefficient on t to be about 2.8055/10 = 0.28055.
Now that exercise was only for the first pair of primes. We have six more examples with reasonable popularity changes. Repeating that process for the other pairs, 5 & 11 → 0.71696, 7 & 11 → 0.52403, 5 & 13 → 1.11915, 7 & 13 → 0.77073, 11 & 13 → 0.38307, and 5 & 17 → 1.15495. The average of these is 0.70706. That's over 14x more weight than the best fit line across all the commas was giving us.
------
Dave Keenan wrote:
It seems wrong to include your "u", as all the other terms are logarithmic, or kind of logarithmic, but "u" is linear. So relative to all the other terms, "u" is exponential.
Fair enough. And considering that it accomplishes almost nothing, perhaps that'll be one tonne we can shave off our elephant.
------
So what's next? I could propose an iteration on my previous metric, without the u submetric, with SoPF>3 and Tenney height consolidated into SoP
0.9F>3, and with t weighted at 0.70706 rather than 0.05. But it's lunchtime now and I think I've certainly exceeded my allotted time on the floor this round.