Magrathean diacritics

Post by **cmloegcmluin** » Fri Jun 19, 2020 11:01 am

Great work, @volleo6144. Your change to my table makes sense to me.

My guess would be his concern is with the >7 half. Because about a fifth of the commas in Sagittal have abs3exp greater than 7, and by my reckoning the average SoPF>3 for existing Sagittal commas is ≈23, were this metric to be applied to them, it might peg a nontrivial count of them as suboptimal (where we actually consider them to be fine; the metric being flawed, not the commas).

Post by **Dave Keenan** » Fri Jun 19, 2020 6:23 pm

cmloegcmluin wrote: ↑Fri Jun 19, 2020 11:01 am My guess would be [Dave Keenan's] concern is with the >7 half. Because about a fifth of the commas in Sagittal have abs3exp greater than 7, and by my reckoning the average SoPF>3 for existing Sagittal commas is ≈23, were this metric to be applied to them, it might peg a nontrivial count of them as suboptimal (where we actually consider them to be fine; the metric being flawed, not the commas).

Correct.

Post by **cmloegcmluin** » Sun Jun 21, 2020 9:04 am

Okay, I think I've found something that works best:

q + 0.75r + 0.1s + 0.05t + 0.005u

q: Sum of Prime Factors > 3
r: Tenney Height = log₂(n×d)
s: prime limit
t: abs(SoPF>3(n) - SoPF>3(d))
u: abs(n - d)
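A minimal Python sketch of this metric, assuming n and d are the numerator and denominator of the 2,3-reduced ratio (the reading confirmed later in this thread). The function names, and the choice of 0 as the prime limit of a ratio with no primes above 3, are illustrative assumptions rather than part of the original post.

```python
from math import log2

def prime_factors_gt3(n):
    """Return the prime factors of n greater than 3, with repetition."""
    for p in (2, 3):            # discard any factors of 2 and 3
        while n % p == 0:
            n //= p
    factors = []
    p = 5
    while p * p <= n:           # trial division by odd candidates
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 2
    if n > 1:                   # whatever remains is itself prime
        factors.append(n)
    return factors

def complexity(n, d):
    """q + 0.75r + 0.1s + 0.05t + 0.005u for the ratio n:d."""
    fn, fd = prime_factors_gt3(n), prime_factors_gt3(d)
    q = sum(fn) + sum(fd)            # SoPF>3
    r = log2(n * d)                  # Tenney height
    s = max(fn + fd, default=0)      # prime limit (0 if no primes > 3: an assumption)
    t = abs(sum(fn) - sum(fd))       # SoPF>3 imbalance between the two sides
    u = abs(n - d)
    return q + 0.75 * r + 0.1 * s + 0.05 * t + 0.005 * u
```

With these definitions, `complexity(65, 77)` comes out near 46.58 and `complexity(1, 5)` near 7.51.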

The coefficients were not fit by code. I fit them by fiddling manually, in a methodical way, to find the extrema, but with only 3 decimal places of precision on the R² output from Google Sheets, so it is possible that tiny adjustments could slightly improve on this. However, with the smallish size of the data set we're working with, I don't think it would really be respecting sig figs or whatnot were we to press for much more precision here.

Unsurprisingly, better R²'s were found when fitting metrics to the earlier sub-sequences of the comma list sorted by frequency; that's because toward the beginning of the list there are many more data points and thus the data is smoother.

Perhaps more interestingly, weights on these sub-metrics affected the R² in different ways depending on how late into the list your sub-sequence went. The above coefficients were found by fitting only to the first 50 entries. If one tries to fit to the first 135 entries, then t and u actually hurt the R² and need their coefficients set to 0, and the prime limit is able to improve the fit by being subtracted, with a coefficient of -0.45. That is, a higher prime limit suggests a slightly less complex (more popular) comma, which of course seems wrong, but happens to work out that way for that stretch of the frequency data when the Tenney Height is also incorporated into the metric (with a whopping 3.5 coefficient). In any case, as you can see, I went with the numbers which optimized R² for the first 50 entries, because those numbers all made intuitive sense to me.

Furthermore, I applied another check at the end. The line was better fit, yes, but did it also involve less zigging and zagging? Indeed it did. While for the first 270 entries in the list, SoPF>3 includes only 149 where the next value is greater than the immediately previous one (values should steadily rise as the commas get more complex), this complexity metric includes 159. In contrast, the version of the complexity metric I found based on the first 135 entries only budged that number from 149 to 150.
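The zig-zag check described above can be sketched in Python as follows. This assumes the input is the list of metric values for the commas in descending order of frequency, and that only strict increases count; the function name and the sample values are hypothetical.

```python
def count_rising_steps(values):
    """Count adjacent pairs where the later value exceeds the earlier one."""
    return sum(1 for prev, nxt in zip(values, values[1:]) if nxt > prev)

# Hypothetical metric values for six commas, most popular first.
sample = [5, 7, 10, 7, 11, 12]
print(count_rising_steps(sample))
```

A metric that tracks popularity more smoothly should give a higher count over the same list of commas.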

R² for the first 135 entries under SoPF>3 was 0.681, and with my complexity heuristic I was able to increase that to 0.767. R² for the first 50 entries under SoPF>3 was 0.904, and under my complexity heuristic I was able to increase that to 0.917.

I forgot to experiment with odd limit in the mix. I'm a bit over this task, though, and I feel reasonably confident that Tenney/Benedetti height is after the same flavor of complexity.

----

One thing that disappoints and confuses me about this complexity heuristic is how little weight it applies to the sub-metrics which are supposed to differentiate 5:7 and 35:1, such as n+d (which as you can see didn't even make the final cut) or abs(n-d). I spent quite a bit of time looking for patterns in the differences in frequency between these pairs of primes when they appeared on the same side of the comma versus opposite sides. The effect is fairly consistent and pronounced:

5:7 = 1318; 35:1 = 875
5:11 = 339; 55:1 = 119
7:11 = 324; 77:1 = 111
5:13 = 205; 65:1 = 40
7:13 = 145; 91:1 = 30
11:13 = 89; 143:1 = 26
5:17 = 108; 85:1 = 20
and so on.

One problem we were trying to solve was that under SoPF>3 these commas get the same complexity rating. Under this metric, at least they are differentiated, but by nowhere near the amount that is reflected in their actual frequency count. What this suggests is that these simple examples are actually exceptional, and that in general the comma frequencies do not conform to such clear 3- to 4-fold differences in occurrence frequency just for balancing the primes across the vinculum.

----

Lemme know what y'all think. I'll run it against the tina candidates if I get an endorsement.

Edit: I should note that I struck 211:11 and 433:125 from the commas I used to calculate these metrics, as they were throwing the numbers off really badly.

Post by **volleo6144** » Sun Jun 21, 2020 4:55 pm

cmloegcmluin wrote: ↑Sun Jun 21, 2020 9:04 am q + 0.75r + 0.1s + 0.05t + 0.005u

q: Sum of Prime Factors > 3
r: Tenney Height = log₂(n×d)
s: prime limit
t: abs(SoPF>3(n) - SoPF>3(d))
u: abs(n - d)

[...]

Lemme know what y'all think. I'll run it against the tina candidates if I get an endorsement.

I did this for the yellow tina candidates (the ones we've mostly agreed on):

2t = 5831n: 38 + 18.76448 + 1.7 + 1.90 + 0.005 = 60.36948
3t =  455n: 25 + 17.99974 + 1.3 + 1.25 + 0.005 = 45.55474
4t 7:3025n: 39 + 17.34372 + 1.1 + 1.25 + 0.005 = 58.69872
5 25:2401n: 38 + 16.84368 + 0.7 + 0.90 + 0.005 = 55.44868
6t  65:77n: 36 + 16.53303 + 1.3 + 0.00 + 0.005 = 53.83803
--- half of 5s ---
8t  13:77n: 31 + 35.55136 + 1.3 + 0.25 +44.155 = 112.2564
9t    539n: 25 + 25.49922 + 1.1 + 1.25 + 0.475 = 53.32422

14t     5s: 05 + 22.50122 + 0.5 + 0.25 + 0.185 = 28.43622

The "r" term is scaled so that one "unit" of SoPF>3 corresponds to 1600¢ or just under 11,400 tinas of Tenney height, implying that we're willing to trade off a prime 11 for every factor of about 160 we can take off each side of the comma.

The "s" term is ... well, the prime limit divided by 10.

The "t" term is for prime factor balance, but it has about the same importance as the prime limit term—basically nothing.

The "u" term is usually below 1.000, but—for schisminas around the size we're discussing here with ratios of numbers in the millions—it's actually a nontrivial amount, as seen in the 13:77n's abysmal score. This is mostly just a linear, instead of logarithmic, version of "r".

I'm, uh, not quite sure about it yet, especially as there's nothing for tina error or anything of the sort.

Post by **Dave Keenan** » Sun Jun 21, 2020 8:31 pm

Thanks for all your hard work.

Are your n and d, that are used in computing Tenney height (log product complexity), the numerator and denominator of the 2,3-reduced ratio or the full comma ratio? I assume the 2,3-reduced ratio, which is also the simplest ratio that can be notated using the comma. If so, I note that log product and SoPF are just differently weighted sums of the absolute values of the prime exponents. Log product weights the exponent of each prime p by a factor of log₂(p) while SoPF weights them by a factor of p.

So your q + 0.75r is a weighted sum of the absolute values of the prime exponents, where the weights are p + 0.75×log₂(p). It's conceivable that some simpler weighting function such as k√̅p̅ or kp^a where 0 < a < 1, might do even better.

But I don't buy the story about it being anomalous that there's a vast difference in frequency between p₁/p₂ and p₁×p₂ for low primes. I'm more inclined to treat the lack of such a difference for high primes as anomalous.

So I think we still need to find a way to capture that. I don't understand why your "t" doesn't do it.

It seems wrong to include your "u", as all the other terms are logarithmic, or kind of logarithmic, but "u" is linear. So relative to all the other terms, "u" is exponential.

Post by **volleo6144** » Mon Jun 22, 2020 1:55 am

Dave Keenan wrote: ↑Sun Jun 21, 2020 8:31 pm It seems wrong to include your "u", as all the other terms are logarithmic, or kind of logarithmic, but "u" is linear. So relative to all the other terms, "u" is exponential.

"u" is scaled to one prime 5 for every 1000 difference in the ratio (the 1:1001 "perfect" 71st scores exactly 5.000 on the "u" term), so it being exponential shouldn't affect 2,3-reduced ratios, like, at all. The only things it really discourages are things with terms beyond the thousands on one side only: the 49:5¹⁰n's 2,3-reduced ratio 49:5¹⁰ scores 64+21.6+0.7+1.8+48,827.88 = 48,916.01, but its actual ratio 78,121,827:78,125,000 scores ... much less than 40,000. Conversely, the 19:2401n's actual ratio 1,275,068,416:1,275,989,841 scores 47+45.4+1.9+0.45+4607.125 = 4701.85, while its 2,3-reduced ratio scores ... less than 4000.

"t" is scaled to one prime 5 for every 100 units in the imbalance of SoPF>3 on either side.
"s" is scaled to one prime 5 for every 50 units in the prime limit.
"r" is scaled to one prime 5 for every 8000¢ in the Tenney height.

I'm ... pretty sure they meant the full comma ratio: it doesn't exactly make that much sense for 5s to be considered more simple than 7C, and it certainly doesn't make sense for every Pythagorean ratio to be counted as more simple than every other ratio (the full ratio of 12edo3C scores 64.6, and 53edo3s scores 10^20.3).
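The "one prime 5" scalings quoted earlier in this post follow directly from the coefficients: the amount of each sub-metric that costs as much as 5 units of SoPF>3 is 5 divided by its coefficient. A small Python sketch, with illustrative names, and assuming Tenney height is measured in octaves (log₂) so that it converts to cents at 1200 per octave:

```python
COEFFICIENTS = {"r": 0.75, "s": 0.1, "t": 0.05, "u": 0.005}
PRIME_5 = 5              # SoPF>3 contribution of a single factor of 5
CENTS_PER_OCTAVE = 1200  # assumes r is in octaves

def amount_per_prime_5(name):
    """Amount of sub-metric `name` that costs as much as one prime 5."""
    return PRIME_5 / COEFFICIENTS[name]

for name in COEFFICIENTS:
    amount = amount_per_prime_5(name)
    if name == "r":
        print(f"r: {amount * CENTS_PER_OCTAVE:.0f} cents of Tenney height")
    else:
        print(f"{name}: {amount:.0f} units")
```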

Post by **cmloegcmluin** » Mon Jun 22, 2020 4:50 am

volleo6144 wrote: ↑Sun Jun 21, 2020 4:55 pm I did this for the yellow tina candidates (the ones we've mostly agreed on):

I notice the relatively large value 44.155 that the 13:77n got. This appears to be the result of 0.005(13640319 - 13631488). I should have been more explicit about what I meant by n and d: I didn't mean in the actual ratio; I meant in the 2,3-reduced ratio.

volleo6144 wrote: I'm ... pretty sure they meant the full comma ratio: it doesn't exactly make that much sense for 5s to be considered more simple than 7C, and it certainly doesn't make sense for every Pythagorean ratio to be counted as more simple than every other ratio (the full ratio of 12edo3C scores 64.6, and 53edo3s scores 10^20.3).

It does seem a bit weird sometimes, but I think in this context we do consider 5s simpler than 7C. Or perhaps it would be better to say we don't distinguish 5s from 5C in this context; we're specifically evaluating the complexity of prime content beyond prime 3. Perhaps Dave would have a more inspiring way of articulating this.

volleo6144 wrote: I'm, uh, not quite sure about it yet, especially as there's nothing for tina error or anything of the sort.

Yeah, we still have some unanswered questions on the munged front, too. Not quite ready to consolidate a tina badness metric yet.

Dave Keenan wrote: ↑Sun Jun 21, 2020 8:31 pm Are your n and d, that are used in computing Tenney height (log product complexity), the numerator and denominator of the 2,3-reduced ratio or the full comma ratio? I assume the 2,3-reduced ratio, which is also the simplest ratio that can be notated using the comma.

Yes, that's right.

I vaguely recall hinting on our call that I was interested in incorporating non-2,3-reduced ratios into the metric. But I decided that finding the correct non-2,3-reduced ratio was going to be more work than it was worth. Multiple non-2,3-reduced ratios are possible for each 5-monzo we work with.

------

Dave Keenan wrote: If so, I note that log product and SoPF are just differently weighted sums of the absolute values of the prime exponents. Log product weights the exponent of each prime p by a factor of log₂(p) while SoPF weights them by a factor of p.

Whoa, is that true? That is true! I had to work out a couple examples to convince myself. Thank you for noticing that.

You have to work from the monzos for this fact to appear; e.g. for 35/1, Tenney height is log₂(5×7) ≈ 5.129 and SoPF>3 is 5 + 7 = 12. But to see that these are different weights on the same metric, we can't take log₂(12) ≈ 3.585; we must take log₂7 + log₂5 ≈ 5.129. Which is obvious when one reminds themselves of logarithmic identities (like four-dimensional geometry, it is a mathematical space which I still lack intuition for).
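A short Python sketch of the 35/1 example, treating both measures as weighted sums of the absolute prime exponents. Representing the monzo as a dict from prime to exponent (primes above 3 only) is an assumption made here for brevity, as is the function name.

```python
from math import log2

def weighted_exponent_sum(monzo, weight):
    """Sum of |exponent| * weight(prime) over the primes in the monzo."""
    return sum(abs(exp) * weight(p) for p, exp in monzo.items())

monzo_35 = {5: 1, 7: 1}  # 35/1 = 5^1 * 7^1

sopf = weighted_exponent_sum(monzo_35, lambda p: p)  # weights are p itself
tenney = weighted_exponent_sum(monzo_35, log2)       # weights are log2(p)

print(sopf)              # 5 + 7
print(round(tenney, 3))  # log2(5) + log2(7), the same as log2(35)
```

Because only absolute exponents enter the sum, 7/5 (monzo `{5: -1, 7: 1}`) gets the same two values as 35/1.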

Dave Keenan wrote: So your q + 0.75r is a weighted sum of the absolute values of the prime exponents, where the weights are p + 0.75×log₂(p). It's conceivable that some simpler weighting function such as k√̅p̅ or kp^a where 0 < a < 1, might do even better.

So: yes, I agree, we should try that. Update: I just tried it. Using a power function, the best fit is near a = 0.9, increasing R² to 0.9 (from 0.889 at a = 1, AKA plain ol' SoPF>3). And critically, this does just barely beat out the combination of SoPF>3 and Tenney Height, which maxes out at R² of only 0.898 over the same range of popularities. And of course, since it's simpler, we should prefer it. I don't know quite how to write it, though... "SoP^0.9F>3"?
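One possible reading of "SoP^0.9F>3" as Python, assuming each prime factor p > 3 (counted with repetition, on both sides of the ratio) contributes p^a. The function name is hypothetical, and a = 0.9 is simply the best-fit exponent reported above.

```python
def power_weighted_sopf(n, d, a=0.9):
    """Sum of p**a over the prime factors p > 3 of n and d, with repetition."""
    total = 0.0
    for side in (n, d):
        for p in (2, 3):           # factors of 2 and 3 are ignored
            while side % p == 0:
                side //= p
        p = 5
        while p * p <= side:       # trial division by odd candidates
            while side % p == 0:
                total += p ** a
                side //= p
            p += 2
        if side > 1:               # remaining cofactor is prime
            total += side ** a
    return total

print(power_weighted_sopf(35, 1, a=1.0))  # a = 1 reduces to plain SoPF>3
print(power_weighted_sopf(35, 1))         # a = 0.9
```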

------

I'm also reminded here that I did already briefly experiment with weighting primes by something other than their own value. I haven't gotten too far with it yet.

One pretty cheap approach I took was looking at the popularity of the solo primes (5/1, 7/1, 11/1, etc.), or rather each one's share of the total data points, and weighting each prime by the reciprocal of that share. So since 18.27% of pitches used in Scala are 5/1, 5 gets mapped to 1/0.1827 = 5.475. The next few are 7 → 9.749, 11 → 29.345, 13 → 65.781. So you can see these get big pretty fast. But what you actually want to do is use a formula, not a giant table of popularity stats. The best fit line for this data fits really well: 461591x^-2.61 has R² of 0.984 through primes up to 47. Using that formula the numbers are slightly different; different enough that it fails almost immediately, considering 25/1 to be less complex than 7/1, since 5 → 4.251 (so 25/1 → 8.502) and 7 → 10.229. And besides, feeding these different weights into simple SoPF>3 does not improve fit; it looks like total noise.

In any case, if we were really going to do something like this, we might want to consider how each prime affects popularity in a more comprehensive sense, considering every ratio it appears in. And we may want to treat the second time a prime appears in the same ratio differently (the 2nd 5 in 25/1 may have a different impact).

Part of me is concerned that when we weight the primes by their empirical popularity and then try to fit some formula involving that weight to said popularity curve, we're doing a weird (in a bad way) feedback loop. But I'm not certain.

And besides, the data above suggests that the primes should be weighted in the other direction, so that the really high ones don't impact the score as badly. To me, either way feels like it could make sense: amplifying the big primes feels right because at a certain point you might want them to have an infinite impact on the score, since who would promote a, like, 1409-limit prime for anything? Beyond that point, every prime is basically interchangeably infinitely bad. Although muffling the big primes also accomplishes making them interchangeable, just in the other way, flattening them out, and maybe if the weight they flatten out at is bad enough, then they don't have to be infinitely bad, and having a sub-linear trajectory into this zone is the better fit.

------

Dave Keenan wrote: But I don't buy the story about it being anomalous that there's a vast difference in frequency between p₁/p₂ and p₁×p₂ for low primes. I'm more inclined to treat the lack of such a difference for high primes as anomalous.

I felt that way too, but wasn't confident enough about it to suggest it. Glad you said that.

Dave Keenan wrote: So I think we still need to find a way to capture that. I don't understand why your "t" doesn't do it.

Superficially, of course, it doesn't do it because its coefficient is so low. But yes, "subficially", I thought it might have moved R² a lot better than it did.

One thing I had tried to do was fit an approximating line to the data points we have for these pairs of low primes and their differences in popularity depending on whether they are "balanced" on either side of the vinculum or "lopsided" on the same side of the vinculum. Unfortunately since this is now three-dimensional data, I can't use Google Sheets to get the trendline, and therefore I can't get R². With Wolfram online's regression analysis tool I can get the trendline:

-0.027904x² + 0.279476x - 0.0222386y² + 0.971861y - 5.15064

This is quadratic in both x, the lower of the two primes, and y, the greater of the two primes, and it gives the approximate ratio of the popularity of the balanced version to the lopsided version. Just eyeballing it, it looks like it's not terribly accurate. And besides, what exactly are we supposed to do with this?
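To make the eyeballing concrete, here is a Python sketch that evaluates the trendline against the observed balanced-to-lopsided popularity ratios from the seven prime pairs listed earlier in the thread. The function and variable names are illustrative.

```python
def predicted_ratio(x, y):
    """Trendline estimate of balanced/lopsided popularity for primes x < y."""
    return (-0.027904 * x**2 + 0.279476 * x
            - 0.0222386 * y**2 + 0.971861 * y - 5.15064)

# (lower prime, higher prime): (balanced count, lopsided count)
OBSERVED = {
    (5, 7): (1318, 875),
    (5, 11): (339, 119),
    (7, 11): (324, 111),
    (5, 13): (205, 40),
    (7, 13): (145, 30),
    (11, 13): (89, 26),
    (5, 17): (108, 20),
}

for (x, y), (balanced, lopsided) in OBSERVED.items():
    print(f"{x}:{y}  observed {balanced / lopsided:.3f}  "
          f"predicted {predicted_ratio(x, y):.3f}")
```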

Maybe the right question to answer is: what weight on t would achieve something closest to these observed popularity changes?

One insight I had is that we have to treat the completely lopsided version as the "home base", if we want to be able to generalize this metric to commas with more than 2 primes (otherwise, how do you define "balanced"? "Lopsided" is way easier to define).

A prerequisite for this is having an approximate mapping function between popularity and SoPF>3. I know I figured this out the other day, let me dig it up. Okay here it is, given SoPF>3 of x, the popularity should be about 8052.01e^-0.146015x.

35:1 and 7:5 each have SoPF>3 = 12. I need to find a weight on t which will affect that SoPF>3 enough that the popularity changes from 875 to 1318 between the two. Using the above equation, I get that at popularity 875, SoPF>3 should be 15.2002, and at popularity 1318, SoPF>3 should be 12.3947. So we want a change to our SoPF>3 of about 15.2002 - 12.3947 = 2.8055. In this case, t(35/1) = abs(SoPF>3(35) - SoPF>3(1)) = 12 and t(7/5) = abs(SoPF>3(7) - SoPF>3(5)) = 2, so the difference in t is 12 - 2 = 10. So we want our coefficient on t to be about 2.8055/10 = 0.28055.

Now that exercise was only for the first pair of primes. We have six more examples with reasonable popularity changes. Repeating that process for the other pairs, 5 & 11 → 0.71696, 7 & 11 → 0.52403, 5 & 13 → 1.11915, 7 & 13 → 0.77073, 11 & 13 → 0.38307, and 5 & 17 → 1.15495. The average of these is 0.70706. That's over 14x more weight than the best fit line across all the commas was giving us.
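The whole exercise can be reproduced with a few lines of Python. This is a sketch under the assumptions stated in the post: popularity is modelled as 8052.01e^(-0.146015x) in SoPF>3, and the lopsided and balanced forms of a prime pair differ in t by (p1 + p2) - (p2 - p1). Names are illustrative.

```python
from math import log

SCALE, RATE = 8052.01, 0.146015  # popularity ≈ SCALE * e^(-RATE * SoPF>3)

def sopf_for_popularity(popularity):
    """Invert the fitted curve: the SoPF>3 that would predict this popularity."""
    return log(SCALE / popularity) / RATE

def t_coefficient(p1, p2, balanced, lopsided):
    """Weight on t that would explain the observed popularity gap (p1 < p2)."""
    delta_sopf = sopf_for_popularity(lopsided) - sopf_for_popularity(balanced)
    delta_t = (p1 + p2) - (p2 - p1)  # t(p1*p2 : 1) minus t(p2 : p1)
    return delta_sopf / delta_t

# (lower prime, higher prime): (balanced count, lopsided count)
PAIRS = {
    (5, 7): (1318, 875), (5, 11): (339, 119), (7, 11): (324, 111),
    (5, 13): (205, 40), (7, 13): (145, 30), (11, 13): (89, 26),
    (5, 17): (108, 20),
}

coefficients = [t_coefficient(p1, p2, bal, lop)
                for (p1, p2), (bal, lop) in PAIRS.items()]
average = sum(coefficients) / len(coefficients)
print(round(average, 5))
```

Note that the constant 8052.01 cancels when the two SoPF>3 values are subtracted, so only the decay rate and the popularity ratio of each pair affect the result.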

------

Dave Keenan wrote: It seems wrong to include your "u", as all the other terms are logarithmic, or kind of logarithmic, but "u" is linear. So relative to all the other terms, "u" is exponential.

Fair enough. And considering that it accomplishes almost nothing, perhaps that'll be one tonne we can shave off our elephant.

------

So what's next? I could propose an iteration on my previous metric, without the u submetric, with SoPF>3 and Tenney height consolidated into SoP^0.9F>3, and with t weighted at 0.70706 rather than 0.05. But it's lunchtime now and I think I've certainly exceeded my allotted time on the floor this round.

Post by **cmloegcmluin** » Mon Jun 22, 2020 5:59 am

I feel like it could be helpful if the thread in this topic where we're developing a complexity metric as an improvement to SoPF>3 was broken out into a separate topic. But I'm not readily finding a nice seam to break posts off at. We've still got quite a bit of discussion woven in here about Magrathean-specific stuff (the munged tina error, etc.). What do others think?

Post by **volleo6144** » Mon Jun 22, 2020 10:15 am

cmloegcmluin wrote: ↑Mon Jun 22, 2020 4:50 am
volleo6144 wrote: ↑Sun Jun 21, 2020 4:55 pm I did this for the yellow tina candidates (the ones we've mostly agreed on):
I notice the relatively large value 44.155 that the 13:77n got. This appears to be the result of 0.005(13640319 - 13631488). I should have been more explicit about what I meant by n and d: I didn't mean in the actual ratio; I meant in the 2,3-reduced ratio.

Yeah, I guess.
...I did the yellow tinas again, with the 2,3-reduced ratios, and N2D3P9:

2t = 5831n: 38 + 9.382146 + 1.7 + 1.90 + 29.15 = 80.13215 (N2D3P9 = 688.3819)
3t =  455n: 25 + 6.622292 + 1.3 + 1.25 +  2.27 = 36.44229 (N2D3P9 = 82.15278)
4t 7:3025n: 39 + 10.77756 + 1.1 + 1.25 + 15.09 = 67.21756 (N2D3P9 = 537.1782)
5 25:2401n: 38 + 11.90496 + 0.7 + 0.90 + 11.88 = 63.38497 (N2D3P9 = 324.2091)
6t  65:77n: 36 + 9.216866 + 1.3 + 0.00 +  0.06 = 46.57687 (N2D3P9 = 200.8179)
--- half of 5s ---
8t  13:77n: 31 + 7.475420 + 1.3 + 0.25 +  0.32 = 40.34542 (N2D3P9 = 120.4907)
9t    539n: 25 + 6.805606 + 1.1 + 1.25 +  2.69 = 36.84561 (N2D3P9 = 82.34722)

14t     5s:  5 + 1.741446 + 0.5 + 0.25 +  0.02 = 7.511446 (N2D3P9 = 1.388889)

The "u" term heavily penalizes commas with many primes on one side but only one or two, or zero, on the other, such as our yellow 2 [5831:5832], 4 [3024:3025], and 5 [2400:2401] tinas.

On an unrelated note, is there a reason we don't have the tina-diacritics (and the 3-mina) as smileys yet? Is it because the schisminas they represent aren't completely decided yet? ...Or is it that (as far as I know) Sagispeak has nothing for them yet, so the filenames would be undecided?

Post by **Dave Keenan** » Mon Jun 22, 2020 8:39 pm

The SoPF>3 improvement should be a new topic. But I see the problem. Perhaps a new topic to summarise the final result.

volleo6144 wrote: ↑Mon Jun 22, 2020 10:15 am The "u" term heavily penalizes commas with many primes on one side but only one or two, or zero, on the other, such as our yellow 2 [5831:5832], 4 [3024:3025], and 5 [2400:2401] tinas.

So "u" is the only thing differentiating 7/5 from 35 etc, but not very well. I'd like to understand why "t" isn't doing that. But maybe if you take the log of "u" it will work even better.

volleo6144 wrote: ↑Mon Jun 22, 2020 10:15 am On an unrelated note, is there a reason we don't have the tina-diacritics (and the 3-mina) as smileys yet? Is it because the schisminas they represent aren't completely decided yet? ...Or is it that (as far as I know) Sagispeak has nothing for them yet, so the filenames would be undecided?

It's really just because it's a higher priority to get the outline-font glyphs (whose paint is barely dry, figuratively speaking) added to SMuFL and Bravura, and for that we need the schisminas.

I don't think Sagispeak will ever have anything for them, except that mimimi and momomo are obvious extensions for 9-tinas/3-minas.

So I think the filenames for the tina images can just be tinaUp1.png, tinaUp2.png, tinaDown9.png, etc. And maybe the non-committal tinaUpFraction.png and tinaDownFraction.png for the dots.

If you want to, feel free to prepare the .png files, based on the smaller size in this document:

There are some instructions here from an earlier smilies effort: viewtopic.php?p=579#instructions
