In terms of bang-for-your-buck, SoPF>3 is astoundingly good: it is simple enough that you can often compute it in your head by just looking at a ratio, and it correlates quite strongly with Scala's popularity statistics. However, it fails at certain things, such as differentiating the popularity of 35:1 from 7:5 (lopsided vs. balanced), and the popularity of 17.5:1 from 11.11:1 (considering prime limit), both of which manifest as pronouncedly different in the Scala statistics. And so in this improvement upon SoPF>3 we seek to optimize for correlation with microtonal compositional practice (with Scala's stats as our best available measure of that) over simplicity of calculation.

Since this search for an improved notational (i.e., deviations from nominals of a Pythagorean chain of fifths, plus sharps and flats) comma popularity metric took on a life of its own, and the Magrathean diacritics topic is chock-full of other subtopics (many of which are confusingly similar to this subtopic), and this metric could be useful in other circumstances (even beyond Sagittal! but actually beginning with the topic about making the Extreme level of the JI Notation 37-limit, which as it turns out was where the SoPF>3 improvement was first requested, before it was imported to the Magrathean diacritics topic (oh and also the 121k vs 1225k issue)) I decided to start this new topic here so we can focus on this effort independently.

I will now attempt to reproduce just the relevant bits from the Magrathean diacritics topic, to get this topic started:

Dave Keenan wrote: ↑Thu Jun 04, 2020 12:39 pm I'm not here. But @volleo6144 is right, that 499S is not a typo. Ratios of 499 weirdly occur a few times in the Scala archive. I think that George and I held on too long to the idea that we should give priority to commas that occur more often in the Scala archive. In hindsight, once you get beyond the 100-or-so most common 2,3-reduced ratios (maybe only the first 40-or-so) then the numbers of occurrences are so small as to be due to historical accidents that are unlikely to be predictive of future use.

Part of the reason we held on so long, is that in the early days, in order for the Sagittal idea to survive politically, we had to be seen to be basing the comma choices on something objective like the Scala archive stats, not some "arbitrary complexity function" that, it would be argued, merely suited our biases. The only reason SoPF>3 had any respectability, at least in the minds of George and I, is that it gives a ranking that is similar to the Scala archive ranking, for those first 40-or-so 2,3-reduced ratios. And it is easy to calculate mentally.

But perhaps the time has come, to try to find a better complexity function for our purposes. One that matches the Scala archive stats even better, but filters out the historical noise. And it need not be easily mentally calculated. For starters, the weighting of each prime need not be the prime itself.

But more importantly, one area where SoPF>3 has always fallen down, is that it ignores the relative signs of the different prime exponents. e.g. it treats 5:7 as having the same rank as 1:35, and 5:11 the same as 1:55, and 7:11 same as 1:77, which they are not. The ratios where the primes are on opposite sides are more common.

Even more complicated: In order of decreasing frequency of occurrence, we have the ratios 11:35, 7:55, 5:77 and 1:385, which are all treated equally by SoPF>3.

I challenge readers to come up with a simple function that treats them differently, and whose coefficients can be fitted to the Scala stats. But beware Von Neumann's elephant (second-last paragraph).

Compare:

viewtopic.php?p=259#p259

and

viewtopic.php?f=4&t=99#rank

Dave Keenan wrote: ↑Fri Jun 05, 2020 12:54 pm If we come up with an improvement over SoPF>3, as a measure of the complexity of the set of 2,3-equivalent ratios that can be notated by a symbol for a given comma, as requested here: viewtopic.php?p=1676#p1676, then we would presumably also want to substitute it for SoPF>3 in any badness measure we might use in this thread. This is a specific kind of complexity measure — one that we have designed to be strongly (inversely) correlated with popularity.

A "badness" measure is typically a combination of a complexity measure and an error measure. But we have two kinds of complexity measure in this case. The other kind is the complexity of the resulting notation, in the sense of how many sharps or flats may need to be used to cancel out (or mostly cancel out) the 3-exponent of the comma. Or on a finer scale, how many fifths the resulting sharped or flatted letter name is from 1/1 along the chain of fifths. The Revo (pure) notation can provide at most two sharps or flats, which corresponds to a change of 14 in the absolute value of the 3-exponent. And it has the additional limitation that positive comma alterations cannot be applied to a double-sharp and negative cannot be applied to a double-flat.

cmloegcmluin wrote: ↑Tue Jun 16, 2020 11:14 am Not ready with results yet, but I wanted to drop a line on this thread to say: I've been experimenting a bunch in the last couple of days and I have a technique now which may prove fruitful.

I mentioned to @Dave Keenan yesterday that my partner does Product Marketing for a living, a deeply data-driven occupation. I regularly turn to her for answers on stats related problems. So I asked her this afternoon if she knew a better way to do regression analyses than manually copying and pasting data series into Wolfram online, because I was going to need to start testing a ton of variations of metric combinations. She said: just use Google Sheets! Whatever I say about Sheets is probably also true for Excel.

Indeed they have various formulas, e.g. SLOPE for linear fit and GROWTH for exponential fit. But those turned out to only be good for predicting future values in a series. I needed something to give me the actual formula for the best fit curve.

Google Sheets did in the end have the answer, but not in a formula. The solution was found inside their Charts feature. If you give it a data series and Customize the chart, one of the options it provides is a Trendline. Enabling the Trendline gives you a bunch of options: linear, exponential, logarithmic, power, etc. You can generally eyeball which is the best shape for your data, but an objective measure is found in the R² value, or coefficient of determination. It can go as high as 1, or 100%.

And if you change the Label of the Trendline in the dropdown to "Equation" then you can get its equation. But what I ultimately needed was the goodness-of-fit; I was only after the equation as a means to calculating goodness-of-fit myself. So that Sheets calculated R² for me was even better than I was expecting!

So anyway, my next steps will be to come up with a ton of different combinations of metrics (SoPF>3, Benedetti height, Tenney height, n+d ("length"?), abs(n-d), abs(SoPF>3(n) - SoPF>3(d)), etc. etc. etc.) and then just compare all of their R² and see which one has the best fit with respect to the frequency statistics from Scala.

By the way, the R² for the frequency statistics themselves is an impressive 0.991 when fit to the equation 8041x⁻¹⋅³⁷, where x is the index of the comma in the list of commas sorted by descending frequency. Dunno if there's any significance to that coefficient, but there ya go.
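For anyone who wants to reproduce a power trendline outside of Sheets, here is a minimal sketch. It assumes the fit is an ordinary least-squares line in log-log space and that R² is measured on that line; whether Sheets does exactly this is an assumption, not something confirmed in this thread. The function name `fit_power_law` is hypothetical, and the data is synthetic (generated from the quoted equation), not the Scala stats.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (ln x, ln y); return (a, b, R²)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    syy = sum((y - mean_y) ** 2 for y in ly)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx                       # slope of the log-log line = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept of the log-log line = ln(a)
    r_squared = (sxy * sxy) / (sxx * syy)
    return a, b, r_squared

# Synthetic stand-in for the frequency list: exact values of 8041 * x**-1.37.
ranks = list(range(1, 51))
counts = [8041 * x ** -1.37 for x in ranks]
a, b, r2 = fit_power_law(ranks, counts)
print(a, b, r2)
```

On real counts the recovered exponent and R² would of course depend on how many entries of the list are included.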

So I guess the moral of the story is: trust your partner for assists, and the solution is often right under your nose.

Dave Keenan wrote: ↑Wed Jun 17, 2020 1:12 pm I look forward to further results from this approach.

> cmloegcmluin wrote: ↑Tue Jun 16, 2020 11:14 am By the way, the R² for the frequency statistics themselves is an impressive 0.991 when fit to the equation 8041x⁻¹⋅³⁷, where x is the index of the comma in the list of commas sorted by descending frequency. Dunno if there's any significance to that coefficient, but there ya go.

It relates to my observation that the frequency falls off faster than Zipf's law. Zipf's law implies kx⁻¹.

But I note that we don't really care if the complexity function is a good fit to the frequency, only that it produces (nearly) the same rank ordering.

cmloegcmluin wrote: ↑Wed Jun 17, 2020 2:45 pm

> Dave Keenan wrote: ↑Wed Jun 17, 2020 1:12 pm It relates to my observation that the frequency falls off faster than Zipf's law. Zipf's law implies kx⁻¹.

Ah! Yes, that's an excellent observation.

> Dave Keenan wrote: But I note that we don't really care if the complexity function is a good fit to the frequency, only that it produces (nearly) the same rank ordering.

Right. We're not trying to predict further (appended or interpolated) entries in the frequency list. I submitted this 0.991 value as an indicator that we have enough data in the Scala stats such that they come out pretty smooth. And it's also a measure that people use these commas in their scales in a remarkably predictable way (I was vaguely wondering whether there might be some deeper mathematical/harmonic meaning to this 1.37 number).

Dave Keenan wrote: ↑Fri Jun 19, 2020 12:08 am

> cmloegcmluin wrote: ↑Thu Jun 18, 2020 7:31 am And just to make sure we're on the same page (and this articulation is as much for my own benefit, to clarify what the heck is happening here, haha): the investigations I described my plans for above are focused on producing an improvement to the SoPF>3 metric – a better frequency heuristic – and they may likely take the closely-related prime limit property of a comma into consideration. It is a parallel task to this development of munged tina error and abs3exp metrics.

Yes. That's the page I'm on too. But I guess I'm making the additional assumption that whatever alternative complexity measure we may come up with will not be too different from the current SoPF>3. You need not conform to that. It's just that in munging the error and 3-exponent I'm attempting to put them in a form that is in some way comparable to SoPF>3 so that it makes sense to add them to it, to obtain an overall badness.

cmloegcmluin wrote: ↑Fri Jun 19, 2020 2:23 am I believe it will be quite close to SoPF>3, if judging only from the fact that in this list linked earlier, other factors besides SoPF>3 such as prime limit and vincular(?) balance of the primes accounted only for sorting within a SoPF>3 tier, or in other words, their effect on the metric would only ever budge it by less than 1.

cmloegcmluin wrote: ↑Sun Jun 21, 2020 9:04 am Okay, I think I've found something that works best:

q + 0.75r + 0.1s + 0.05t + 0.005u

q: Sum of Prime Factors > 3

r: Tenney Height = log_{2}(n×d)

s: prime limit

t: abs(SoPF>3(n) - SoPF>3(d))

u: abs(n - d)
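A minimal sketch of this metric in code, for anyone who wants to check scores by hand. It assumes n and d are the numerator and denominator of the 2,3-reduced ratio (as clarified later in this thread) and that prime factors are counted with repetition. The function names are hypothetical, and treating 1/1 as having prime limit 1 is my own assumption.

```python
import math

def prime_factors(n):
    """Prime factors of n with repetition, e.g. 25 -> [5, 5]."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def sopf_gt3(n):
    """Sum of prime factors greater than 3, with repetition."""
    return sum(p for p in prime_factors(n) if p > 3)

def complexity(n, d):
    """q + 0.75r + 0.1s + 0.05t + 0.005u for a 2,3-reduced ratio n:d."""
    q = sopf_gt3(n) + sopf_gt3(d)
    r = math.log2(n * d)                                      # Tenney height
    s = max(prime_factors(n) + prime_factors(d), default=1)   # prime limit (1 for 1/1 is an assumption)
    t = abs(sopf_gt3(n) - sopf_gt3(d))
    u = abs(n - d)
    return q + 0.75 * r + 0.1 * s + 0.05 * t + 0.005 * u

print(complexity(7, 5), complexity(35, 1))
```

With these assumptions, `complexity(455, 1)` and `complexity(5, 1)` come out at about 36.44229 and 7.511446, matching the 2,3-reduced figures volleo6144 posts further down.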

The coefficients were not fit by code (fit by me fiddling with it manually in a methodical way to find the extrema, but with only 3 decimal places of precision on the R² output from Google Sheets), so it is possible that tiny adjustments could slightly optimize this. However, I think with the smallish size of the data set we're working with, it wouldn't really be respecting sig figs or whatnot were we to press for much more precision here.

Unsurprisingly, better R² values were found when fitting metrics to the earlier sub-sequences of the comma list sorted by frequency; that's because toward the beginning of the list there are many more data points and thus the data is smoother.

Perhaps more interestingly, weights on these sub-metrics affected the R² in different ways depending on how late into the list your sub-sequence went. The above metrics were found by fitting only to the first 50 entries. If one tries to fit to the first 135 entries, then t and u actually hurt the R² and need their coefficients set to 0, and the prime limit is able to improve the fit by being subtracted, with a coefficient of -0.45, i.e. a higher prime limit suggests a slightly less complex (more popular) comma, which of course seems wrong but happens to work out that way for that stretch of the frequency data when the Tenney Height is also incorporated into the metric (with a whopping 3.5 coefficient). In any case, as you can see, I went with the numbers which optimized R² for the first 50 entries, because those numbers all made intuitive sense to me.

Furthermore, I applied another check at the end. The line was better fit, yes, but did it also involve less zigging and zagging? Indeed it did. While for the first 270 entries in the list, SoPF>3 includes only 149 where the next value is greater than the immediately previous one (values should steadily rise as the commas get more complex), this complexity metric includes 159. In contrast, the version of the complexity metric I found based on the first 135 entries only budged that number from 149 to 150.
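The zig-zag check can be sketched as a simple count of rising steps. The function name is hypothetical and the sample values below are made up for illustration; they are not taken from the Scala list.

```python
def count_rising_steps(values):
    """Count positions where a value is greater than the one immediately before it."""
    return sum(1 for prev, nxt in zip(values, values[1:]) if nxt > prev)

# Made-up complexity scores, listed in descending order of popularity.
example_scores = [5, 7, 10, 12, 11, 12]
print(count_rising_steps(example_scores))  # 4 of the 5 steps rise
```

Feeding in each metric's scores for the first 270 entries, in popularity order, would give the kind of counts compared above.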

R² for the first 135 entries under SoPF>3 was 0.681, and with my complexity heuristic I was able to increase that to 0.767. R² for the first 50 entries under SoPF>3 was 0.904, and under my complexity heuristic I was able to increase that to 0.917.

I forgot to experiment with odd limit in the mix. I'm a bit over this task, though, and I feel reasonably confident that Tenney/Benedetti height is after the same flavor of complexity.

----

One thing that disappoints and confuses me about this complexity heuristic is how little weight it applies to the sub-metrics which are supposed to differentiate 5:7 and 35:1, such as n+d (which as you can see didn't even make the final cut) or abs(n-d). I spent quite a bit of time looking for patterns in the differences in frequency between these pairs of primes when they appeared on the same side of the comma versus opposite sides. The effect is fairly consistent and pronounced:

5:7 = 1318; 35:1 = 875

5:11 = 339; 55:1 = 119

7:11 = 324; 77:1 = 111

5:13 = 205; 65:1 = 40

7:13 = 145; 91:1 = 30

11:13 = 89; 143:1 = 26

5:17 = 108; 85:1 = 20

and so on.
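To put numbers on "fairly consistent and pronounced", this small sketch divides each balanced count by its lopsided counterpart, using only the counts listed above. The variable names are mine.

```python
# (p1, p2): (count for the balanced p1:p2, count for the lopsided p1*p2:1)
counts = {
    (5, 7): (1318, 875),
    (5, 11): (339, 119),
    (7, 11): (324, 111),
    (5, 13): (205, 40),
    (7, 13): (145, 30),
    (11, 13): (89, 26),
    (5, 17): (108, 20),
}

ratios = {pair: balanced / lopsided for pair, (balanced, lopsided) in counts.items()}

for (p1, p2), ratio in ratios.items():
    print(f"{p1}:{p2} vs {p1 * p2}:1 -> balanced is {ratio:.2f}x as frequent")
```

For these seven pairs the balanced form is anywhere from about 1.5 to about 5.4 times as frequent as the lopsided one.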

One problem we were trying to solve was that under SoPF>3 these commas get the same complexity rating. Under this metric, at least they are differentiated, but by nowhere near the amount that is reflected in their actual frequency count. What this suggests is that these simple examples are actually exceptional, and that in general the comma frequencies do not conform to such clear 3- to 4- fold differences in occurrence frequency just for balancing the primes across the vinculum.

----

Lemme know what y'all think. I'll run it against the tina candidates if I get an endorsement.

Edit: I should note that I struck 211:11 and 433:125 from the commas I used to calculate these metrics, as they were throwing the numbers off really badly.

volleo6144 wrote: ↑Sun Jun 21, 2020 4:55 pm

> cmloegcmluin wrote: ↑Sun Jun 21, 2020 9:04 am q + 0.75r + 0.1s + 0.05t + 0.005u
>
> q: Sum of Prime Factors > 3
>
> r: Tenney Height = log₂(n×d)
>
> s: prime limit
>
> t: abs(SoPF>3(n) - SoPF>3(d))
>
> u: abs(n - d)
>
> [...]
>
> Lemme know what y'all think. I'll run it against the tina candidates if I get an endorsement.

I did this for the yellow tina candidates (the ones we've mostly agreed on):

2t  = 5831n:   38 + 18.76448 + 1.7 + 1.90 +  0.005 = 60.36948
3t  = 455n:    25 + 17.99974 + 1.3 + 1.25 +  0.005 = 45.55474
4t  7:3025n:   39 + 17.34372 + 1.1 + 1.25 +  0.005 = 58.69872
5   25:2401n:  38 + 16.84368 + 0.7 + 0.90 +  0.005 = 55.44868
6t  65:77n:    36 + 16.53303 + 1.3 + 0.00 +  0.005 = 53.83803
--- half of 5s ---
8t  13:77n:    31 + 35.55136 + 1.3 + 0.25 + 44.155 = 112.2564
9t  539n:      25 + 25.49922 + 1.1 + 1.25 +  0.475 = 53.32422
14t 5s:        05 + 22.50122 + 0.5 + 0.25 +  0.185 = 28.43622

The "r" term is scaled so that one "unit" of SoPF>3 corresponds to 1600¢ or just under 11,400 tinas of Tenney height, implying that we're willing to trade off a prime 11 for every factor of about 160 we can take off each side of the comma.

The "s" term is ... well, the prime limit divided by 10.

The "t" term is for prime factor balance, but it has about the same importance as the prime limit term—basically nothing.

The "u" term is usually below 1.000, but—for schisminas around the size we're discussing here with ratios of numbers in the millions—it's actually a nontrivial amount, as seen in the 13:77n's abysmal score. This is mostly just a linear, instead of logarithmic, version of "r".

I'm, uh, not quite sure about it yet, especially as there's nothing for tina error or anything of the sort.

Dave Keenan wrote: ↑Sun Jun 21, 2020 8:31 pm Thanks for all your hard work.

Are your n and d, that are used in computing Tenney height (log product complexity), the numerator and denominator of the 2,3-reduced ratio or the full comma ratio? I assume the 2,3-reduced ratio, which is also the simplest ratio that can be notated using the comma. If so, I note that log product and SoPF are just differently weighted sums of the absolute values of the prime exponents. Log product weights the exponent of each prime p by a factor of log₂(p) while SoPF weights them by a factor of p.

So your q + 0.75r is a weighted sum of the absolute values of the prime exponents, where the weights are p + 0.75×log₂(p). It's conceivable that some simpler weighting function such as k√p or kp^a where 0 < a < 1, might do even better.

But I don't buy the story about it being anomalous that there's a vast difference in frequency between p₁/p₂ and p₁×p₂ for low primes. I'm more inclined to treat the lack of such a difference for high primes as anomalous.

So I think we still need to find a way to capture that. I don't understand why your "t" doesn't do it.

It seems wrong to include your "u", as all the other terms are logarithmic, or kind of logarithmic, but "u" is linear. So relative to all the other terms, "u" is exponential.

volleo6144 wrote: ↑Mon Jun 22, 2020 1:55 am

> Dave Keenan wrote: ↑Sun Jun 21, 2020 8:31 pm It seems wrong to include your "u", as all the other terms are logarithmic, or kind of logarithmic, but "u" is linear. So relative to all the other terms, "u" is exponential.

"u" is scaled to one prime 5 for every 1000 difference in the ratio (the 1:1001 "perfect" 71st scores exactly 5.000 on the "u" term), so it being exponential shouldn't affect 2,3-reduced ratios, like, at all. (The only things it really discourages are things with terms beyond the thousands on one side only: the 49:5¹⁰n's 2,3-reduced ratio 49:5¹⁰ scores 64+21.6+0.7+1.8+48,827.88 = 48,916.01, but its actual ratio 78,121,827:78,125,000 scores ... much less than 40,000. Conversely, the 19:2401n's actual ratio 1,275,068,416:1,275,989,841 scores 47+45.4+1.9+119.1+4607.125 = 4865.89, while its 2,3-reduced ratio scores ... less than 4000.)

"t" is scaled to one prime 5 for every 100 units in the imbalance of SoPF>3 on either side.

"s" is scaled to one prime 5 for every 50 units in the prime limit.

"r" is scaled to one prime 5 for every 8000¢ in the Tenney height.

I'm ... pretty sure they meant the full comma ratio: it doesn't exactly make that much sense for 5s to be considered more simple than 7C, and it certainly doesn't make sense for every Pythagorean ratio to be counted as more simple than every other ratio (the full ratio of 12edo3C scores 64.6, and 53edo3s scores 10²⁰⋅³).
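The four "one prime 5 for every ..." scalings in this post follow directly from the coefficients: each is just 5 divided by the coefficient (and, for "r", octaves converted to cents). A quick sketch, with variable names of my own choosing:

```python
# Coefficients from q + 0.75r + 0.1s + 0.05t + 0.005u (q has coefficient 1).
coefficients = {"r": 0.75, "s": 0.1, "t": 0.05, "u": 0.005}

# How many units of each sub-metric add as much to the score as one prime 5 adds via q.
units_per_prime_5 = {name: 5 / c for name, c in coefficients.items()}

# r is measured in octaves (log base 2), so convert to cents at 1200 per octave.
cents_per_prime_5 = units_per_prime_5["r"] * 1200

print(units_per_prime_5, cents_per_prime_5)
```

This gives roughly 1000 for "u", 100 for "t", 50 for "s" and 8000¢ for "r", as stated.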

cmloegcmluin wrote: ↑Mon Jun 22, 2020 4:50 am

> volleo6144 wrote: ↑Sun Jun 21, 2020 4:55 pm I did this for the yellow tina candidates (the ones we've mostly agreed on):

I notice the relatively large value 44.155 that the 13:77n got. This appears to be the result of 0.005(13640319 - 13631488). I should have been more explicit about what I meant by n and d: I didn't mean in the actual ratio; I meant in the 2,3-reduced ratio.

> volleo6144 wrote: I'm ... pretty sure they meant the full comma ratio: it doesn't exactly make that much sense for 5s to be considered more simple than 7C, and it certainly doesn't make sense for every Pythagorean ratio to be counted as more simple than every other ratio (the full ratio of 12edo3C scores 64.6, and 53edo3s scores 10²⁰⋅³).

It does seem a bit weird sometimes, but I think in this context we do consider 5s simpler than 7C. Or perhaps it would be better to say we don't distinguish 5s from 5C in this context; we're specifically evaluating the complexity of prime content beyond prime 3. Perhaps Dave would have a more inspiring way of articulating this.

> volleo6144 wrote: I'm, uh, not quite sure about it yet, especially as there's nothing for tina error or anything of the sort.

Yeah, we still have some unanswered questions on the munged front, too. Not quite ready to consolidate a tina badness metric yet.

> Dave Keenan wrote: ↑Sun Jun 21, 2020 8:31 pm Are your n and d, that are used in computing Tenney height (log product complexity), the numerator and denominator of the 2,3-reduced ratio or the full comma ratio? I assume the 2,3-reduced ratio, which is also the simplest ratio that can be notated using the comma.

Yes, that's right.

I vaguely recall hinting on our call that I was interested in incorporating non-2,3-reduced ratios into the metric. But I decided that finding the correct non-2,3-reduced ratio was going to be more work than it was worth. Multiple non-2,3-reduced ratios are possible for each 5-monzo we work with.

------

> Dave Keenan wrote: If so, I note that log product and SoPF are just differently weighted sums of the absolute values of the prime exponents. Log product weights the exponent of each prime p by a factor of log₂(p) while SoPF weights them by a factor of p.

Whoa, is that true? That is true! I had to work out a couple examples to convince myself. Thank you for noticing that.

You have to work from the monzos for this fact to appear; e.g. for 35/1, Tenney height is log₂(5×7) ≈ 5.129 and SoPF>3 is 5 + 7 = 12. But to see that these are different weights on the same metric, we can't take log₂(12) ≈ 3.585; we must take log₂7 + log₂5 ≈ 5.129. Which is obvious when one reminds themselves of logarithmic identities (like four-dimensional geometry, it is a mathematical space which I still lack intuition for).

> Dave Keenan wrote: So your q + 0.75r is a weighted sum of the absolute values of the prime exponents, where the weights are p + 0.75×log₂(p). It's conceivable that some simpler weighting function such as k√p or kp^a where 0 < a < 1, might do even better.

So: yes, I agree, we should try that. Update: I just tried it. Using a power function, the best fit is near a = 0.9, increasing R² to 0.9 (from 0.889 at a = 1, AKA plain ol' SoPF>3). And critically, this does just barely beat out the combination of SoPF>3 and Tenney Height, which maxes out at R² of only 0.898 over the same range of popularities. And of course, since it's simpler, we should prefer it. I don't know quite how to write it, though... "SoP⁰⋅⁹F>3"?
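Working from the monzo, the "same metric, different weights" view can be sketched like this. The dictionary representation of a 2,3-reduced monzo (prime mapped to exponent) and the function name are my own choices for illustration.

```python
import math

def weighted_complexity(monzo, weight):
    """Sum of |exponent| * weight(prime) over the primes of a 2,3-reduced monzo."""
    return sum(abs(exponent) * weight(prime) for prime, exponent in monzo.items())

thirty_five = {5: 1, 7: 1}        # 35/1
seven_over_five = {5: -1, 7: 1}   # 7/5

sopf = weighted_complexity(thirty_five, lambda p: p)             # 5 + 7 = 12
tenney = weighted_complexity(thirty_five, math.log2)             # log2(5) + log2(7)
power = weighted_complexity(thirty_five, lambda p: p ** 0.9)     # a p^a weighting with a = 0.9

print(sopf, tenney, power)
```

Because only absolute values of the exponents enter, any weighting of this form gives 7/5 exactly the same score as 35/1, which is the blind spot discussed throughout this thread.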

------

I'm also reminded here that I did already briefly experiment with weighting primes by something other than their own value. I haven't gotten too far with it yet.

One pretty cheap approach I took was looking at the popularity of the solo primes (5/1, 7/1, 11/1, etc.), or rather the percentage of their popularity of the total data points. So since 18.27% of pitches used in Scala are 5/1, 5 gets mapped to 1/0.1827 = 5.475. The next few are 7 → 9.749, 11 → 29.345, 13 → 65.781. So you can see these get big pretty fast. But what you actually want to do is use a formula, not a giant table of popularity stats. The best fit line for this data fits really well: 461591x⁻²⋅⁶¹ has R² of 0.984 through primes up to 47. Using that formula the numbers are slightly different; different enough that it fails almost immediately, considering 25/1 to be less complex than 7/1, since 5 → 4.251 and 7 → 10.229. And besides, feeding these different weights into simple SoPF>3 does not improve fit; it looks like total noise.

In any case, if we were really going to do something like this, we might want to consider how each prime affects popularity in a more comprehensive sense, considering every ratio it appears in. And we may want to treat the second time a prime appears in the same ratio differently (the 2nd 5 in 25/1 may have a different impact).

Part of me is concerned that when we weight the primes by their empirical popularity and then try to fit some formula involving that weight to said popularity curve, we're doing a weird (in a bad way) feedback loop. But I'm not certain.

And besides, the data above suggests that the primes should be weighted in the other direction, so that the really high ones don't impact the score as badly. To me, either way feels like it could make sense: amplifying the big primes feels right because at a certain point you might want them to have an infinite impact on the score, since who would promote a, like, 1409-limit prime for anything? Beyond that point, every prime is basically interchangeably infinitely bad. Although muffling the big primes also accomplishes making them interchangeable, just in the other way, flattening them out, and maybe if the weight they flatten out at is bad enough, then they don't have to be infinitely bad, and having a sub-linear trajectory into this zone is the better fit.

------

> Dave Keenan wrote: But I don't buy the story about it being anomalous that there's a vast difference in frequency between p₁/p₂ and p₁×p₂ for low primes. I'm more inclined to treat the lack of such a difference for high primes as anomalous.

I felt that way too, but wasn't confident enough about it to suggest it. Glad you said that.

> Dave Keenan wrote: So I think we still need to find a way to capture that. I don't understand why your "t" doesn't do it.

Superficially, of course, it doesn't do it because its coefficient is so low. But yes, "subficially", I thought it might have moved R² a lot better than it did.

One thing I had tried to do was fit an approximating line to the data points we have for these pairs of low primes and their differences in popularity depending on whether they are "balanced" on either side of the vinculum or "lopsided" on the same side of the vinculum. Unfortunately since this is now three-dimensional data, I can't use Google Sheets to get the trendline, and therefore I can't get R². With Wolfram online's regression analysis tool I can get the trendline:

-0.027904x² + 0.279476x - 0.0222386y² + 0.971861y - 5.15064

This is quadratic; x is the lower of the two primes and y is the greater, and the result approximates the ratio of the popularity of the balanced version to that of the lopsided version. It looks like it's not terribly accurate, just eyeballing it. And besides, what exactly are we supposed to do with this?
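As a quick sanity check on that eyeballing, the quadratic can be evaluated directly. The function name is hypothetical; the coefficients are copied from the trendline above.

```python
def predicted_ratio(x, y):
    """Quadratic trendline: x is the lower prime, y the greater prime of the pair."""
    return (-0.027904 * x**2 + 0.279476 * x
            - 0.0222386 * y**2 + 0.971861 * y
            - 5.15064)

print(predicted_ratio(5, 7))  # about 1.26
```

For 5 and 7 this predicts about 1.26, against the observed 1318/875 ≈ 1.51, which is consistent with "not terribly accurate".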

Maybe the right question to answer is: what weight on t would achieve something closest to these observed popularity changes?

One insight I had is that we have to treat the completely lopsided version as the "home base", if we want to be able to generalize this metric to commas with more than 2 primes (otherwise, how do you define "balanced"? "Lopsided" is way easier to define).

A prerequisite for this is having an approximate mapping function between popularity and SoPF>3. I know I figured this out the other day, let me dig it up. Okay here it is, given SoPF>3 of x, the popularity should be about 8052.01e^(-0.146015x).

35:1 and 7:5 each have SoPF>3 = 12. I need to find a weight on t which will affect that SoPF>3 enough that the popularity changes from 875 to 1318 between the two. Using the above equation, I get that at popularity 875, SoPF>3 should be 15.2002, and at popularity 1318, SoPF>3 should be 12.3947. So we want a change to our SoPF>3 of about 15.2002 - 12.3947 = 2.8055. In this case, t(35/1) = abs(SoPF>3(35) - SoPF>3(1)) = 12 and t(7/5) = abs(SoPF>3(7) - SoPF>3(5)) = 2, so the difference in t is 12 - 2 = 10. So we want our coefficient on t to be about 2.8055/10 = 0.28055.

Now that exercise was only for the first pair of primes. We have six more examples with reasonable popularity changes. Repeating that process for the other pairs, 5 & 11 → 0.71696, 7 & 11 → 0.52403, 5 & 13 → 1.11915, 7 & 13 → 0.77073, 11 & 13 → 0.38307, and 5 & 17 → 1.15495. The average of these is 0.70706. That's over 14x more weight than the best fit line across all the commas was giving us.
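The whole exercise can be sketched in a few lines, which makes it easy to rerun with other pairs. It inverts the exponential fit above to get an "implied SoPF>3" for each popularity count, then divides the shift by the change in t. Names are hypothetical; the counts and fit constants are the ones given in this post and the earlier one.

```python
import math

A, K = 8052.01, 0.146015  # popularity ≈ A * exp(-K * SoPF>3), from the fit above

def implied_sopf(popularity):
    """Invert the exponential fit: the SoPF>3 that would predict this popularity."""
    return math.log(A / popularity) / K

# (p1, p2): (count for the balanced p1:p2, count for the lopsided p1*p2:1)
counts = {
    (5, 7): (1318, 875),
    (5, 11): (339, 119),
    (7, 11): (324, 111),
    (5, 13): (205, 40),
    (7, 13): (145, 30),
    (11, 13): (89, 26),
    (5, 17): (108, 20),
}

def t_coefficient(p1, p2, balanced, lopsided):
    """Weight on t that would account for the popularity gap of one prime pair."""
    shift = implied_sopf(lopsided) - implied_sopf(balanced)
    t_change = (p1 + p2) - abs(p2 - p1)  # t(lopsided) minus t(balanced)
    return shift / t_change

coefficients = {pair: t_coefficient(*pair, *c) for pair, c in counts.items()}
average = sum(coefficients.values()) / len(coefficients)
print(coefficients, average)
```

Note that the constant 8052.01 cancels out of the shift, so only the exponent 0.146015 actually matters for the result.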

------

> Dave Keenan wrote: It seems wrong to include your "u", as all the other terms are logarithmic, or kind of logarithmic, but "u" is linear. So relative to all the other terms, "u" is exponential.

Fair enough. And considering that it accomplishes almost nothing, perhaps that'll be one tonne we can shave off our elephant.

------

So what's next? I could propose an iteration on my previous metric, without the u submetric, with SoPF>3 and Tenney height consolidated into SoP⁰⋅⁹F>3, and with t weighted at 0.70706 rather than 0.05. But it's lunchtime now and I think I've certainly exceeded my allotted time on the floor this round.

volleo6144 wrote: ↑Mon Jun 22, 2020 10:15 am

> cmloegcmluin wrote: ↑Mon Jun 22, 2020 4:50 am
> > volleo6144 wrote: ↑Sun Jun 21, 2020 4:55 pm I did this for the yellow tina candidates (the ones we've mostly agreed on):
>
> I notice the relatively large value 44.155 that the 13:77n got. This appears to be the result of 0.005(13640319 - 13631488). I should have been more explicit about what I meant by n and d: I didn't mean in the actual ratio; I meant in the 2,3-reduced ratio.

Yeah, I guess.

...I did the yellow tinas again, with the 2,3-reduced ratios:

2t  = 5831n:   38 + 9.382146 + 1.7 + 1.90 + 29.15 = 80.13215 :@2:
3t  = 455n:    25 + 6.622292 + 1.3 + 1.25 +  2.27 = 36.44229
4t  7:3025n:   39 + 10.77756 + 1.1 + 1.25 + 15.09 = 67.21756 :@4:
5   25:2401n:  38 + 11.90496 + 0.7 + 0.90 + 11.88 = 63.38497 :@5:
6t  65:77n:    36 + 9.216866 + 1.3 + 0.00 +  0.06 = 46.57687
--- half of 5s ---
8t  13:77n:    31 + 7.475420 + 1.3 + 0.25 +  0.32 = 40.34542 :@8:
9t  539n:      25 + 6.805606 + 1.1 + 1.25 +  2.69 = 36.84561
14t 5s:         5 + 1.741446 + 0.5 + 0.25 +  0.02 = 7.511446

The "u" term heavily penalizes commas with many primes on one side but only one or two, or zero, on the other, such as our yellow 2 [5831:5832], 4 [3024:3025], and 5 [2400:2401] tinas.

Dave Keenan wrote: ↑Mon Jun 22, 2020 8:39 pm

> volleo6144 wrote: ↑Mon Jun 22, 2020 10:15 am The "u" term heavily penalizes commas with many primes on one side but only one or two, or zero, on the other, such as our yellow 2 [5831:5832], 4 [3024:3025], and 5 [2400:2401] tinas.

So "u" is the only thing differentiating 7/5 from 35 etc, but not very well. I'd like to understand why "t" isn't doing that. But maybe if you take the log of "u" it will work even better.

volleo6144 wrote: ↑Tue Jun 23, 2020 12:24 am No, "t" is doing that, but in a different way: 77:125 (for example) is (18-15)/20 = 0.15 for t and (125-77)/200 = 0.24 for u. It's just that t does this to a greater extent (because of its higher scaling of 5 per 100 instead of 5 per 1000) for the typical case of distinguishing 5:7 from 1:35, while u heavily penalizes things with a lot of primes on one side (49:9765625n for the half-tina is immediately out because its u value is over 40,000, while its t value is a more tolerable 1.8).

And then we have the issue of how to prevent Pythagorean schisminas like [-1054 665> (665edo3n, which has a respectably low error of 0.037806 tinas from the half-tina) from being all the primary commas for all the tinas.