### Re: developing a notational comma popularity metric

Posted: **Thu Jun 25, 2020 2:13 am**

Dave Keenan wrote (Wed Jun 24, 2020 6:37 pm):

> I love SoUP or even SoUPF, versus SoPF, and so I am sad to report that these functions already have standard names
>
> SoPF = sopfr (sum of prime factors with repetition)
>
> SoUP = sopf (sum of prime factors)
>
> https://mathworld.wolfram.com/SumofPrimeFactors.html

I'm glad you like it. And nice find! Now if only we could find an established name for a function that roughens a number x to n-rough, we could have a more proper way of writing SoPF>3. I looked around for a bit but didn't find anything. Perhaps it might simply be $\mathrm{rough}_n\,x$, and similarly we'd have $\mathrm{smooth}_n\,x$. Or perhaps $\mathrm{rgh}_n\,x$ and $\mathrm{smth}_n\,x$ would be preferable. Then we'd have $\mathrm{sopfr}(\mathrm{rgh}_n(x))$ instead of SoPF>3.
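To make the proposed notation concrete, here is a minimal Python sketch of sopfr, sopf, and a roughening function. The name `rgh` and its argument order are only the hypothetical notation floated above, not an established function, and the sketch assumes x is a positive integer (for a ratio you would presumably apply it to numerator and denominator separately).

```python
def prime_factors(x):
    """Return the prime factors of a positive integer x, with repetition."""
    factors = []
    p = 2
    while p * p <= x:
        while x % p == 0:
            factors.append(p)
            x //= p
        p += 1
    if x > 1:
        factors.append(x)  # whatever remains is itself prime
    return factors


def sopfr(x):
    """Sum of prime factors with repetition."""
    return sum(prime_factors(x))


def sopf(x):
    """Sum of distinct prime factors."""
    return sum(set(prime_factors(x)))


def rgh(n, x):
    """Hypothetical 'roughen': drop every prime factor of x smaller than n."""
    result = 1
    for p in prime_factors(x):
        if p >= n:
            result *= p
    return result


# SoPF>3 written as sopfr(rgh_5(x)): 4200 = 2^3 * 3 * 5^2 * 7
print(sopfr(rgh(5, 4200)))  # 5 + 5 + 7 = 17
```

Since the 2s and 3s are stripped before summing, `sopfr(rgh(5, x))` is what SoPF>3 has been standing for in this thread.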

> I want to get into testing some ideas for this, in a spreadsheet, as is my wont. Given that we don't care about matching the frequencies, but only matching the frequency ranking (of the first 40 or so 5-rough ratios), what are you actually fitting your candidate functions to?

I've been thinking about that the last couple days, especially in light of my re-emphasizing of part of your original ask, to "[filter] out the historical noise". Maybe the way to approach the problem is: the frequencies are a useful tool to help us aim, but they're not the target. The rank is the target. It seems like you were already on that page...

I'm not sure if we need to define an exact success condition at this point. But I suppose if we managed to find a metric which matched the rankings for the first 40 or so commas, we'd've done fairly well for ourselves.

> How might one directly measure how well one ranking matches another, i.e. the ranking produced by the candidate function and the ranking from the Scala archive statistics?

If our target is matching just the rankings, shouldn't we try to hit them exactly? Maybe I'm missing something.

I was thinking our candidate function would map a comma to some value, just as the Scala stats map a comma to a frequency value, and if they were both sorted, then we did it.

It sounds like maybe you were thinking our candidate function would map a comma to a value that was meant to look just like the value of a rank, e.g. it might try to map 11:1 to something really close to 6 because 11:1 is the 6th most popular comma. That could work too, but it seems like an extra step for our candidate function to do, and also an extra question for us to answer (this question of yours immediately above), and also maybe makes the problem unnecessarily more difficult. I just don't think, given how sparse and noisy the data is, we should shoot for anything higher fidelity than the sorting coming out the same.
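The "sort both and compare" check could be sketched like this in Python. Everything here is illustrative: the function names are hypothetical, the comma labels and numbers are made-up placeholders rather than Scala archive statistics, and the sketch assumes a lower metric value is meant to indicate a more popular comma. Ties are not handled in any careful way.

```python
def ranking(scores, descending):
    """Return the keys of `scores` ordered by their values."""
    return sorted(scores, key=scores.get, reverse=descending)


def same_ranking(frequencies, metric_values):
    """True if sorting by frequency (high to low) and by metric
    (low to high, an assumption) puts the commas in the same order."""
    by_frequency = ranking(frequencies, descending=True)
    by_metric = ranking(metric_values, descending=False)
    return by_frequency == by_metric


# Placeholder data only, NOT real archive counts.
frequencies = {"comma_a": 100, "comma_b": 50, "comma_c": 10}
good_metric = {"comma_a": 1.0, "comma_b": 2.5, "comma_c": 7.0}
bad_metric = {"comma_a": 1.0, "comma_b": 9.0, "comma_c": 7.0}

print(same_ranking(frequencies, good_metric))  # True
print(same_ranking(frequencies, bad_metric))   # False
```

Note that the metric values never need to resemble the rank numbers themselves; only the order they induce is compared.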