## developing a notational comma popularity metric

Dave Keenan
Site Admin
Posts: 1084
Joined: Tue Sep 01, 2015 2:59 pm
Location: Brisbane, Queensland, Australia
Contact:

### Re: developing a notational comma popularity metric

cmloegcmluin wrote:
Wed Jul 01, 2020 3:41 pm
I guess I'm still a bit confused about this bit. If there's a "revelation of k=0, c≠0" is it that we actually throw away the entire denominator?
No, the revelation is that it's only the numerator that needs a soapfar. The denominator does better with only a copfr.

But prior to my latest, the copfr was being applied to the numerator as well as the denominator. This seems unnecessary as "c" just competes with "w" there, but with the complication that the soapfar compresses its repeat-counts to the power y while the copfr does not. I wanted to free the numerator from interference by the copfr.
Or maybe in order to avoid potential confusion with the numerator and denominator of the input ratio we should call these the "greaterator" and "lesserator", in which case I mean we throw away the lesserator?
Ha. Great. Or the biggerator and the smallerator. Or the sopferator and the copferator. Or, so I can keep using n and d, the numinator and diminuator (from numinous and diminutive).

No we definitely don't want to throw away the diminuator. We want to give it its own soapfar, tailored just for it. But it turns out, when you ask it what it wants, by letting it choose an independent weight for every prime, it doesn't actually want a full blown soapfar. It's quite happy, thank you very much, with an ever-so-slightly modified copfr, an mcopfr (where 5's only count half).

Actually, the diminuator wouldn't mind if you wanted to raise its monzo terms to a power before summing them. So an mcopfar. But the diminuator wants a power greater than one, where the numinator wants a power less than 1. I changed my mind about using Y and y for the n and d powers. I'll keep y for the numinator and use v for the diminuator.

Dave Keenan

### Re: developing a notational comma popularity metric

Now, 3 parameters.

metric(n, d) = soolpfcr(n, w, y) + mcopfer(d, v)

As before, "soolpfcr" stands for "sum of offset log of prime-factors with compressed repeats".
The log is base 2. The offset is w < 0 and the compression consists in raising the repeat-counts to the power y < 1.

"mcopfer" stands for "modified count of prime factors with expanded repeats".
The expansion consists in raising the repeat-counts to the power v > 1. As before, the modification consists of halving the (now expanded) repeat-count for prime 5.

For the ratio a/b,
if sopfr(a) ≥ sopfr(b) then
n = a; d = b
else
n = b; d = a

Take the monzo for the numinator to be
n = [ , n_5 n_7 n_11 n_13 n_17 ... ⟩
and the monzo for the diminuator to be
d = [ , d_5 d_7 d_11 d_13 d_17 ... ⟩

soolpfcr(n, w, y) = sum over primes p from 5 to max_prime of ( [log2(p) + w] × n_p^y )

mcopfer(d, v) = sum over primes p from 5 to max_prime of ( if p=5 then 0.5 × d_p^v else d_p^v )

Here n_p and d_p are the repeat-counts (monzo terms) of prime p in n and d.

I get optimal values of w = -1.431, y = 0.851, v = 1.332, giving SoS = 0.00614.
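The 3-parameter metric above can be sketched in Python as follows. This is only an illustration, not the spreadsheet actually used: the helper names `prime_factors` and `sopfr` and the trial-division factoring are assumptions, the inputs are assumed to be positive integers that are already 2,3-reduced, and the default parameter values are the optimum values quoted above.

```python
from math import log2

def prime_factors(x):
    """Return {prime: repeat_count} for a positive integer x (trial division)."""
    factors = {}
    p = 2
    while p * p <= x:
        while x % p == 0:
            factors[p] = factors.get(p, 0) + 1
            x //= p
        p += 1
    if x > 1:
        factors[x] = factors.get(x, 0) + 1
    return factors

def sopfr(x):
    """Sum of prime factors with repeats."""
    return sum(p * count for p, count in prime_factors(x).items())

def soolpfcr(n, w, y):
    """Sum of offset log of prime factors with compressed repeats (primes >= 5)."""
    return sum((log2(p) + w) * count ** y
               for p, count in prime_factors(n).items() if p >= 5)

def mcopfer(d, v):
    """Modified count of prime factors with expanded repeats; 5s count half."""
    return sum((0.5 if p == 5 else 1.0) * count ** v
               for p, count in prime_factors(d).items() if p >= 5)

def metric(a, b, w=-1.431, y=0.851, v=1.332):
    """Metric for the ratio a/b; the side with the greater sopfr is the numinator."""
    n, d = (a, b) if sopfr(a) >= sopfr(b) else (b, a)
    return soolpfcr(n, w, y) + mcopfer(d, v)
```

For example, `metric(7, 5)` and `metric(5, 7)` give the same value, because 7 is chosen as the numinator either way.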

You may be able to find parameter values that give lower values of SoS for these two metrics.

Including the "s" parameter, for prime-limit or gpf, appears to have very little effect on the minimum SoS.

Dave Keenan

### Re: developing a notational comma popularity metric

If a fourth parameter is to be added, for a small improvement, there are 3 different ways to do it, which turn out to be equivalent to each other.
1. You can restore the parameter "k", so we have metric = soolpfcr(n) + k × mcopfar(d).
or
2. You can restore the parameter "α", the log base used in soolpfcr(n) which is otherwise fixed at 2.
or
3. You can introduce a parameter "h" which is a fractional count to be used in mcopfar(d) for all the higher primes, in the way that 0.5 was used for prime 5. And now we make the fractional count for prime 5 be h/2.

In these 3 cases I get optimum values (minimising my proxy SoS) of:
1. k = 0.953.
or
2. α = 1.936
or
3. h = 0.953

Regarding case 2, I remind you of the "change of base" formula, logα(p) = log2(p) / log2(α)

Guess what log2(α) is for the optimum α? You got it. log2(1.936) ≈ 0.953

In case 2, you also find that the new optimum value of w is the old value divided by 0.953.

I hope you can see why these 3 cases are all equivalent.
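One way to see the equivalence is sketched below. It assumes that only the ordering of the metric values matters for the ranks, so that multiplying the whole metric by a positive constant changes nothing. The shorthand S, R, M, c and w' is introduced here purely for illustration; M is the mcopfer with prime 5 counted as 0.5, and w' is the offset used in case 2.

```latex
\begin{align*}
S(n) &= \sum_{p \ge 5} \log_2(p)\, n_p^{\,y}, \qquad
R(n) = \sum_{p \ge 5} n_p^{\,y}, \qquad
M(d) = \tfrac{1}{2}\, d_5^{\,v} + \sum_{p \ge 7} d_p^{\,v} \\[4pt]
\text{Case 1:}\quad m_1 &= S(n) + w\,R(n) + k\,M(d) \\
\text{Case 3:}\quad m_3 &= S(n) + w\,R(n)
  + \Bigl(\tfrac{h}{2}\, d_5^{\,v} + h \sum_{p \ge 7} d_p^{\,v}\Bigr)
  = S(n) + w\,R(n) + h\,M(d) \\
\text{Case 2:}\quad m_2 &= \frac{S(n)}{c} + w'\,R(n) + M(d),
  \qquad c = \log_2 \alpha \\
c\,m_2 &= S(n) + (c\,w')\,R(n) + c\,M(d)
\end{align*}
```

Case 3 is case 1 with h in place of k. In case 2, c × m_2 has the form of case 1 with k = c and w = c × w', so w' = w / c, which is the "old value divided by 0.953" relationship noted above.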

In case 3, it turns out that I can get a tiny reduction in (proxy) SoS by leaving the fractional count for prime 5 as 0.5, and using h only for the higher primes. I can get the same effect in the other two cases by setting the fractional count for prime 5 to 0.528, and leaving the higher primes at a count of 1. But that would make the fractional count for prime 5 a fifth parameter.

I get the following optimum values:

w = -1.44, y = 0.860, h = 0.947, v = 1.331, SoS = 0.00660

You may notice that the SoS value I give here is not lower than for my previous (3-parameter) metric. The 4-parameter metric had a lower value for my proxy SoS, but the SoS I've been giving is the true SoS (of errors in reciprocal ranks), which my spreadsheet can't exactly minimise. Your code may be able to find parameter values that give a lower value of SoS.

I'd be very interested to know if you can reproduce, and improve on, the results for the 2, 3 and 4 parameter metrics above.

To be clear, the 4 parameter metric described above is:

metric(n, d) = soolpfcr(n, w, y) + mcopfer(d, h, v)

soolpfcr(n, w, y) = sum over primes p from 5 to max_prime of ( [log2(p) + w] × n_p^y )

mcopfer(d, h, v) = sum over primes p from 5 to max_prime of ( if p=5 then 0.5 × d_p^v else h × d_p^v )

Here n_p and d_p are the repeat-counts (monzo terms) of prime p in n and d.
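A minimal Python sketch of this 4-parameter form, working directly on monzos rather than on integers. Representing a monzo as a dict mapping each prime (5 and up) to its repeat-count is an assumption made for brevity, as is the premise that the caller has already decided which side is the numinator. The defaults are the optimum values quoted above.

```python
from math import log2

def soolpfcr(n_monzo, w, y):
    """Sum of offset log of prime factors with compressed repeats."""
    return sum((log2(p) + w) * count ** y for p, count in n_monzo.items())

def mcopfer(d_monzo, h, v):
    """Modified count with expanded repeats: 0.5 per 5, h per higher prime."""
    return sum((0.5 if p == 5 else h) * count ** v
               for p, count in d_monzo.items())

def metric(n_monzo, d_monzo, w=-1.44, y=0.860, h=0.947, v=1.331):
    """4-parameter metric; n_monzo must already be the numinator's monzo."""
    return soolpfcr(n_monzo, w, y) + mcopfer(d_monzo, h, v)

# Example: the ratio 49/25 has numinator 7^2 and diminuator 5^2.
value = metric({7: 2}, {5: 2})
```

Setting `h=1` recovers the form of the 3-parameter metric.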

Dave Keenan

### Re: developing a notational comma popularity metric

The only other thing I have left to investigate is whether I can get a slightly lower SoS by using a different "numinosity" function, i.e. a different function to decide which is the numinator and which is the diminuator. I'm currently using sopfr.

I normally wouldn't be able to work on this stuff today, but my electric vehicle buddy, Warrick, is in hospital recovering from an operation to remove a kidney stone.

Dave Keenan

### Re: developing a notational comma popularity metric

It's possible that including "s" times the prime-limit may give a better 4 parameter metric than including "h" for the higher-than-5 primes in the diminuator. i.e.

metric(n, d) = soolpfcr(n, w, y) + mcopfer(d, v) + s × gpf(n × d)

Dave Keenan

### Re: developing a notational comma popularity metric

Oh wait! It has to be

numinator
---------------
demonator

cmloegcmluin
Site Admin
Posts: 783
Joined: Tue Feb 11, 2020 3:10 pm
Location: San Francisco, California, USA
Real Name: Douglas Blumeyer
Contact:

### Re: developing a notational comma popularity metric

Dave Keenan wrote:
Wed Jul 01, 2020 5:46 pm
"mcopfr" stands for "modified count of prime factors with repeats".
The modification to the count consists of counting each factor of 5 as only a half count. All higher prime factors count as 1 as usual.
Interesting technique. Not something I'd considered yet. I will try to reproduce the 4-parameter metric of yours which uses it soon.
Dave Keenan wrote:
Wed Jul 01, 2020 7:46 pm
Including the "s" parameter, for prime-limit or gpf, appears to have very little effect on the minimum SoS.
Yeah. That surprises me a bit. Not from a mathematical vantage, but from the psychological one. I figured composers have a bit of a mental barrier against ratcheting up to the next prime limit and that could be reflected in the data.

Or maybe it would be if our data weighted the scales in the archive by their popularity.

------

I am eliminating from my investigations the possibility of compressing the repetitions with a base rather than a power. A refactor and increased test coverage of my code pointed out that logarithms behave undesirably around 0 and 1, which is where the absolute values of a great many of our monzo terms are.

------
Dave Keenan wrote:
Wed Jul 01, 2020 6:40 pm
so I can keep using n and d, the numinator and diminuator (from numinous and diminutive).
I'm into those names. Thanks! I used them in my code.
No we definitely don't want to throw away the diminuator. We want to give it its own soapfar, tailored just for it. But it turns out, when you ask it what it wants, by letting it choose an independent weight for every prime, it doesn't actually want a full blown soapfar. It's quite happy, thank you very much, with an ever-so-slightly modified copfr, an mcopfr (where 5's only count half).
Okay, yeah, I get it now. Thanks for explaining.

Man, we give something a name and straight away it has its own feelings and desires...
Dave Keenan wrote:
Thu Jul 02, 2020 12:38 pm
"numinosity" function. i.e. a different function to decide which is the numinator and which is the diminuator
"numinosity"... you're really running with this. I like it.

Sending good vibes to your pal Warrick.
Dave Keenan wrote:
Thu Jul 02, 2020 1:22 pm
numinator
---------------
demonator
Amazing. I can't even think why I knew "numen" meant something heavenly, but for some reason I did.

------
Dave Keenan wrote:
Thu Jul 02, 2020 12:07 pm
I hope you can see why these 3 cases are all equivalent.
...sigh... unfortunately I can't. I'm with you on log2(1.936) ≈ 0.953. But otherwise with all the modifications I can hardly believe it.

I still need more time to focus on my own investigations I think.

Dave Keenan

### Re: developing a notational comma popularity metric

cmloegcmluin wrote:
Thu Jul 02, 2020 2:43 pm
Interesting technique. Not something I'd considered yet. I will try to reproduce the 4-parameter metric of yours which uses it soon.
Don't bother trying to reproduce mine. My investigation into different numinosity functions looks like blowing it away. It looks like the best numinosity function is the identity function! i.e. Just call the greater (in the ordinary everyday sense) side of the 2,3-reduced ratio the numinator. Then it looks like the demonator is no longer happy with a modified copfar, but instead wants a soolpfcr like the numinator, but with a different offset to log2(p) and a different power of the repeat-count.
I still need more time to focus on my own investigations I think.
Good idea. Feel free to ignore my ramblings.

Dave Keenan

### Re: developing a notational comma popularity metric

A random observation: The ratio 11/7, by any reasonable metric, is way more popular than it has any right to be. It has rank 12, between 11/5 and 17/1. It ought to be up near 13/5, with rank 15 to 17.

Dave Keenan

### Re: developing a notational comma popularity metric

Here's my latest candidate 4-parameter metric, for which the numinosity function is the identity function, i.e. the numinator is simply the greater of the two sides of the 2,3-reduced ratio. The main difference from the previous metrics is that the demonator function is no longer a modified count. It is now a sum of primes (linear, not logarithmic like the numinator), with 2 subtracted from each prime before summing, and with the repeat-count compressed.

The other difference is that the parameter "k" has returned.

metric(n, d) = soolpfcr(n, w, y) + k × soopfcr(d, v)

soolpfcr stands for sum of offset log of prime-factors with compressed repeat-count.
soolpfcr(n, w, y) = sum over primes p from 5 to max_prime of ( [log2(p) + w] × n_p^y )

soopfcr stands for sum of offset prime-factors with compressed repeat-count.
soopfcr(d, v) = sum over primes p from 5 to max_prime of ( [p - 2] × d_p^v )

I get the following optimum values for the parameters:
w = -1.465, y = 0.869, k = 0.168, v = 0.739, SoS = 0.00648
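As a rough Python sketch of this candidate (not the Excel implementation): the `prime_factors` helper is an assumption, the inputs are assumed to be the two sides of a 2,3-reduced ratio given as positive integers, and the defaults are the optimum values just quoted.

```python
from math import log2

def prime_factors(x):
    """Return {prime: repeat_count} for a positive integer x (trial division)."""
    factors = {}
    p = 2
    while p * p <= x:
        while x % p == 0:
            factors[p] = factors.get(p, 0) + 1
            x //= p
        p += 1
    if x > 1:
        factors[x] = factors.get(x, 0) + 1
    return factors

def soolpfcr(n, w, y):
    """Sum of offset log of prime factors with compressed repeat-count."""
    return sum((log2(p) + w) * count ** y
               for p, count in prime_factors(n).items() if p >= 5)

def soopfcr(d, v):
    """Sum of offset prime factors (p - 2) with compressed repeat-count."""
    return sum((p - 2) * count ** v
               for p, count in prime_factors(d).items() if p >= 5)

def metric(a, b, w=-1.465, y=0.869, k=0.168, v=0.739):
    """Identity numinosity: the greater side is the numinator."""
    n, d = max(a, b), min(a, b)
    return soolpfcr(n, w, y) + k * soopfcr(d, v)
```

With the identity numinosity function, `metric(11, 7)` treats 11 as the numinator and 7 as the demonator regardless of argument order.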

For the record, I have further improved the continuous function I apply to the metric to obtain a proxy for the rank, which would otherwise have to be obtained by sorting (which is discontinuous). It is now:

est_rank = b ^ (i×(metric-m) + j×(metric-m) + )

The values associated with the above optimum are:
b = 3.471915951, i = 0.662705646, j = 0.230907085, m = 2.229990235, = 1.719120398

This function was found by setting up a piecewise-linear function with 81 control points and having the Excel Solver optimise that along with the metric, then plotting the resulting function in log-linear form and figuring out a smooth function that would approximate it well.

I did a similar thing to obtain the best altered-prime-factor functions for the numinator and demonator. I allowed the Solver to find optimum weights for each prime separately, then plotted these against the primes and figured out a simple function to approximate the weights, namely log2(p) + w for the numinator, and k×(p-2) for the demonator.

I think maybe we should make w positive and describe the numinator weighting function as log2(p) - w.