developing a notational comma popularity metric

Dave Keenan
Posts: 1098
Joined: Tue Sep 01, 2015 2:59 pm
Location: Brisbane, Queensland, Australia

Re: developing a notational comma popularity metric

cmloegcmluin wrote:
Tue Jul 28, 2020 2:45 am
Okay, I've added the chunk counts, as well as two 2-chunk entries:
Thanks for that.
The "k" entry is sopfr where k = 0.7901235.
The "j" entry is sopfr where j = 1.0954774 and is an exponent.
At first, this made no sense to me. "sopfr" is sopfr(nd). Where could a "k" or a "j" go? Exponent of what? But I had only just woken up. Eventually I figured out that you probably mean:

Name  Description
k     sopfr(n) + k × sopfr(d), where sopfr(n) ≥ sopfr(d)
j     sopfr(n)^j + sopfr(d), where sopfr(n) ≥ sopfr(d)

Is that correct? If so, under my naming scheme they should be called ks and js, where the trailing s indicates that numinator and diminuator are decided by comparing soapfars (which in this case happen to be sopfrs).
Doing this task has made me realize that it does not support recognizing wyb as 7 chunks. It'd consider it an 8-chunk metric.
To me, it's at most 5 chunks, as described here: viewtopic.php?p=2054#p2054
$$\text{wyb\_metric}(n,d) = \sum_{p=5}^{p_{max}} \big((\operatorname{lb}p+w)\,n_p^y + (\operatorname{lb}p+b)\,d_p^y\big), \text{ where } n \ge d$$
$$y=0.865618551, w=-1.472615144, b=-2.02634047, \text{ gives } SoS=0.006057649$$
The point of disagreement is this: I think that using one submetric on n and another on d feels like a single chunk of complexity; however, my code can only realize this as the first submetric adding a chunk (k = 0) to zero out its d, and the other adding a chunk (j = 0) to zero out its n. Unfortunately, a "cross-submetric parameter chunk" such as this would add a surprising amount of complexity to the code, which I'm not convinced is worth it.
All that shows is that you made a poor decision in how your code is structured.
My gut says that using different submetrics on n and d feels psychoacoustically unmotivated.
Remember that n and d are not numerator and denominator, but big side and small side, when compared either directly, or after the application of some soapfar function (where such indirection should count as a chunk). Treating n and d differently is absolutely essential if we are to account for 35/1 being less popular than 7/5. To me, even sopfr(n) + k × sopfr(d) is applying different submetrics to n and d. You may recall my earlier reaction to what, to me, continues to seem a strange definition of "submetric" on your part, and is the cause of your difficulty in implementing wyb.
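To make the 35/1 versus 7/5 point concrete, here is the arithmetic worked through. It uses the k value reported earlier in the thread and assumes (as the thread's usage suggests, but does not state outright here) that a lower metric value corresponds to a more popular ratio. Plain sopfr of the product cannot separate the two ratios, while weighting the small side by k < 1 can:

$$\operatorname{sopfr}(35 \cdot 1) = 5 + 7 = 12 = \operatorname{sopfr}(7 \cdot 5)$$
$$\tfrac{35}{1}:\ \operatorname{sopfr}(35) + k \operatorname{sopfr}(1) = 12 + 0 = 12$$
$$\tfrac{7}{5}:\ \operatorname{sopfr}(7) + k \operatorname{sopfr}(5) = 7 + 5k \approx 10.95 \text{ for } k = 0.7901235$$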

Dave Keenan

Re: developing a notational comma popularity metric

By setting them up in my spreadsheet, I confirmed, by obtaining the same SoS with your parameter values, that you mean the following.

Name  Description
k     sopfr(n) + k × sopfr(d), where n ≥ d
j     sopfr(n)^j + sopfr(d), where n ≥ d

And so they are correctly named.
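A minimal Python sketch of the four definitions discussed above, assuming the two sides of a ratio are given as positive integers with any factors of 2 and 3 already removed. The function names (`prime_factors`, `k_metric`, `ks_metric` and so on) are illustrative and not taken from either poster's code; the default parameter values are the ones reported in this thread.

```python
def prime_factors(m):
    """Return the prime factors of a positive integer, with repetition."""
    factors = []
    p = 2
    while p * p <= m:
        while m % p == 0:
            factors.append(p)
            m //= p
        p += 1
    if m > 1:
        factors.append(m)
    return factors


def sopfr(m):
    """Sum of prime factors with repetition; sopfr(1) is 0."""
    return sum(prime_factors(m))


def k_metric(side1, side2, k=0.7901235):
    """sopfr(n) + k * sopfr(d), where n >= d is decided by comparing the integers."""
    n, d = max(side1, side2), min(side1, side2)
    return sopfr(n) + k * sopfr(d)


def j_metric(side1, side2, j=1.094844739):
    """sopfr(n)^j + sopfr(d), where n >= d is decided by comparing the integers."""
    n, d = max(side1, side2), min(side1, side2)
    return sopfr(n) ** j + sopfr(d)


def ks_metric(side1, side2, k=0.722500081):
    """Like k_metric, but the big side is whichever has the larger sopfr."""
    big, small = sorted((sopfr(side1), sopfr(side2)), reverse=True)
    return big + k * small


def js_metric(side1, side2, j=1.095011397):
    """Like j_metric, but the big side is whichever has the larger sopfr."""
    big, small = sorted((sopfr(side1), sopfr(side2)), reverse=True)
    return big ** j + small
```

The two naming conventions only disagree on ratios such as 25/11, where 25 > 11 but sopfr(25) = 10 < sopfr(11) = 11, so `k_metric` applies k to sopfr(11) while `ks_metric` applies it to sopfr(25).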

Excel's Evolutionary Solver could only find insignificant improvements on the values you found, and only in one case:

j = 1.094844739, SoS = 0.009100365, SoS(1) = 18637.5
versus
j = 1.0954774, SoS = 0.009100971, SoS(1) = 18653.5

For metrics ks and js I obtained:

k = 0.722500081, SoS = 0.009359076, SoS(1) = 18368.5
j = 1.095011397, SoS = 0.008653742, SoS(1) = 17863.5
versus, for metrics k and j:
k = 0.7901235, SoS = 0.009491243, SoS(1) = 18757.5
j = 1.094844739, SoS = 0.009100365, SoS(1) = 18637.5

cmloegcmluin
Posts: 799
Joined: Tue Feb 11, 2020 3:10 pm
Location: San Francisco, California, USA
Real Name: Douglas Blumeyer

Re: developing a notational comma popularity metric

Dave Keenan wrote:
Tue Jul 28, 2020 9:05 am
The "k" entry is sopfr where k = 0.7901235.
The "j" entry is sopfr where j = 1.0954774 and is an exponent.
At first, this made no sense to me. "sopfr" is sopfr(nd). Where could a "k" or a "j" go? Exponent of what? But I had only just woken up. Eventually I figured out that you probably mean:

Name  Description
k     sopfr(n) + k × sopfr(d), where sopfr(n) ≥ sopfr(d)
j     sopfr(n)^j + sopfr(d), where sopfr(n) ≥ sopfr(d)

Is that correct? If so, under my naming scheme they should be called ks and js, where the trailing s indicates that numinator and diminuator are decided by comparing soapfars (which in this case happen to be sopfrs).
No, that's not what I meant. I meant:

Name  Description
k     sopfr(n) + k × sopfr(d), where n ≥ d
j     sopfr(n)^j + sopfr(d), where n ≥ d
I was aware of your new usage for 's' in these names and understood that if that's what I meant I should have called them ks and js.

It's clear we have some different assumptions/intuitions about stuff sometimes. I thought it would be clear enough what I meant by sopfr with a k, because in my world, sopfr in a sense always has a potential k or j in it, and when you don't acknowledge them, they're essentially both set at 1. Sorry for the confusion!

(I wrote the above before you submitted your latest post where you figured out what I meant for yourself. I'm keeping it here anyway.)
Doing this task has made me realize that it does not support recognizing wyb as 7 chunks. It'd consider it an 8-chunk metric.
To me, it's at most 5 chunks, as described here: viewtopic.php?p=2054#p2054
$$\text{wyb\_metric}(n,d) = \sum_{p=5}^{p_{max}} \big((\operatorname{lb}p+w)\,n_p^y + (\operatorname{lb}p+b)\,d_p^y\big), \text{ where } n \ge d$$
$$y=0.865618551, w=-1.472615144, b=-2.02634047, \text{ gives } SoS=0.006057649$$
Hm, okay, let's set aside the struggle between 7 and 8 chunks which is only in the domain of my implementation, and focus here on the difference in your and my conceptualization of this metric, and determine what constitutes the difference between your 5 chunks and my 7 chunks. My 7 chunks are:
1. the soapfar term
2. the sompfar term
3. lb
4. w
5. b
6. y
7. ...I'm not sure. I did come up with that 7 number right when I woke up. I suspect I may have thrown one in for whatever the m's value was, under the assumption it was that thing you used at some point which halved all values totaled but only for the 5 term of the monzos or something. But I think I see now that you may be calling it sompfar just to underscore the fact that it uses a different w (called b) than the w the soapfar uses.
And so if the "m" in sompfar doesn't mean anything functionally different, then I do agree with your 5-chunk conceptualization. I'll go back and correct it in your table in the recent post.

Heh. This exchange brings to light another shortcoming with my code which I'll need to address in order for it to recognize this as 5 chunks. Unlike the "cross-submetric parameter chunk" problem, however, my code is set up for success against this "when all submetrics are of the same type, that should constitute only 1 chunk" problem. I'll have a fix for it soon.
The point of disagreement is this: I think that using one submetric on n and another on d feels like a single chunk of complexity; however, my code can only realize this as the first submetric adding a chunk (k = 0) to zero out its d, and the other adding a chunk (j = 0) to zero out its n. Unfortunately, a "cross-submetric parameter chunk" such as this would add a surprising amount of complexity to the code, which I'm not convinced is worth it.
All that shows is that you made a poor decision in how your code is structured.
Perhaps so. Yup. I think this pain could be summed up like this: I started building the solver before we recognized that chunks were going to be of such utmost importance. Had I built the thing chunk-first, it may have been no problem. However, I built the thing the simplest way I could manage to achieve the different parameters we were discussing, without recognizing at that time that we'd reach a place where I'd want to run a big automatic script for a given chunk count to tell us the best metric possible for that chunk count, rather than it being okay to just compute the chunk count per metric after the fact.
My gut says that using different submetrics on n and d feels psychoacoustically unmotivated.
Remember that n and d are not numerator and denominator, but big side and small side, when compared either directly, or after the application of some soapfar function (where such indirection should count as a chunk). Treating n and d differently is absolutely essential if we are to account for 35/1 being less popular than 7/5. To me, even sopfr(n) + k × sopfr(d) is applying different submetrics to n and d. You may recall my earlier reaction to what, to me, continues to seem a strange definition of "submetric" on your part, and is the cause of your difficulty in implementing wyb.
I agree that treating n and d separately is necessary. Can we also agree that, whatever we call things (submetrics or otherwise) that there is a line — however fuzzy it may be — between metrics which treat n and d differently in a psychoacoustically justifiable way, and metrics which treat n and d differently in an artificial way?

I'll also say that if sompfar is simply a different w and not functionally different, then that is well within my tolerance for a psychoacoustically justifiable difference in the treatment of n and d. Had it been that weird half-5 trickery, or a completely different thing like gpf, then I'd've continued to push back.

Dave Keenan

Re: developing a notational comma popularity metric

cmloegcmluin wrote:
Tue Jul 28, 2020 11:22 am
I was aware of your new usage for 's' in these names and understood that if that's what I meant I should have called them ks and js.
Sorry I doubted it.

BTW, this use of a trailing "s" doesn't preclude naming metrics that have a parameter "s", being the coefficient of prime-limit or gpf. That "s" can go anywhere except at the end of the name.
It's clear we have some different assumptions/intuitions about stuff sometimes. I thought it would be clear enough what I meant by sopfr with a k, because in my world, sopfr in a sense always has a potential k or j in it, and when you don't acknowledge them, they're essentially both set at 1. Sorry for the confusion!
No problem. Different assumptions/intuitions are also what makes two heads better than one.
...I'm not sure. I did come up with that 7 number right when I woke up. I suspect I may have thrown one in for whatever the m's value was, under the assumption it was that thing you used at some point which halved all values totaled but only for the 5 term of the monzos or something. But I think I see now that you may be calling it sompfar just to underscore the fact that it uses a different w (called b) than the w the soapfar uses.
I see I didn't explain anywhere, my reason for using "sompfar" in addition to "soapfar". Sorry. And so I don't blame you for confusing it with "mcopfar". You have it exactly right above. Sompfar and soapfar differ only in the value of their w-class parameter and hence in the way their prime factors are altered. I used "m" for "modified" as it was a synonym of "a" for "altered".
And so if the "m" in sompfar doesn't mean anything functionally different,
It doesn't mean anything functionally different.
then I do agree with your 5 chunk conceptualization. I'll go back and correct it in your table in the recent post.
Cool.
I agree that treating n and d separately is necessary. Can we also agree that, whatever we call things (submetrics or otherwise) that there is a line — however fuzzy it may be — between metrics which treat n and d differently in a psychoacoustically justifiable way, and metrics which treat n and d differently in an artificial way?
Yes. Except that, as I've pointed out before, it's worrying how easy it is to argue for the psychoacoustic plausibility of something after you find out how well it lets you match the data. This is referred to in evolutionary biology as the lure of "Just-So stories".
I'll also say that if sompfar is simply a different w and not functionally different, then that is well within my tolerance for a psychoacoustically justifiable difference in the treatment of n and d. Had it been that weird half-5 trickery, or a completely different thing like gpf, then I'd've continued to push back.
Cool.

Dave Keenan

Re: developing a notational comma popularity metric

Here's another metric: "wb". It's like wyb, but with y set to 1.

w = -1.645808649, b = -2.043765116, SoS = 0.007345361, SoS(1) = 16520.5, Chunks = 4
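As a rough illustration of how wb relates to wyb, here is a Python sketch of the wyb formula quoted earlier in the thread, with wb expressed as the y = 1 special case. It assumes the two sides of the ratio are passed as positive integers; the helper `prime_exponents` and the function and argument names are hypothetical, and only primes from 5 upward contribute, matching the lower limit of the sum. The default parameter values are those reported in this thread.

```python
from math import log2


def prime_exponents(m):
    """Map each prime factor of m to its repeat count, e.g. 175 -> {5: 2, 7: 1}."""
    counts = {}
    p = 2
    while p * p <= m:
        while m % p == 0:
            counts[p] = counts.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:
        counts[m] = counts.get(m, 0) + 1
    return counts


def wyb_metric(side1, side2, w=-1.472615144, b=-2.02634047, y=0.865618551):
    """Sum over primes p >= 5 of (lb p + w) * n_p^y + (lb p + b) * d_p^y, with n >= d."""
    n, d = max(side1, side2), min(side1, side2)
    n_part = sum((log2(p) + w) * count ** y
                 for p, count in prime_exponents(n).items() if p >= 5)
    d_part = sum((log2(p) + b) * count ** y
                 for p, count in prime_exponents(d).items() if p >= 5)
    return n_part + d_part


def wb_metric(side1, side2, w=-1.645808649, b=-2.043765116):
    """wyb with the repeat-count exponent y fixed at 1."""
    return wyb_metric(side1, side2, w=w, b=b, y=1.0)
```

Fixing y at 1 removes one adjustable parameter, which is consistent with wb being listed at 4 chunks against 5 for wyb.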

I've slotted it into the table below.

Metric   Lowest SoS       SoS          Chunk
name     found (z = -1)   with z = 1   count
sopfr    0.014206087      19845.0      1
k        0.009491243      18757.5      2
j        0.009100365      18637.5      2
wyk      0.007460443      17077.5      5
wb       0.007345361      16520.5      4
cwyk     0.007300195      16890.5      7
wyks     0.006406639      14125.5      6
hyg      0.006372713      17867.5      5
wyb      0.006057649      15638.5      5
xwyks    0.00553892       14309.5      7
cwyks    0.004059522      13440.5      8

Dave Keenan

Re: developing a notational comma popularity metric

I remind you that the inspiration for wyb (and hence wb) was the extreme metric (20 parameters) where I allowed the solver to adjust the weights of all the primes independently, with numerator prime weights independent of denominator prime weights, while also adjusting an exponent y, that compressed the repeat counts — one exponent for both numerator and denominator.

But previously, I could only do an approximation of that, because I was limited to using a continuous function that only approximated the rank.

I have now rerun it with the Evolutionary Solver and the true rank, and obtained a minimum SoS = 0.00338924, and SoS(1) = 13296.5, with y = 0.935877759 and the prime weights as shown on the chart below. Blue for numerator, red for denominator.

We can see that an offset log function is an excellent approximation for the optimal numerator weights. But the denominator weights include an unusually high weight for prime 19.

There is only one ratio with 19 in its denominator among the first 80 ratios. But I had temporarily increased the training set to 104 of the first 106 ratios (omitting the two ratios with primes 67 and 211). This gets down to ratios with only 11 occurrences out of the 29 403 total. But it includes four ratios with 19 as their denominator.

After minimising the SoS on these 104 ratios, I set it back to using the first 80 ratios only, and confirmed that the solver did not find a lower SoS given several minutes of runtime.
[Attachment: μyδ.png (chart of the prime weights; blue for numerator, red for denominator)]

cmloegcmluin

Re: developing a notational comma popularity metric

Dave Keenan wrote:
Tue Jul 28, 2020 12:16 pm
cmloegcmluin wrote:
Tue Jul 28, 2020 11:22 am
I was aware of your new usage for 's' in these names and understood that if that's what I meant I should have called them ks and js.
Sorry I doubted it.
No worries. I didn't mean — by preserving what I had written even after you figured out what I meant yourself — to come across as defensive. Certainly I have often enough failed to keep abreast of various bits of info on this thread!
BTW, this use of a trailing "s" doesn't preclude naming metrics that have a parameter "s", being the coefficient of prime-limit or gpf. That "s" can go anywhere except at the end of the name.
Agreed. How convenient that we ended up with such a flexible consonant standing for these two things.
It's clear we have some different assumptions/intuitions about stuff sometimes. I thought it would be clear enough what I meant by sopfr with a k, because in my world, sopfr in a sense always has a potential k or j in it, and when you don't acknowledge them, they're essentially both set at 1. Sorry for the confusion!
No problem. Different assumptions/intuitions are also what makes two heads better than one.
Totally agreed that these are a feature of using natural language here, not a bug. So long as we remain patient and give each other the benefit of the doubt, which I think we have been.
...I'm not sure. I did come up with that 7 number right when I woke up. I suspect I may have thrown one in for whatever the m's value was, under the assumption it was that thing you used at some point which halved all values totaled but only for the 5 term of the monzos or something. But I think I see now that you may be calling it sompfar just to underscore the fact that it uses a different w (called b) than the w the soapfar uses.
I see I didn't explain anywhere, my reason for using "sompfar" in addition to "soapfar". Sorry. And so I don't blame you for confusing it with "mcopfar". You have it exactly right above. Sompfar and soapfar differ only in the value of their w-class parameter and hence in the way their prime factors are altered. I used "m" for "modified" as it was a synonym of "a" for "altered".
Oh right! It was "mcopfar"! Of course.
I agree that treating n and d separately is necessary. Can we also agree that, whatever we call things (submetrics or otherwise) that there is a line — however fuzzy it may be — between metrics which treat n and d differently in a psychoacoustically justifiable way, and metrics which treat n and d differently in an artificial way?
Yes. Except that, as I've pointed out before, it's worrying how easy it is to argue for the psychoacoustic plausibility of something after you find out how well it lets you match the data. This is referred to in evolutionary biology as the lure of "Just-So stories".
*nods* I'll try to keep an open mind about what parameters/submetrics/etc. we experiment with. That said, I'm not sure why you were so quick to dismiss the other exponential and logarithmic operations a bit ago. Maybe if you could just give a brief explanation why you're confident those could not contend for a best metric, I'd be satisfied (and recalibrated to be less dismissive of possibilities).
Dave Keenan wrote:
Tue Jul 28, 2020 5:36 pm
Here's another metric: "wb". It's like wyb, but with y set to 1.
Cool, I like it (and confirm it).

And thanks for improving upon my finding for j (I didn't see that before because you must have updated your post after I replied to it).

I agree with not including the ks or js metrics in that table (assuming that was intentional).
Dave Keenan wrote:
Tue Jul 28, 2020 11:52 pm
I remind you that the inspiration for wyb (and hence wb) was the extreme metric (20 parameters) where I allowed the solver to adjust the weights of all the primes independently, with numerator prime weights independent of denominator prime weights, while also adjusting an exponent y, that compressed the repeat counts — one exponent for both numerator and denominator.

But previously, I could only do an approximation of that, because I was limited to using a continuous function that only approximated the rank.

I have now rerun it with the Evolutionary Solver and the true rank, and obtained a minimum SoS = 0.00338924, and SoS(1) = 13296.5, with y = 0.935877759 and the prime weights as shown on the chart below. Blue for numerator, red for denominator.
SoS = 0.00338924, you say?? Stop the presses!

Well, I'm not sure I'll be able to replicate that one easily on my end, what with a different weight on each prime...

Perhaps, if I'm understanding this correctly, 0.00338924 may be a sort of lower bound on our potential SoS, as in we shouldn't hope to ever get any better than that using any parameters that are flat across all the primes?

Or am I misunderstanding this development completely?

Dave Keenan

Re: developing a notational comma popularity metric

cmloegcmluin wrote:
Wed Jul 29, 2020 4:09 am
I'll try to keep an open mind about what parameters/submetrics/etc. we experiment with. That said, I'm not sure why you were so quick to dismiss the other exponential and logarithmic operations a bit ago. Maybe if you could just give a brief explanation why you're confident those could not contend for a best metric, I'd be satisfied (and recalibrated to be less dismissive of possibilities).
I will attempt to address that in a future post.
And thanks for improving upon my finding for j (I didn't see that before because you must have updated your post after I replied to it).
Yes, but before I saw your reply. And I didn't think it was important enough for a new post. It didn't really matter if you missed it.
I agree with not including the ks or js metrics in that table (assuming that was intentional).
The omission was intentional. The tiny improvement was not enough to justify the extra chunk.

Incidentally, cwyk versus cwyks is kind of the opposite situation. In this case the "s" version is soooo much better than the non-s that you might wonder why I continue to include cwyk in the table at all. It's because, if I didn't, I'd be repeatedly thinking: Gosh, cwyks is so good, but also so complex, I wonder if it would still be nearly as good if we dropped one of its chunks, say the "s". That's also a reason for keeping wyks in the table.
SoS = 0.00338924, you say?? Stop the presses!

Well, I'm not sure I'll be able to replicate that one easily on my end, what with a different weight on each prime...
Please don't spend any time on that. Just trust me and the Evo Solver on this one. Please do whatever you have to do to let us settle on a metric, so we can get back to choosing commas for the tina diacritics, so we can submit the SMuFL/Bravura update to Steinberg, that more importantly contains the Olympian diacritics, and get back to making introductory educational material and other resources to increase the usability of Sagittal.
Perhaps, if I'm understanding this correctly, 0.00338924 may be a sort of lower bound on our potential SoS, as in we shouldn't hope to ever get any better than that using any parameters that are flat across all the primes?
You are understanding this correctly. But as you say, it's only a "sort of" lower bound. It's the lower bound on all metrics of the form soapfar(n) + sompfar(d) where ar(r) = r^y, n ≥ d.
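One way to picture the family of metrics this bound covers is the following Python sketch, in which every prime gets its own weight on each side and a single exponent y compresses the repeat counts. It assumes each side is supplied as a mapping from prime to repeat count, with the caller already having put the big side first. The function names and all weight values are hypothetical placeholders; the fitted weights from the chart are not reproduced here.

```python
def weighted_prime_sum(exponents, weights, y):
    """Sum of weights[p] * count**y over the primes p present in one side of a ratio."""
    return sum(weights[p] * count ** y for p, count in exponents.items())


def per_prime_metric(n_exponents, d_exponents, n_weights, d_weights, y):
    """Independent per-prime weights for n and d, sharing one repeat-count exponent y.

    The caller is assumed to have chosen n as the big side (n >= d).
    """
    return (weighted_prime_sum(n_exponents, n_weights, y)
            + weighted_prime_sum(d_exponents, d_weights, y))


# Placeholder weights for illustration only, not the values found by the solver.
example_n_weights = {5: 1.0, 7: 1.5, 11: 2.0}
example_d_weights = {5: 0.5, 7: 1.0, 11: 1.5}

# 35/11 as exponent maps: n = 5 * 7, d = 11.
example_value = per_prime_metric({5: 1, 7: 1}, {11: 1},
                                 example_n_weights, example_d_weights, y=1.0)
```

Choosing the weights as lb(p) + w for n and lb(p) + b for d recovers wyb, which is why no metric of that shape can score below the SoS found with fully independent weights on the same data.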

cmloegcmluin

Re: developing a notational comma popularity metric

Dave Keenan wrote:
Wed Jul 29, 2020 9:05 am
cmloegcmluin wrote:
Wed Jul 29, 2020 4:09 am
... if you could just give a brief explanation why you're confident those could not contend for a best metric, I'd be satisfied (and recalibrated to be less dismissive of possibilities).
I will attempt to address that in a future post.
Looking forward.
And thanks for improving upon my finding for j (I didn't see that before because you must have updated your post after I replied to it).
Yes, but before I saw your reply. And I didn't think it was important enough for a new post. It didn't really matter if you missed it.
Yep, not complaining. I think it was the right choice.
I agree with not including the ks or js metrics in that table (assuming that was intentional).
The omission was intentional. The tiny improvement was not enough to justify the extra chunk.

Incidentally, cwyk versus cwyks is kind of the opposite situation. In this case the "s" version is soooo much better than the non-s that you might wonder why I continue to include cwyk in the table at all. It's because, if I didn't, I'd be repeatedly thinking: Gosh, cwyks is so good, but also so complex, I wonder if it would still be nearly as good if we dropped one of its chunks, say the "s". That's also a reason for keeping wyks in the table.
I 100% know what you mean! Sometimes this thread almost feels like poetry, or at least architecture.
SoS = 0.00338924, you say?? Stop the presses!

Well, I'm not sure I'll be able to replicate that one easily on my end, what with a different weight on each prime...
Please don't spend any time on that. Just trust me and the Evo Solver on this one. Please do whatever you have to do to let us settle on a metric, so we can get back to choosing commas for the tina diacritics, so we can submit the SMuFL/Bravura update to Steinberg, that more importantly contains the Olympian diacritics, and get back to making introductory educational material and other resources to increase the usability of Sagittal.
Yes!

Back from vacation, but got a bunch of other things I have to take care of rather soon, so I haven't been able to plunge back in to Sagittal as fully as I'd hoped.
Perhaps, if I'm understanding this correctly, 0.00338924 may be a sort of lower bound on our potential SoS, as in we shouldn't hope to ever get any better than that using any parameters that are flat across all the primes?
You are understanding this correctly. But as you say, it's only a "sort of" lower bound. It's the lower bound on all metrics of the form soapfar(n) + sompfar(d) where ar(r) = r^y, n ≥ d.
Yes, emphasis on the "sort of"!

Dave Keenan

Re: developing a notational comma popularity metric

Here are the 6 related functions of x that you brought up recently, where c is a constant. Or at least you brought up 4 of them, and in response I mentioned the other 2. The red lines connect pairs of functions that swap the constant with the variable. The green lines (imagined as continuous and crossing in the centre) connect pairs of functions that are inverses of each other.

          c^x ---- x^c
             \    /
c^(1/x) ---          --- 1/log_c(x)
     \      /      \      /
      ᶜ√x            log_c(x)


Here's another way of writing them, that makes the red relationships more obvious

          c^x ---- x^c
             \    /
c^(1/x) ---          --- ln(c)/ln(x)
     \      /      \      /
     x^(1/c)         ln(x)/ln(c)


And ln() could be replaced by lb(). More to come, when I find the time.
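The two kinds of relationship among the six functions can be spot-checked numerically. The Python sketch below writes each function with the constant c as its first argument and the variable x as its second, then tests the inverse pairs and the swap pairs at one sample point. The pairings listed are worked out from the definitions rather than read off the diagram's colours, and the dictionary keys, helper names and sample values c = 2, x = 5 are arbitrary choices for illustration.

```python
from math import isclose, log

# Each function takes the constant c first and the variable x second.
functions = {
    "c^x": lambda c, x: c ** x,
    "x^c": lambda c, x: x ** c,
    "c^(1/x)": lambda c, x: c ** (1 / x),
    "x^(1/c)": lambda c, x: x ** (1 / c),
    "log_c(x)": lambda c, x: log(x) / log(c),
    "1/log_c(x)": lambda c, x: log(c) / log(x),
}

# Pairs that undo each other as functions of x, for fixed c.
inverse_pairs = [("c^x", "log_c(x)"), ("x^c", "x^(1/c)"), ("c^(1/x)", "1/log_c(x)")]

# Pairs that turn into each other when the roles of c and x are exchanged.
swap_pairs = [("c^x", "x^c"), ("c^(1/x)", "x^(1/c)"), ("log_c(x)", "1/log_c(x)")]


def is_inverse_pair(name_f, name_g, c=2.0, x=5.0):
    """True if g(f(x)) and f(g(x)) both return x at the sample point."""
    f, g = functions[name_f], functions[name_g]
    return isclose(g(c, f(c, x)), x) and isclose(f(c, g(c, x)), x)


def is_swap_pair(name_f, name_g, c=2.0, x=5.0):
    """True if f(c, x) equals g with its constant and variable exchanged."""
    f, g = functions[name_f], functions[name_g]
    return isclose(f(c, x), g(x, c))
```

A single sample point cannot prove the identities, but it is enough to catch a mislabelled pair.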