developing a notational comma popularity metric


Re: developing a notational comma popularity metric

Post by Dave Keenan »

cmloegcmluin wrote: Sun Oct 25, 2020 3:01 pm Ah, if we could only pair program sometimes!
Oh man! That would be fun. :D

Re: developing a notational comma popularity metric

Post by cmloegcmluin »

Dave Keenan wrote: Sun Oct 25, 2020 12:49 pm My guess is that there won't be any more, because if there were, they would have to include 5^7 × 7^2 = 3 828 125, but its N2D3P9 is 5815.29.
Indeed there were no further results (besides, of course, 9765625 itself).

Re: developing a notational comma popularity metric

Post by Dave Keenan »

cmloegcmluin wrote: Mon Oct 26, 2020 3:58 am Indeed there were no further results (besides, of course, 9765625 itself).
Well done. How many numerators were found?

Re: developing a notational comma popularity metric

Post by cmloegcmluin »

1014 of them. You can check out the full results here: https://github.com/Sagittal/sagittal-ca ... rators.txt

Re: developing a notational comma popularity metric

Post by Dave Keenan »

Awesome. That's such a manageable number that I suggest you forget about my complicated denominator-generating procedure that this result was supposed to be fodder for, and just try every numerator as a potential denominator to generate ratios, calculate their N2D3P9, throw away those greater than 5298.19065, then sort them on N2D3P9. There are only 1014 × 1013 / 2 = 513 591 ratios to try.

For this purpose, it would be more useful to have the copfr of each numerator rather than its n2 or n2p. This is readily obtained as copfr = round(lb(numerator/n2)). And it would be more useful to have the numerators sorted by numerator rather than by n2 or n2p. I suggest preprocessing the existing file to generate a file with numerator, gpf and copfr, in numerator order, before feeding it to a ratio generator/tester/sorter.
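
Here's a minimal sketch of that preprocessing in JavaScript (Node), not actual working code: the column layout (tab-separated numerator, n2, gpf) and the filenames are assumptions, so adjust them to the real file.

Code:
// Sketch only: read the numerators (assumed columns: numerator, n2, gpf),
// derive copfr = round(lb(numerator / n2)), sort by numerator, write out.
const fs = require("fs");

const entries = fs.readFileSync("numerators.txt", "utf8")   // placeholder filename
    .trim()
    .split("\n")
    .map(line => {
        const [numerator, n2, gpf] = line.split("\t").map(Number);
        const copfr = Math.round(Math.log2(numerator / n2));
        return { numerator, gpf, copfr };
    });

// Sort by numerator rather than by n2 or n2p.
entries.sort((a, b) => a.numerator - b.numerator);

fs.writeFileSync(
    "numerators-sorted.txt",                                 // placeholder filename
    entries.map(e => [e.numerator, e.gpf, e.copfr].join("\t")).join("\n")
);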

Re: developing a notational comma popularity metric

Post by Dave Keenan »

I've munged the file as I suggested above, by using Notepad++ and Excel. I've left it with one line per numerator, and tabs where some carriage-returns were previously. BTW, I see 1015 numerators, not 1014.
Attachments
Numerators with N2D3P9 to 5298.txt

Re: developing a notational comma popularity metric

Post by cmloegcmluin »

Thanks for this. I will plug these sorted lists in soon, following your suggestions.

Re: developing a notational comma popularity metric

Post by Dave Keenan »

cmloegcmluin wrote: Mon Oct 26, 2020 11:24 am Thanks for this. I will plug these sorted lists in soon, following your suggestions.
You say "lists" plural, but you should only use the list that's sorted by numerator (and has copfr). That way it's just a pair of nested loops: the outer one stepping along the list and using each element as a numerator, and the inner one stepping along the same list and using each element as a denominator, but only up until it reaches the same index as the outer loop, since d < n.
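
For concreteness, here's a rough JavaScript sketch of that pair of nested loops, not actual working code. It assumes the preprocessed list described above (numerator, gpf, copfr, sorted by numerator, placeholder filename), and it assumes the N2D3P9 definition from earlier in this thread: the product of the numerator's primes each divided by 2, times the product of the denominator's primes each divided by 3, times the greatest prime of the ratio divided by 9. Given copfr and gpf, no further factorising is needed.

Code:
// Sketch only. Assumed input (placeholder filename): tab-separated
// numerator, gpf, copfr per line, already sorted by numerator.
const fs = require("fs");

const entries = fs.readFileSync("numerators-sorted.txt", "utf8")
    .trim()
    .split("\n")
    .map(line => {
        const [numerator, gpf, copfr] = line.split("\t").map(Number);
        return { numerator, gpf, copfr };
    });

// Assumed N2D3P9: (product of numerator primes, each / 2)
//               * (product of denominator primes, each / 3)
//               * (greatest prime of the ratio / 9).
// numerator / 2^copfr gives the first product; denominator / 3^copfr the second.
const n2d3p9 = (num, den) =>
    (num.numerator / 2 ** num.copfr) *
    (den.numerator / 3 ** den.copfr) *
    (Math.max(num.gpf, den.gpf) / 9);

const MAX_N2D3P9 = 5298.19065;
const ratios = [];

for (let i = 0; i < entries.length; i++) {        // outer loop: numerator
    for (let j = 0; j < i; j++) {                 // inner loop: denominator, so d < n
        const value = n2d3p9(entries[i], entries[j]);
        if (value <= MAX_N2D3P9) {
            ratios.push({ n: entries[i].numerator, d: entries[j].numerator, n2d3p9: value });
        }
    }
}

// Note: pairs sharing a prime factor give non-reduced ratios; whether to skip
// those is a separate decision.
ratios.sort((a, b) => a.n2d3p9 - b.n2d3p9);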

Re: developing a notational comma popularity metric

Post by cmloegcmluin »

Thanks for correcting me. TBH, I hadn't taken the time to carefully understand your suggestion yet. I should have just admitted that. I'm focusing on the refactor I'm in the middle of, so that I can finish in time for pairing on the more important comma usefulness work with you tomorrow.

Re: developing a notational comma popularity metric

Post by Dave Keenan »

For anyone else following along: Douglas and I did a successful pair programming session via Zoom, and following that we have been testing 8 different candidate parameterised functions for a (notational) comma usefulness metric. These consist of all the possible sums of compressed N2D3P9 plus expanded absolute-apotome-slope (AAS) plus expanded absolute-3-exponent (ATE), where we choose between log and root for the compression, and between exponential and power for each of the two expansions.
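
To make that concrete, here is an illustrative JavaScript sketch of the eight shapes. The parameter names and the per-term weights are labels of my own for this sketch, not anything from either of our actual implementations.

Code:
// Illustrative only: 2 compressions x 2 AAS expansions x 2 ATE expansions = 8 shapes.
const compressions = {
    log:  (x, b) => Math.log(x) / Math.log(b),   // log base b
    root: (x, r) => x ** (1 / r),                // rth root
};
const expansions = {
    exponential: (x, b) => b ** x,
    power:       (x, p) => x ** p,
};

// Each candidate sums compressed N2D3P9, expanded AAS and expanded ATE,
// with parameters (and per-term weights) to be optimised.
const makeCandidate = (compress, expandAas, expandAte, p) =>
    ({ n2d3p9, aas, ate }) =>
        compress(n2d3p9, p.c) +
        p.wAas * expandAas(aas, p.a) +
        p.wAte * expandAte(ate, p.t);

For example, makeCandidate(compressions.log, expansions.power, expansions.exponential, ...) is the shape that the best function given below turns out to have.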

Douglas is testing these candidate functions using his javascript code and I'm doing it in an Excel spreadsheet (after Douglas kindly shared the necessary input data with me). That way we can check each other's results.

The input data consists of the existing default comma for every valid single-shaft sagittal symbol (including accents) up to the half-apotome, and for each such symbol, all the reasonable candidate commas that fall within its capture zone, at the lowest JI precision level at which that symbol occurs. "Reasonable" here means having N2D3P9 ≤ 307, AAS ≤ 14, ATE ≤ 15.

So far, the best such usefulness-ranking function I have found (based on maximising the number of existing commas that it ranks as the most useful in their zone) is:

usefulness_rank = lb(N2D3P9) + 1/12 × AAS^1.37 + 2^(ATE-10.5)
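
In code form, that's just a direct transcription (with a lower value meaning more useful):

Code:
// lb(N2D3P9) + 1/12 * AAS^1.37 + 2^(ATE - 10.5); lower = more useful.
const usefulnessRank = (n2d3p9, aas, ate) =>
    Math.log2(n2d3p9) + (1 / 12) * aas ** 1.37 + 2 ** (ate - 10.5);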

But what was at first rather depressing was that even this best-so-far function only validates 91 of the 123 existing symbol commas, or 74%. Did George and I really do such a bad job?

There are certainly 2 commas (for some rarely-used symbols) where we did a bad job. That is, we assigned commas that were far less useful than the most useful comma in the capture zone. These were identified earlier, based on their N2D3P9 values alone, as follows:

I believe for
:,::)|: we should replace 19/4375s [-8 10 -4 -1 0 0 0 1⟩ with 1/575s [6 2 -2 0 0 0 0 0 -1⟩
and for
:,::)|(: we should replace 14641k [-17 2 0 0 4⟩ with 143/5k [-8 2 -1 0 1 1⟩

Our recent replacement of 47M with 85/11M for :`::/|\: was validated. But it might need to be reconsidered in the light of the information below, because we already had a symbol for an 11:85 comma: :,::)|): for 85/11C.

I say "at first", because I then found the following very telling cases, where George and I assigned commas that were far less "useful" than the most "useful" comma in their lowest-precision capture zone, by any conceivable usefulness-ranking function based on N2D3P9, AAS and ATE alone, and yet where I believe our assignments are entirely justified.

We rightly assigned:
:)|: to 19s [-9 3 0 0 0 0 0 1⟩ instead of the more "useful" 5s [-15 8 1⟩.
:~|: to 1/17k [-7 7 0 0 0 0 -1⟩ instead of the more "useful" 25/7k [-5 2 2 -1⟩
:)|~: to 1/19C [-10 9 0 0 0 0 0 -1⟩ instead of the more "useful" 1/25C [11 -4 -2 0 0 0 0 0⟩
:(|~: to 11/19M [4 -2 0 0 1 0 0 -1⟩ instead of the more "useful" 1/7M [-13 10 0 -1⟩

and there are several others like these.

I see at least two reasons here not to use the most "useful" comma (according to the above function, or others like it):
1. The most "useful" is outside the symbol's capture zone at a higher precision level.
2. The most "useful" is a comma for a 2,3-equivalence class that already has a symbol (based on a more useful comma for the same equivalence class).

So optimising the above usefulness function(s) based on sum-of-squared-errors in usefulness, instead of a simple count of commas matched, will not be useful. Assignments like those above will skew the result in ways that are not meaningful.

I want to remind us of what we're doing here. The most pressing need at present is a metric for choosing commas (and/or metacommas) for tina accents. I don't think either of the above numbered reasons is likely to occur in the case of candidate commas for tina accents. So I say the above usefulness metric is Good Enough™ and we should just run with it.