Oh man! That would be fun.
developing a notational comma popularity metric
- Dave Keenan
- Site Admin
- Posts: 2180
- Joined: Tue Sep 01, 2015 2:59 pm
- Location: Brisbane, Queensland, Australia
- Contact:
Re: developing a notational comma popularity metric
- cmloegcmluin
- Site Admin
- Posts: 1704
- Joined: Tue Feb 11, 2020 3:10 pm
- Location: San Francisco, California, USA
- Real Name: Douglas Blumeyer (he/him/his)
- Contact:
Re: developing a notational comma popularity metric
Dave Keenan wrote: ↑Sun Oct 25, 2020 12:49 pm My guess is that there won't be any more, because if there were, they would have to include 5⁷ × 7² = 3 828 125, but its N2D3P9 is 5815.29.

Indeed there were no further results (besides, of course, 9765625 itself).
- Dave Keenan
Re: developing a notational comma popularity metric
cmloegcmluin wrote: ↑Mon Oct 26, 2020 3:58 am Indeed there were no further results (besides, of course, 9765625 itself).

Well done. How many numerators were found?
- cmloegcmluin
Re: developing a notational comma popularity metric
1014 of them. You can check out the full results here: https://github.com/Sagittal/sagittal-ca ... rators.txt
- Dave Keenan
Re: developing a notational comma popularity metric
Awesome. That's such a manageable number that I suggest you forget about my complicated denominator-generating procedure that this result was supposed to be fodder for, and just try every numerator as a potential denominator to generate ratios, calculate their N2D3P9, throw away those greater than 5298.19065, then sort them on N2D3P9. There are only 1014 × 1013 / 2 = 513 591 ratios to try.
For this purpose, it would be more useful to have the copfr of each numerator rather than its n2 or n2p. This is readily obtained as copfr = round(lb(numerator/n2)). And it would be more useful to have the numerators sorted by numerator rather than by n2 or n2p. I suggest preprocessing the existing file to generate a file with numerator, gpf and copfr, in numerator order, before feeding it to a ratio generator/tester/sorter.
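The preprocessing step described above could be sketched like this in JavaScript (the language Douglas's code is in). The record field names (`numerator`, `n2`, `gpf`) are hypothetical, chosen to match the quantities named in the thread:

```javascript
// Sketch of the suggested preprocessing: recover copfr from n2 via
// copfr = round(lb(numerator / n2)), keep only numerator, gpf, and copfr,
// and sort by numerator. Field names here are hypothetical.
const preprocess = (records) =>
  records
    .map(({ numerator, n2, gpf }) => ({
      numerator,
      gpf,
      // numerator / n2 = 2^copfr, so lb (log base 2) recovers copfr
      copfr: Math.round(Math.log2(numerator / n2)),
    }))
    .sort((a, b) => a.numerator - b.numerator);
```

Rounding guards against floating-point noise in the stored n2 values, since copfr is necessarily an integer.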
- Dave Keenan
Re: developing a notational comma popularity metric
I've munged the file as I suggested above, by using Notepad++ and Excel. I've left it with one line per numerator, and tabs where some carriage-returns were previously. BTW, I see 1015 numerators, not 1014.
- Attachments
-
- Numerators with N2D3P9 to 5298.txt
- (37.85 KiB) Downloaded 270 times
- cmloegcmluin
Re: developing a notational comma popularity metric
Thanks for this. I will plug in the benefits of these sorted lists soon, following your suggestions.
- Dave Keenan
Re: developing a notational comma popularity metric
cmloegcmluin wrote: ↑Mon Oct 26, 2020 11:24 am Thanks for this. I will plug in the benefits of these sorted lists soon, following your suggestions.

You say "lists" plural, but you should only use the list that's sorted by numerator (and has copfr). That way it's just a pair of nested loops: the outer one stepping along the list and using each element as a numerator, the inner one stepping along the same list and using each element as a denominator, but only up until it reaches the same index as the outer loop, since d < n.
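The nested-loop scheme Dave describes could be sketched as follows. The metric computation itself is left as a placeholder parameter (`n2d3p9OfRatio`), since it is defined elsewhere in the thread; the cutoff default is the 5298.19065 bound mentioned above:

```javascript
// Sketch of the nested-loop ratio generator: one list sorted by numerator,
// inner loop stopping at the outer index so that d < n always holds.
// n2d3p9OfRatio is a placeholder for the actual N2D3P9 computation.
const generateRatios = (entries, n2d3p9OfRatio, cutoff = 5298.19065) => {
  const results = [];
  for (let i = 0; i < entries.length; i++) {
    for (let j = 0; j < i; j++) { // j < i guarantees d < n
      const n = entries[i];
      const d = entries[j];
      const n2d3p9 = n2d3p9OfRatio(n, d);
      if (n2d3p9 <= cutoff) {
        results.push({ n: n.numerator, d: d.numerator, n2d3p9 });
      }
    }
  }
  // finally, sort the surviving ratios on N2D3P9
  return results.sort((a, b) => a.n2d3p9 - b.n2d3p9);
};
```

With k entries this tries k × (k − 1) / 2 pairs, matching the 1014 × 1013 / 2 = 513 591 count given earlier.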
- cmloegcmluin
Re: developing a notational comma popularity metric
Thanks for correcting me. TBH, I hadn't taken the time to carefully understand your suggestion yet. I should have just admitted that. I'm focusing on the refactor I'm in the middle of, so that I can finish in time for pairing on the more important comma usefulness work with you tomorrow.
- Dave Keenan
Re: developing a notational comma popularity metric
For anyone else following along: Douglas and I did a successful pair programming session via Zoom, and following that we have been testing 8 different candidate parameterised functions for a (notational) comma usefulness metric. These consist of all the possible sums of compressed N2D3P9 plus expanded absolute-apotome-slope (AAS) plus expanded absolute-3-exponent (ATE), where we choose between log and root for the compression and between exponential and power for the expansions.
Douglas is testing these candidate functions using his javascript code and I'm doing it in an Excel spreadsheet (after Douglas kindly shared the necessary input data with me). That way we can check each other's results.
The input data consists of the existing default comma for every valid single-shaft sagittal symbol (including accents) up to the half-apotome, and for each such symbol, all the reasonable candidate commas that fall within its capture zone, at the lowest JI precision level at which that symbol occurs. "Reasonable" here means having N2D3P9 ≤ 307, AAS ≤ 14, ATE ≤ 15.
So far, the best such usefulness-ranking function I have found (based on maximising the number of existing commas that it ranks as the most useful in their zone), is:
usefulness_rank = lb(N2D3P9) + 1/12 × AAS^1.37 + 2^(ATE − 10.5)
But what was at first rather depressing was that even this best (so far) function only validates 91 of the 123 existing symbol commas, or 74%. Did George and I really do such a bad job?
There are certainly 2 commas (for some rarely-used symbols) where we did a bad job. That is, we assigned commas that were far less useful than the most useful comma in the capture zone. These were identified earlier, based on their N2D3P9 values alone, as follows:
I believe for
we should replace 19/4375s [-8 10 -4 -1 0 0 0 1⟩ with 1/575s [6 2 -2 0 0 0 0 0 -1⟩
and for
we should replace 14641k [-17 2 0 0 4⟩ with 143/5k [-8 2 -1 0 1 1⟩
Our recent replacement of 47M with 85/11M was validated. But it might need to be reconsidered in the light of the information below, because we already had a symbol for an 11:85 comma: the one for 85/11C.
I say no, because I then found the following very telling cases, where George and I assigned commas that were far less "useful" than the most "useful" in their lowest-precision capture zone, by any conceivable usefulness ranking function based on N2D3P9, AAS and ATE alone, and yet I believe our assignments are entirely justified.
We rightly assigned:
to 19s [-9 3 0 0 0 0 0 1⟩ instead of the more "useful" 5s [-15 8 1⟩.
to 1/17k [-7 7 0 0 0 0 -1⟩ instead of the more "useful" 25/7k [-5 2 2 -1⟩
to 1/19C [-10 9 0 0 0 0 0 -1⟩ instead of the more "useful" 1/25C [11 -4 -2 0 0 0 0 0⟩
to 11/19M [4 -2 0 0 1 0 0 -1⟩ instead of the more "useful" 1/7M [-13 10 0 -1⟩
and there are several others like these.
I see at least two reasons here, not to use the most "useful" (according to the above function, or others like it).
1. The most "useful" is outside the symbol's capture zone at a higher precision level.
2. The most "useful" is a comma for a 2,3-equivalence class that already has a symbol (based on a more useful comma for the same equivalence class).
So optimising the above usefulness function(s) based on sum-of-squared-errors in usefulness, instead of a simple count of commas matched, will not be useful. Assignments like those above will skew the result in ways that are not meaningful.
I want to remind us of what we're doing here. The most pressing need at present is a metric for choosing commas (and/or metacommas) for tina accents. I don't think either of the above numbered reasons is likely to occur in the case of candidate commas for tina accents. So I say the above usefulness metric is Good Enough™ and we should just run with it.