Oh man! That would be fun.
developing a notational comma popularity metric
- Dave Keenan
- Site Admin
- Posts: 2180
- Joined: Tue Sep 01, 2015 2:59 pm
- Location: Brisbane, Queensland, Australia
- Contact:
Re: developing a notational comma popularity metric
- cmloegcmluin
- Site Admin
- Posts: 1704
- Joined: Tue Feb 11, 2020 3:10 pm
- Location: San Francisco, California, USA
- Real Name: Douglas Blumeyer (he/him/his)
- Contact:
Re: developing a notational comma popularity metric
Dave Keenan wrote: ↑Sun Oct 25, 2020 12:49 pm My guess is that there won't be any more, because if there were, they would have to include 5⁷ × 7² = 3 828 125, but its N2D3P9 is 5815.29.

Indeed there were no further results (besides, of course, 9765625 itself).
- Dave Keenan
Re: developing a notational comma popularity metric
cmloegcmluin wrote: ↑Mon Oct 26, 2020 3:58 am Indeed there were no further results (besides, of course, 9765625 itself).

Well done. How many numerators were found?
- cmloegcmluin
Re: developing a notational comma popularity metric
1014 of them. You can check out the full results here: https://github.com/Sagittal/sagittal-ca ... rators.txt
- Dave Keenan
Re: developing a notational comma popularity metric
Awesome. That's such a manageable number that I suggest you forget about my complicated denominator-generating procedure that this result was supposed to be fodder for, and just try every numerator as a potential denominator to generate ratios, calculate their N2D3P9, throw away those greater than 5298.19065, then sort them on N2D3P9. There are only 1014 × 1013 / 2 = 513 591 ratios to try.
For this purpose, it would be more useful to have the copfr of each numerator rather than its n2 or n2p. This is readily obtained as copfr = round(lb(numerator/n2)). And it would be more useful to have the numerators sorted by numerator rather than by n2 or n2p. I suggest preprocessing the existing file to generate a file with numerator, gpf and copfr, in numerator order, before feeding it to a ratio generator/tester/sorter.
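The preprocessing step described above could be sketched like this in JavaScript (the language Douglas's code is in). The record field names (`numerator`, `n2`, `gpf`) are hypothetical, chosen to match the quantities named in the thread:

```javascript
// Sketch of the suggested preprocessing: recover copfr from n2 via
// copfr = round(lb(numerator / n2)), keep only numerator, gpf, and copfr,
// and sort by numerator. Field names here are hypothetical.
const preprocess = (records) =>
  records
    .map(({ numerator, n2, gpf }) => ({
      numerator,
      gpf,
      // numerator / n2 = 2^copfr, so lb (log base 2) recovers copfr
      copfr: Math.round(Math.log2(numerator / n2)),
    }))
    .sort((a, b) => a.numerator - b.numerator);
```

Rounding guards against floating-point noise in the stored n2 values, since copfr is necessarily an integer.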
- Dave Keenan
Re: developing a notational comma popularity metric
I've munged the file as I suggested above, by using Notepad++ and Excel. I've left it with one line per numerator, and tabs where some carriage-returns were previously. BTW, I see 1015 numerators, not 1014.
- Attachments
-
- Numerators with N2D3P9 to 5298.txt
- (37.85 KiB) Downloaded 270 times
- cmloegcmluin
Re: developing a notational comma popularity metric
Thanks for this. I will plug in the benefits of these sorted lists soon, following your suggestions.
- Dave Keenan
Re: developing a notational comma popularity metric
cmloegcmluin wrote: ↑Mon Oct 26, 2020 11:24 am Thanks for this. I will plug in the benefits of these sorted lists soon, following your suggestions.

You say "lists" plural, but you should only use the list that's sorted by numerator (and has copfr). That way it's just a pair of nested loops: the outer one stepping along the list and using each element as a numerator, the inner one stepping along the same list and using each element as a denominator, but only up until it reaches the same index as the outer loop, since d < n.
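The nested-loop scheme Dave describes could be sketched as follows. The metric computation itself is left as a placeholder parameter (`n2d3p9OfRatio`), since it is defined elsewhere in the thread; the cutoff default is the 5298.19065 bound mentioned above:

```javascript
// Sketch of the nested-loop ratio generator: one list sorted by numerator,
// inner loop stopping at the outer index so that d < n always holds.
// n2d3p9OfRatio is a placeholder for the actual N2D3P9 computation.
const generateRatios = (entries, n2d3p9OfRatio, cutoff = 5298.19065) => {
  const results = [];
  for (let i = 0; i < entries.length; i++) {
    for (let j = 0; j < i; j++) { // j < i guarantees d < n
      const n = entries[i];
      const d = entries[j];
      const n2d3p9 = n2d3p9OfRatio(n, d);
      if (n2d3p9 <= cutoff) {
        results.push({ n: n.numerator, d: d.numerator, n2d3p9 });
      }
    }
  }
  // finally, sort the surviving ratios on N2D3P9
  return results.sort((a, b) => a.n2d3p9 - b.n2d3p9);
};
```

With k entries this tries k × (k − 1) / 2 pairs, matching the 1014 × 1013 / 2 = 513 591 count given earlier.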
- cmloegcmluin
Re: developing a notational comma popularity metric
Thanks for correcting me. TBH, I hadn't taken the time to carefully understand your suggestion yet. I should have just admitted that. I'm focusing on the refactor I'm in the middle of, so that I can finish in time for pairing on the more important comma usefulness work with you tomorrow.
- Dave Keenan
Re: developing a notational comma popularity metric
For anyone else following along: Douglas and I did a successful pair programming session via Zoom, and following that we have been testing 8 different candidate parameterised functions for a (notational) comma usefulness metric. These consist of all the possible sums of compressed N2D3P9 plus expanded absolute-apotome-slope (AAS) plus expanded absolute-3-exponent (ATE), where we choose between log and root for the compression and between exponential and power for the expansions.
Douglas is testing these candidate functions using his javascript code and I'm doing it in an Excel spreadsheet (after Douglas kindly shared the necessary input data with me). That way we can check each other's results.
The input data consists of the existing default comma for every valid single-shaft sagittal symbol (including accents) up to the half-apotome, and for each such symbol, all the reasonable candidate commas that fall within its capture zone, at the lowest JI precision level at which that symbol occurs. "Reasonable" here means having N2D3P9 ≤ 307, AAS ≤ 14, ATE ≤ 15.
So far, the best such usefulness-ranking function I have found (based on maximising the number of existing commas that it ranks as the most useful in their zone), is:
usefulness_rank = lb(N2D3P9) + 1/12 × AAS^1.37 + 2^(ATE − 10.5)
But what was at first rather depressing was that even this best (so far) function only validates 91 of the 123 existing symbol commas, or 74%. Did George and I really do such a bad job?
There are certainly 2 commas (for some rarely-used symbols) where we did a bad job. That is, we assigned commas that were far less useful than the most useful comma in the capture zone. These were identified earlier, based on their N2D3P9 values alone, as follows:
I believe for
we should replace 19/4375s [-8 10 -4 -1 0 0 0 1⟩ with 1/575s [6 2 -2 0 0 0 0 0 -1⟩
and for
we should replace 14641k [-17 2 0 0 4⟩ with 143/5k [-8 2 -1 0 1 1⟩
Our recent replacement of 47M with 85/11M was validated. But it might need to be reconsidered in the light of the information below, because we already had a symbol for an 11:85 comma: the one for 85/11C.
I say no, because I then found the following very telling cases, where George and I assigned commas that were far less "useful" than the most "useful" in their lowest-precision capture zone, by any conceivable usefulness ranking function based on N2D3P9, AAS and ATE alone, and yet I believe our assignments are entirely justified.
We rightly assigned:
to 19s [-9 3 0 0 0 0 0 1⟩ instead of the more "useful" 5s [-15 8 1⟩.
to 1/17k [-7 7 0 0 0 0 -1⟩ instead of the more "useful" 25/7k [-5 2 2 -1⟩
to 1/19C [-10 9 0 0 0 0 0 -1⟩ instead of the more "useful" 1/25C [11 -4 -2 0 0 0 0 0⟩
to 11/19M [4 -2 0 0 1 0 0 -1⟩ instead of the more "useful" 1/7M [-13 10 0 -1⟩
and there are several others like these.
I see at least two reasons here, not to use the most "useful" (according to the above function, or others like it).
1. The most "useful" is outside the symbol's capture zone at a higher precision level.
2. The most "useful" is a comma for a 2,3-equivalence class that already has a symbol (based on a more useful comma for the same equivalence class).
So optimising the above usefulness function(s) based on sum-of-squared-errors in usefulness, instead of a simple count of commas matched, will not be useful. Assignments like those above will skew the result in ways that are not meaningful.
I want to remind us of what we're doing here. The most pressing need at present is a metric for choosing commas (and/or metacommas) for tina accents. I don't think either of the above numbered reasons is likely to occur in the case of candidate commas for tina accents. So I say the above usefulness metric is Good Enough™ and we should just run with it.