Thanks for sharing your latest results. Yes, it does seem that, unlike with the popularity metric LFC, it's much easier here to get good metrics, which paradoxically makes it harder to choose between them. I'm still working toward confirming your Excel results by adding recursive searching of local minima; you're leagues ahead of me with your pre-built all-purpose evolutionary solver. But if I could have my code spit out which commas are matches for each metric, how would you actually use that information to make a decision? Which (type of) comma is more important to match?

Dave Keenan wrote: ↑Sat Oct 31, 2020 5:42 pm
I think that choosing between these and the simpler LP metric (ignores ATE) with b = 1.41, s = 0.096, will come down to looking at exactly which commas they include that the others reject, and vice versa.
FYI, during my work on the code, I made a couple decisions about naming that affect my approach to the problem that I should probably share with you:
- I'm calling things like LEE and RPP (usefulness) metric families, adding the "families" part to distinguish them from individual (usefulness) metrics. A metric is what you get when you combine a family with a parameter set such as {b=2, s=1/85, t=2^-10}. We didn't need the concept of a metric family for the popularity metric LFC, but that's because there we had so many different submetrics that it didn't make sense to name families in a top-down fashion; we only named those that proved themselves meaningful by emerging as contenders from the search (like cwyk, wbl, etc.). For the usefulness metric search, the concept of submetrics doesn't seem helpful, so I think we're just looking for the best metric in each of the 8 metric families, and then probably the best metric among those 8 (well, maybe a few more, if we start including LP, LE, RP, and RE). I'm assuming in all cases we drop ATE in favor of AAS, although I'm still in suspense about you digging up the details of your and George's discussions about which of them to use, or both, in a usefulness metric.
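To make the family/metric distinction concrete, here's a minimal sketch in Python; all the names and the stand-in formula are mine for illustration, not the real codebase's:

```python
from typing import Callable, Dict

# A family such as LPE is, conceptually, a function from (comma, params)
# to a usefulness score; a metric is a family with its parameters fixed.
MetricFamily = Callable[[Dict[str, float], Dict[str, float]], float]

def lpe_family(comma: Dict[str, float], params: Dict[str, float]) -> float:
    # Stand-in formula only: a complexity term raised to b, plus an s-weighted AAS term.
    return comma["lp"] ** params["b"] + params["s"] * comma["aas"]

def make_metric(family: MetricFamily, params: Dict[str, float]):
    # Fixing a parameter set such as {b=2, s=1/85} turns a family into a metric.
    return lambda comma: family(comma, params)

lpe_metric = make_metric(lpe_family, {"b": 2.0, "s": 1 / 85})
```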
- When DRYing up the duplicated code for the boolean and sos modes, I realized that to bring them into maximum conformity with each other, in order to extract their shared behavior, it made more sense to treat boolean mode not as an effort to maximize match count, but to minimize non-match count. In other words, sos mode assigns each zone a score of zero if the actual comma is the most useful in its zone, and otherwise a non-zero score (a squared distance; more on that in a moment). Similarly, boolean mode should assign each zone a score of zero if the actual comma is the most useful in its zone, and otherwise a flat score of one. Therefore, I suggest we henceforth invert the way we talk about boolean mode scores, giving the count of non-matches instead. Even if we never think about sos mode again, I think this is preferable.
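The shared shape of the two modes could be sketched like this (a hypothetical rendering, with names of my own; only the zero-for-a-match convention comes from the discussion above):

```python
def zone_score(actual_usefulness: float, best_usefulness: float, mode: str) -> float:
    # Both modes score a zone 0 when the actual comma is the most useful
    # in its zone (lower usefulness score is better); they differ only in
    # the penalty for a non-match.
    if actual_usefulness <= best_usefulness:
        return 0.0  # match
    if mode == "boolean":
        return 1.0  # flat penalty; summing these counts non-matches
    # sos mode: squared distance, a gradient of non-matchy-ness
    return (actual_usefulness - best_usefulness) ** 2

def metric_score(zones, mode: str) -> float:
    # zones: iterable of (actual comma's usefulness, best usefulness in zone) pairs
    return sum(zone_score(actual, best, mode) for actual, best in zones)
```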
- I added nominal types to the numbers I was tossing around in the code to help me make sense of this refactor, which helped me realize I was confronting not one type of score, but two! Both of them are inverted scores, i.e. scores where lower is better. The first is a usefulness score: the type of score returned by metrics such as those in the families LPE, RPE, etc., and used to compare commas within a given zone. The second is a metric score. In boolean mode this score is an integer, representing the count of non-matches for a given metric across all the comma zones; in sos mode it represents a gradient of non-matchy-ness for a given metric across all the comma zones. Anyway, we couldn't have gotten this far if we didn't understand that this is how things work on some level, and maybe you'd already consciously made this distinction in your head and felt it was too obvious to unpack, but in my case I was sometimes struggling to make sense of the boundaries between modules in the code until I arrived at this insight. Hopefully it's helpful, whether it affects how you plug things into Excel, or just for our communication here on the forum.
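For what it's worth, here's roughly what that nominal typing looks like, sketched in Python with `NewType` (names illustrative, whatever the actual codebase does):

```python
from typing import List, NewType

# Both score types are inverted: lower is better.
UsefulnessScore = NewType("UsefulnessScore", float)  # compares commas within one zone
MetricScore = NewType("MetricScore", float)          # compares metrics across all zones

def pick_most_useful(scores: List[UsefulnessScore]) -> int:
    # Returns the index of the most useful comma in a zone (lowest score wins).
    return min(range(len(scores)), key=lambda i: scores[i])
```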
With a bit of hesitation, I'm going to include here a couple late night mind drifty thoughts I had:
- So we're planning to mix a badness metric in with the popularity and usefulness metrics to get a grand unified metric for choosing tina commas, right? And we need to balance this badness metric's effects properly against the popularity and usefulness metrics. Balancing usefulness and popularity is really what we're doing right now. So I had a thought that we could take an approach similar to the one we're taking for balancing usefulness and popularity: namely, we could examine the existing Extreme notation and find the average ina error. In the Extreme notation it's mina error and in the Insane precision level it's tina error, but it's the same principle, so we can compare them. You had some detailed thoughts earlier about the exact nature of this error measure: "Distance from the centre of the capture zone (or nearness to its edges)"; we could even go with a combo of both. Whatever this average error is exactly, the next question is what to do with it. I'm thinking something like this: if a tina comma candidate has the same tina error as the average mina error in the established Extreme notation, then theoretically we should have no reason to prefer or disprefer that candidate on the basis of error, so at that point the badness metric should not affect the score; otherwise it either helps or hurts the score as some function of the ratio of the tina error to the average mina error. Make sense? This also gets me thinking, though: if we're doing work to balance usefulness against popularity, might this work be rendered moot when we throw badness on top of all these spinning plates too? In other words, should we just start now, integrating badness into the metric and balancing all three together at the same time?
I'm starting to see how usefulness and badness share an important aspect that distinguishes them from the popularity metric: we don't have objective data to calibrate against outside of Sagittal itself, as we did with the popularity metric in the form of the Scala usage stats. So mightn't we as well do this step in one arbitrary-ish go, rather than two needlessly separate arbitrary-ish goes? I mean, couldn't badness/error be part of the explanation for why some commas got chosen despite not being the most useful, i.e. a third entry in the list of reasons you started here? If so, we should be compelled to start balancing with it now, rather than rejecting commas which "mysteriously" don't conform to usefulness.
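One way the "neutral at the average mina error" idea above could be written down, using a formula of my own invention (the value and the simple ratio are placeholders, purely illustrative):

```python
AVERAGE_MINA_ERROR = 0.25  # placeholder; this would be measured from the Extreme notation

def badness_factor(tina_error: float, average_mina_error: float = AVERAGE_MINA_ERROR) -> float:
    # A factor of 1.0 is neutral: a candidate whose tina error equals the
    # average mina error is neither helped nor hurt. Since lower scores are
    # better, a factor > 1 hurts the candidate and a factor < 1 helps it.
    return tina_error / average_mina_error

def combined_score(usefulness_popularity_score: float, tina_error: float) -> float:
    # One possible way to fold badness into a grand unified (lower-is-better) score.
    return usefulness_popularity_score * badness_factor(tina_error)
```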
- What if we used the infrastructure we've got going for combing over zones of the half-apotome and reporting all the commas in each one which meet our default criteria (N2D3P9 ≤ 307, AAS ≤ 14, ATE ≤ 15), but instead scanned over each of the 404 tinas up to the half-apotome (minus the 123 which are already occupied by Olympian symbols) and reported the "best" comma in each, best by whatever metric we come up with? For each tina there will be a few possible symbols we could use, since the ±9.5 tina ranges for cores overlap quite a bit; for each of these possible symbols, we consider the metacommas that get us to these best commas. Then we just see which metacommas are the most popular for each tina size, and choose on the basis of those. This is slightly different from the metacomma concept we briefly looked at over on the Magratheans topic: those were between 2,3-free classes exactly notated by the Extreme notation already and those not yet exactly notated; these are between commas in the Extreme notation and commas which we may well want to choose as the primary commas for the remaining zones in the Insane precision level notation. I had previously done a scan of all the metacommas between commas in Extreme and Ultra, and you were surprised by their variety (I think there were 25 of them, way more than just the 455n and 65:77n; yes, here it is). So I'm sure we'll have an even greater variety in the Insane notation. But I note that 455n and 65:77n constitute about a third of all the mina values, so we can probably find some strong tina value majorities following this method.
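The tallying step at the end could be as simple as this sketch (names hypothetical; it assumes we've already collected, per tina size, the metacommas reaching the "best" comma in each of its zones):

```python
from collections import Counter
from typing import Dict, List

def most_popular_metacommas(metacommas_by_tina: Dict[int, List[str]]) -> Dict[int, str]:
    # For each tina size, return the metacomma that occurs most often
    # across that tina's candidate symbols/zones.
    return {
        tina: Counter(metacommas).most_common(1)[0][0]
        for tina, metacommas in metacommas_by_tina.items()
    }
```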