Thanks for sharing your latest results. Yes, it does seem that, unlike with the popularity metric LFC, it's much easier here to get good metrics, which paradoxically makes it harder to choose between them. I'm still working toward confirming your Excel results by adding recursive searching of local minima; you're leagues ahead of me with your pre-built all-purpose evolutionary solver. But if I could have my code spit out which commas are matches for each metric, how would you actually use that information to make a decision? Which (type of) comma is more important to match?

Dave Keenan wrote: ↑Sat Oct 31, 2020 5:42 pm
I think that choosing between these and the simpler LP metric (ignores ATE) with b = 1.41, s = 0.096, will come down to looking at exactly which commas they include that the others reject, and vice versa.
FYI, during my work on the code, I made a couple decisions about naming that affect my approach to the problem that I should probably share with you:
- I'm calling things like LEE and RPP (usefulness) metric families, adding the "families" part to distinguish them from individual (usefulness) metrics. A metric is what you get when you combine a family with a parameter set such as {b=2, s=1/85, t=2^-10}. We didn't need the concept of a metric family for the popularity metric LFC, but that's because there we had so many different submetrics that it didn't make sense to name families in a top-down fashion; we only named those that proved themselves meaningful by emerging as contenders from the search (like cwyk, wbl, etc.). For the usefulness metric search, the concept of submetrics doesn't seem helpful, so I think we're just looking for the best metric in each of the 8 metric families, and then probably the best metric among those 8 (well, maybe a few more, if we start including LP, LE, RP, and RE). I'm assuming in all cases we drop ATE in favor of AAS, although I'm still in suspense about you digging up the details of your and George's discussions about which of them to use, or both, in a usefulness metric.
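To make the family/metric distinction concrete, here's a minimal sketch in Python; all the names and the stand-in formula are mine for illustration, not the real codebase's:

```python
from typing import Callable, Dict

# A family such as LPE is, conceptually, a function from (comma, params)
# to a usefulness score; a metric is a family with its parameters fixed.
MetricFamily = Callable[[Dict[str, float], Dict[str, float]], float]

def lpe_family(comma: Dict[str, float], params: Dict[str, float]) -> float:
    # Stand-in formula only: a complexity term raised to b, plus an s-weighted AAS term.
    return comma["lp"] ** params["b"] + params["s"] * comma["aas"]

def make_metric(family: MetricFamily, params: Dict[str, float]):
    # Fixing a parameter set such as {b=2, s=1/85} turns a family into a metric.
    return lambda comma: family(comma, params)

lpe_metric = make_metric(lpe_family, {"b": 2.0, "s": 1 / 85})
```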
- When DRYing up the duplicated code for the boolean and sos modes, I realized that to bring them into maximum conformity with each other, in order to extract their shared behavior, it made more sense to treat boolean mode not as an effort to maximize match count, but to minimize non-match count. In other words, sos mode assigns each zone a score of zero if the actual comma is the most useful in its zone, and otherwise a non-zero score (a squared distance; more on that in a moment). Similarly, boolean mode should assign each zone a score of zero if the actual comma is the most useful in its zone, and otherwise a flat score of one. Therefore, I suggest we henceforth invert the way we talk about boolean mode scores, giving the count of non-matches instead. Even if we never think about sos mode again, I think this is preferable.
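The shared shape of the two modes could be sketched like this (a hypothetical rendering, with names of my own; only the zero-for-a-match convention comes from the discussion above):

```python
def zone_score(actual_usefulness: float, best_usefulness: float, mode: str) -> float:
    # Both modes score a zone 0 when the actual comma is the most useful
    # in its zone (lower usefulness score is better); they differ only in
    # the penalty for a non-match.
    if actual_usefulness <= best_usefulness:
        return 0.0  # match
    if mode == "boolean":
        return 1.0  # flat penalty; summing these counts non-matches
    # sos mode: squared distance, a gradient of non-matchy-ness
    return (actual_usefulness - best_usefulness) ** 2

def metric_score(zones, mode: str) -> float:
    # zones: iterable of (actual comma's usefulness, best usefulness in zone) pairs
    return sum(zone_score(actual, best, mode) for actual, best in zones)
```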
- I added nominal types to the numbers I was tossing around in the code to help me make sense of this refactor, which helped me realize I was confronting not one type of score, but two! Both of them are inverted scores, i.e. scores where lower is better. The first is a usefulness score: the type of score returned by metrics such as those in the families LPE, RPE, etc., and used to compare commas within a given zone. The second is a metric score. In boolean mode this score is an integer, representing the count of non-matches for a given metric across all the comma zones; in sos mode it represents a gradient of non-matchy-ness for a given metric across all the comma zones. Anyway, we couldn't have gotten this far if we didn't understand that this is how things work on some level, and maybe you'd already consciously made this distinction in your head and felt it was too obvious to unpack, but in my case I was sometimes struggling to make sense of the boundaries between modules in the code until I arrived at this insight. Hopefully it's helpful, whether it affects how you plug things into Excel, or just for our communication here on the forum.
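For what it's worth, here's roughly what that nominal typing looks like, sketched in Python with `NewType` (names illustrative, whatever the actual codebase does):

```python
from typing import List, NewType

# Both score types are inverted: lower is better.
UsefulnessScore = NewType("UsefulnessScore", float)  # compares commas within one zone
MetricScore = NewType("MetricScore", float)          # compares metrics across all zones

def pick_most_useful(scores: List[UsefulnessScore]) -> int:
    # Returns the index of the most useful comma in a zone (lowest score wins).
    return min(range(len(scores)), key=lambda i: scores[i])
```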
With a bit of hesitation, I'm going to include here a couple late night mind drifty thoughts I had:
- So we're planning to mix a badness metric in with the popularity and usefulness metrics to get a grand unified metric for choosing tina commas, right? And we need to balance this badness metric's effects properly against the popularity and usefulness metrics. Balancing usefulness and popularity is really what we're doing right now. So I had a thought that we could take an approach similar to the one we're taking for balancing usefulness and popularity: namely, we could examine the existing Extreme notation and find the average ina error. In the Extreme notation it's mina error and in the Insane precision level it's tina error, but it's the same principle, so we can compare them. You had some detailed thoughts earlier about the exact nature of this error measure: "Distance from the centre of the capture zone (or nearness to its edges)"; we could even go with a combo of both. Whatever this average error is exactly, the next question is what to do with it. I'm thinking something like this: if a tina comma candidate has the same tina error as the average mina error in the established Extreme notation, then theoretically we should have no reason to prefer or disprefer that candidate on the basis of error, so at that point the badness metric should not affect the score; otherwise it either helps or hurts the score as some function of the ratio of the tina error to the average mina error. Make sense? This also gets me thinking, though: if we're doing work to balance usefulness against popularity, might this work be rendered moot when we throw badness on top of all these spinning plates too? In other words, should we just start now, integrating badness into the metric and balancing all three together at the same time?
I'm starting to see how usefulness and badness share an important aspect that distinguishes them from the popularity metric: we don't have objective data to calibrate against outside of Sagittal itself, as we did with the popularity metric in the form of the Scala usage stats. So mightn't we as well do this step in one arbitrary-ish go, rather than two needlessly separate arbitrary-ish goes? I mean, couldn't badness/error be part of the explanation for why some commas got chosen despite not being the most useful, i.e. a third entry in the list of reasons you started here? If so, we should be compelled to start balancing with it now, rather than rejecting commas which "mysteriously" don't conform to usefulness.
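One way the "neutral at the average mina error" idea above could be written down, using a formula of my own invention (the value and the simple ratio are placeholders, purely illustrative):

```python
AVERAGE_MINA_ERROR = 0.25  # placeholder; this would be measured from the Extreme notation

def badness_factor(tina_error: float, average_mina_error: float = AVERAGE_MINA_ERROR) -> float:
    # A factor of 1.0 is neutral: a candidate whose tina error equals the
    # average mina error is neither helped nor hurt. Since lower scores are
    # better, a factor > 1 hurts the candidate and a factor < 1 helps it.
    return tina_error / average_mina_error

def combined_score(usefulness_popularity_score: float, tina_error: float) -> float:
    # One possible way to fold badness into a grand unified (lower-is-better) score.
    return usefulness_popularity_score * badness_factor(tina_error)
```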
- What if we used the infrastructure we've got going for combing over zones of the half-apotome and reporting all the commas in each one which meet our default criteria (N2D3P9 ≤ 307, AAS ≤ 14, ATE ≤ 15), but instead scanned over each of the 404 tinas up to the half-apotome (minus the 123 which are already occupied by Olympian symbols) and reported the "best" comma in each, best by whatever metric we come up with? For each tina there will be a few possible symbols we could use, since the ±9.5 tina ranges for cores overlap quite a bit; for each of these possible symbols, we consider the metacommas that get us to these best commas. Then we just see which metacommas are the most popular for each tina size, and choose on the basis of those. This is slightly different from the metacomma concept we briefly looked at over on the Magratheans topic: those were between 2,3-free classes exactly notated by the Extreme notation already and those not yet exactly notated; these are between commas in the Extreme notation and commas which we may well want to choose as the primary commas for the remaining zones in the Insane precision level notation. I had previously done a scan of all the metacommas between commas in Extreme and Ultra, and you were surprised by their variety (I think there were 25 of them, way more than just the 455n and 65:77n; yes, here it is). So I'm sure we'll have an even greater variety in the Insane notation. But I note that 455n and 65:77n constitute about a third of all the mina values, so we can probably find some strong tina value majorities following this method.
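The tallying step at the end could be as simple as this sketch (names hypothetical; it assumes we've already collected, per tina size, the metacommas reaching the "best" comma in each of its zones):

```python
from collections import Counter
from typing import Dict, List

def most_popular_metacommas(metacommas_by_tina: Dict[int, List[str]]) -> Dict[int, str]:
    # For each tina size, return the metacomma that occurs most often
    # across that tina's candidate symbols/zones.
    return {
        tina: Counter(metacommas).most_common(1)[0][0]
        for tina, metacommas in metacommas_by_tina.items()
    }
```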