## developing a notational comma popularity metric

Dave Keenan
Posts: 1095
Joined: Tue Sep 01, 2015 2:59 pm
Location: Brisbane, Queensland, Australia
Contact:

### Re: developing a notational comma popularity metric

Great work! I hope you're taking that out to 5298.2, and not stopping at 3501.

And I hope you're recording the gpf, N2 and N2P (or N2P9) along with each numerator, so you can create the two sorted lists of numerators that will let you find the denominators. Or maybe that's all irrelevant now. Maybe we should just sequentially try denominators in a similar way to numerators.

Bed time for me.

cmloegcmluin
Posts: 794
Joined: Tue Feb 11, 2020 3:10 pm
Location: San Francisco, California, USA
Real Name: Douglas Blumeyer
Contact:

### Re: developing a notational comma popularity metric

Oops, no I didn't ask for up to 5298.2. I missed that bit. Well, then it certainly won't find anything else. But it also is holding any results hostage until it figures that out. And to speed it up I turned off logging for negative results, and so since there haven't been any positive results since before I went to bed, I'm beginning to suspect my estimation was off if it takes significantly longer to factorize numbers the higher up it goes. And I'm too anxious to get cracking on other work on the code and this running script is kind of impeding that. So I'll just cancel it for now and resume at a more appropriate time.

cmloegcmluin

### Re: developing a notational comma popularity metric

Whoops. Looks like I never finished up on the sub-topic of pre-calculating sorted numerators to help deepen our list of top 2,3-free classes by N2D3P9. I'll return to that later.

Since it's been over a month now, I feel like I should really get back to work on the comma usefulness metric.

I am still working hard on hammering the existing code into shape, but I reached a stopping point of sorts this morning and took a step back and saw that now that I've sorted out what the code had fundamentally wrong about comma classes (namely, it was including them up to the largest single-shaft symbol, when really anything beyond the half-apotome is dependent on / a mirror of what is below the half-apotome), pretty much anything that remains should not significantly affect this project. And since I'm feeling a bit stuck / lacking inspiration how to proceed on the cleanup, it's another good reason to return here.
Dave Keenan wrote:
Fri Sep 11, 2020 5:43 pm
The set of functions can be summarised as:
compress(N2D3P9) + t × expand1(ATE) + s × expand2(AAS)

We would try various functions for "compress" such as lb(N2D3P9) and sqrt(N2D3P9) and perhaps a parameterised N2D3P9^a where 0<a≤1/2. And the inverse of those functions would be candidates for "expand1" and "expand2".
So I've gotten back into the headspace of what needs to be done. But I perhaps never had gotten myself 100% clarity on what functions you wanted to try.

I do understand how we started with N2D3P9 × 2^(2^(ATE-DATE)) × 2^(2^(AAS-DAAS)), but geared down a multiplicative order without affecting ranking by taking lb of everything. But I do have a few questions:
1. What is the meaning of s and t, why have DATE and DAAS been removed, and are those related? I could imagine that the gearing down of a multiplicative order would allow us to convert 2^(DATE-ATE) to t × 2^(ATE), but I can't quite understand exactly how or why. It's possible you just meant to write DATE-ATE, or that you were merely abbreviating; it's also possible that s and t were just meant to be constants somewhere in the range of 0.5-1.5 or so (while additionally keeping DATE and DAAS as parameters to be varied somewhere in the vicinity of 9).
2. Could you please humor me and just unambiguously write out the full equation when the compress option is sqrt(N2D3P9), or even better N2D3P9^a where 0<a≤1/2? I at least understand that the inverse would be squaring ATE and AAS, but I'm unclear about whether in this mode we'd still be multiplying the three terms together or adding them, and whether any of the 2^'s on the ATE and AAS would be gone or not. I tried working out a few things on paper but I can't convince myself which way is right. I wish I was better at the basic maths.
In the meantime I'm just collecting the commas, their n2d3p9, AAS, and ATE. This will be the raw material for the thingamajig to minimize across, once I know what stuff to try.

What do you think of the default search parameters for the commas it'll check, though? N2D3P9: 307, ATE: 15, AAS: 14. Just the default ones. Since we're looking to minimize things, these settings shouldn't matter that much, i.e. it's unlikely that any comma with a big N2D3P9, ATE, or AAS would be the most useful comma in a given zone. If anything I could maybe reduce the search scope with respect to these parameters. Although actually the whole thing runs in like 30 minutes (less time than it took me to write this post anyway, ha) so it's not like we really need to speed this part of the process up. For what it's worth, these settings result in the mina-sized zones each returning 5 to 10 candidate commas.

Speaking of zones, I was thinking: so I suggested we weigh results by the size of the capture zones, in an effort to give more importance to the results for the lower precision levels. Or perhaps I should call them the "simpler, more popular notations" in this case. But it occurs to me that the capture zones are occasionally a bit arbitrary and idiosyncratic, specifically, based on their chosen primary commas' means. So maybe we should be using the average size of the capture zones at a given level? But then I got to thinking about what the point of this exercise was. We're not really trying to evaluate the JI notation directly. We're just using it as a guide to hone/verify parameters to this usefulness metric. If we were evaluating it directly, I suppose we might want to go the complete other direction and instead weigh results by something more like the percentage of the total area across all precision levels in the JI precision levels diagram (up to the half apotome) which a comma takes up. But since we're not doing that, maybe the quicker and dirtier capture zone proportioning suggestion I made which you've already enthusiastically agreed to would be plenty appropriate. But we never really discussed the suggestion or considered alternatives, so I just wanted to make sure we were taking the correct approach.

Dave Keenan

### Re: developing a notational comma popularity metric

Just a thort:

n/d_(p1, p2)
as a shorthand notation for
{n/d × p1^e1 × p2^e2 × p3^e3... | e1,e2,e3... ∈ ℤ, p1,p2,p3... ∈ ℙ}

e.g. 7/5_(2, 3) as shorthand for
{7/5 × 2^n × 3^m | n,m ∈ ℤ}

I wonder if it would be better with curly braces instead of parentheses, to better convey the idea that it represents a set or class. i.e.

7/5_{2, 3}

or perhaps

{7/5}_2, 3

7/5_(2, 3)

cmloegcmluin

### Re: developing a notational comma popularity metric

I like the curlies. I actually had this thought recently too but lost it in whatever flurry of activity I was amidst at the time! Glad you thort of it too.

Although I was actually going to take it even a step further and suggest we use a colon to indicate that it’s undirected, like this:

{5:7}_(2, 3)

Dave Keenan

### Re: developing a notational comma popularity metric

cmloegcmluin wrote:
Sat Oct 24, 2020 4:22 am
Whoops. Looks like I never finished up on the sub-topic of pre-calculating sorted numerators to help deepen our list of top 2,3-free classes by N2D3P9. I'll return to that later.
Sure. Forget about that for now. The following is more important.
Since it's been over a month now, I feel like I should really get back to work on the comma usefulness metric.
Much appreciated.
I am still working hard on hammering the existing code into shape, but I reached a stopping point of sorts this morning and took a step back and saw that now that I've sorted out what the code had fundamentally wrong about comma classes (namely, it was including them up to the largest single-shaft symbol, when really anything beyond the half-apotome is dependent on / a mirror of what is below the half-apotome),
Good work.
Dave Keenan wrote:
Fri Sep 11, 2020 5:43 pm
The set of functions can be summarised as:
compress(N2D3P9) + t × expand1(ATE) + s × expand2(AAS)

We would try various functions for "compress" such as lb(N2D3P9) and sqrt(N2D3P9) and perhaps a parameterised N2D3P9^a where 0<a≤1/2. And the inverse of those functions would be candidates for "expand1" and "expand2".
So I've gotten back into the headspace of what needs to be done. But I perhaps never had gotten myself 100% clarity on what functions you wanted to try.

I do understand how we started with N2D3P9 × 2^(2^(ATE-DATE)) × 2^(2^(AAS-DAAS)), but geared down a multiplicative order without affecting ranking by taking lb of everything.
Right. That gave lb(N2D3P9) + 2^(ATE-DATE) + 2^(AAS-DAAS)
But I do have a few questions:

• What is the meaning of s and t, why have DATE and DAAS been removed, and are those related? I could imagine that the gearing down of a multiplicative order would allow us to convert 2^(DATE-ATE) to t × 2^(ATE), but I can't quite understand exactly how or why. It's possible you just meant to write DATE-ATE, or that you were merely abbreviating; it's also possible that s and t were just meant to be constants somewhere in the range of 0.5-1.5 or so (while additionally keeping DATE and DAAS as parameters to be varied somewhere in the vicinity of 9).
Yes, they are related. I was not merely abbreviating. I intentionally eliminated DATE and DAAS to allow greater generality in the compress and expand functions.

Your problem may be that you have transcribed it wrongly as 2^(DATE-ATE) when it was actually 2^(ATE-DATE).

2^(ATE-DATE)
= 2^ATE / 2^DATE
= 1/(2^DATE) × 2^ATE
= t × 2^ATE where t = 1/(2^DATE)

If DATE = 9 then t = 1/512 ≈ 0.00195
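The two steps above (taking lb of the product form, then folding DATE and DAAS into the constants t and s) can be checked numerically. This is only a sketch: the function names are made up for illustration, DAAS = 9 is an assumed value ("in the vicinity of 9"), and the sample inputs are arbitrary rather than real comma data.

```python
import math

DATE = 9  # value used in the worked example above
DAAS = 9  # assumed here for illustration only

t = 1 / 2**DATE  # 1/512, about 0.00195
s = 1 / 2**DAAS


def product_form(n2d3p9: float, ate: int, aas: int) -> float:
    """Original form: N2D3P9 x 2^(2^(ATE-DATE)) x 2^(2^(AAS-DAAS))."""
    return n2d3p9 * 2 ** (2 ** (ate - DATE)) * 2 ** (2 ** (aas - DAAS))


def sum_form(n2d3p9: float, ate: int, aas: int) -> float:
    """Geared-down form: lb(N2D3P9) + t x 2^ATE + s x 2^AAS."""
    return math.log2(n2d3p9) + t * 2**ate + s * 2**aas


# lb is monotonic, so both forms rank any two candidates the same way,
# and lb(product_form) equals sum_form up to floating-point error.
sample = (35, 11, 10)  # arbitrary (N2D3P9, ATE, AAS) values
print(math.log2(product_form(*sample)), sum_form(*sample))
```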
• Could you please humor me and just unambiguously write out the full equation when the compress option is sqrt(N2D3P9), or even better N2D3P9^a where 0<a≤1/2? I at least understand that the inverse would be squaring ATE and AAS, but I'm unclear about whether in this mode we'd still be multiplying the three terms together or adding them, and whether any of the 2^'s on the ATE and AAS would be gone or not. I tried working out a few things on paper but I can't convince myself which way is right. I wish I was better at the basic maths.
If compress(x) = sqrt(x), expand1(x) = x^2 and expand2(x) = x^2 then

compress(N2D3P9) + t × expand1(ATE) + s × expand2(AAS)
= sqrt(N2D3P9) + t × ATE^2 + s × AAS^2

If compress(x) = x^a where a<1, expand1(x) = x^b where b>1 and expand2(x) = x^c where c>1 then

compress(N2D3P9) + t × expand1(ATE) + s × expand2(AAS)
= N2D3P9^a + t × ATE^b + s × AAS^c

Note that expand1 and expand2 don't need to be the inverse of the specific compress function that's used in any given case. expand1 and expand2 don't need to be the same. For example you could have:

N2D3P9^a + t × 2^ATE + s × 2^AAS
or
lb(N2D3P9) + t × ATE^b + s × AAS^c
or
lb(N2D3P9) + t × ATE^b + s × 2^AAS
or
N2D3P9^a + t × 2^ATE + s × AAS^c
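One way to keep compress, expand1 and expand2 independently swappable is to pass them in as plain functions. The sketch below is illustrative only: `make_metric`, `power` and `exp2` are hypothetical names, not anything from the actual codebase, and the exponent 1.5 is an arbitrary choice for b.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


def make_metric(compress: Fn, expand1: Fn, expand2: Fn, t: float, s: float):
    """Build compress(N2D3P9) + t * expand1(ATE) + s * expand2(AAS)."""
    def metric(n2d3p9: float, ate: float, aas: float) -> float:
        return compress(n2d3p9) + t * expand1(ate) + s * expand2(aas)
    return metric


def power(exponent: float) -> Fn:
    """x -> x^exponent; compresses when exponent < 1, expands when > 1."""
    return lambda x: x ** exponent


def exp2(x: float) -> float:
    """x -> 2^x, the inverse of lb."""
    return 2.0 ** x


t = s = 1 / 512  # starting point suggested by DATE = 9

# sqrt(N2D3P9) + t * ATE^2 + s * AAS^2
sqrt_squares = make_metric(math.sqrt, power(2), power(2), t, s)

# lb(N2D3P9) + t * ATE^b + s * 2^AAS, with b = 1.5 picked arbitrarily
lb_mixed = make_metric(math.log2, power(1.5), exp2, t, s)
```

Because the three functions are separate arguments, mixing a power function on ATE with an exponential on AAS needs no special casing.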
In the meantime I'm just collecting the commas, their n2d3p9, AAS, and ATE. This will be the raw material for the thingamajig to minimize across, once I know what stuff to try.
Cool.
What do you think of the default search parameters for the commas it'll check, though? N2D3P9: 307, ATE: 15, AAS: 14. Just the default ones. Since we're looking to minimize things, these settings shouldn't matter that much, i.e. it's unlikely that any comma with a big N2D3P9, ATE, or AAS would be the most useful comma in a given zone. If anything I could maybe reduce the search scope with respect to these parameters. Although actually the whole thing runs in like 30 minutes (less time than it took me to write this post anyway, ha) so it's not like we really need to speed this part of the process up. For what it's worth, these settings result in the mina-sized zones each returning 5 to 10 candidate commas.
That sounds fine.
Speaking of zones, I was thinking: so I suggested we weigh results by the size of the capture zones, in an effort to give more importance to the results for the lower precision levels. Or perhaps I should call them the "simpler, more popular notations" in this case. But it occurs to me that the capture zones are occasionally a bit arbitrary and idiosyncratic, specifically, based on their chosen primary commas' means. So maybe we should be using the average size of the capture zones at a given level? But then I got to thinking about what the point of this exercise was. We're not really trying to evaluate the JI notation directly. We're just using it as a guide to hone/verify parameters to this usefulness metric. If we were evaluating it directly, I suppose we might want to go the complete other direction and instead weigh results by something more like the percentage of the total area across all precision levels in the JI precision levels diagram (up to the half apotome) which a comma takes up. But since we're not doing that, maybe the quicker and dirtier capture zone proportioning suggestion I made which you've already enthusiastically agreed to would be plenty appropriate. But we never really discussed the suggestion or considered alternatives, so I just wanted to make sure we were taking the correct approach.
KISS

Maybe start by doing only the Extreme precision.

Dave Keenan

### Re: developing a notational comma popularity metric

cmloegcmluin wrote:
Sat Oct 24, 2020 1:06 pm
Although I was actually going to take it even a step further and suggest we use a colon to indicate that it’s undirected, like this:

{5:7}_(2, 3)
Good point about it being undirected. Would we ever want it directed?

With the colon, I'd have to write "5:1" or "1:5", and I like to be able to write just "5".

And I don't think we need two sets of brackets. I think we only need one set (curly), around either the ratio or the list of subscripted primes, not both. We previously agreed to have it only around the subscripts (when it was round brackets). But I don't mind having it around the ratio instead, if that's what you prefer, now that it's curly.

n/d_{p1,p2...}

as a shorthand notation for

{(n/d)^e0 × p1^e1 × p2^e2... | (n/d) ∈ ℚ+, e0 ∈ {-1,+1}, e1,e2... ∈ ℤ, p1,p2... ∈ ℙ}

Then it has the same undirected meaning whether or not you use a colon, and I can write 5_{2,3} with 1/5 being included.
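Read as a membership rule, the definition says two ratios belong to the same class when they differ only by powers of the subscripted primes and possibly by inversion. A minimal sketch of that reading follows; `strip_primes` and `class_representative` are hypothetical helper names, and choosing the form with n ≥ d as the canonical representative is an assumption made here for convenience.

```python
from fractions import Fraction


def strip_primes(k: int, primes=(2, 3)) -> int:
    """Divide out every factor of the given primes from a positive integer."""
    for p in primes:
        while k % p == 0:
            k //= p
    return k


def class_representative(ratio: Fraction, primes=(2, 3)) -> Fraction:
    """Return the member of the class with the listed primes removed and n >= d."""
    ratio = Fraction(ratio)
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    n = strip_primes(ratio.numerator, primes)
    d = strip_primes(ratio.denominator, primes)
    rep = Fraction(n, d)
    # The class is undirected, so 5/7 and 7/5 share one representative.
    return rep if rep >= 1 else 1 / rep


# 7/5, 10/21 and 15/14 all reduce to 7/5; 1/5 and 81/80 both reduce to 5.
print(class_representative(Fraction(10, 21)), class_representative(Fraction(81, 80)))
```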

cmloegcmluin

### Re: developing a notational comma popularity metric

Dave Keenan wrote:
Sat Oct 24, 2020 1:28 pm
cmloegcmluin wrote:
Sat Oct 24, 2020 4:22 am
Whoops. Looks like I never finished up on the sub-topic of pre-calculating sorted numerators to help deepen our list of top 2,3-free classes by N2D3P9. I'll return to that later.
Sure. Forget about that for now. The following is more important.
Well, it was barely any work to queue it up with the corrected max N2D3P9 of 5298.2, so I had already kicked it off last night. It crashed just after I woke up, and for a perfectly foreseeable reason: "Error: This integer 8368831 contains primes which are too big." I had to cut off the code's list of primes somewhere, and I happened to cut it off at 8368819. I suppose I could go up to 10 million. Then I could finish the script, which needs to reach 9765625. Anyway, I can pick up from where it crashed overnight tonight.

By the way, the biggest numerator within the N2D3P9 range it found so far this time was quite similar to last time. Last time it reported 1953125, which is 5^9. This time it reported 2734375, which is 5^8 × 7. It would have reported 2734375 last time had the max N2D3P9 been set properly to 5298.2 rather than 3501. Again, the script will be truly done once it reaches 9765625, which = 5^10. 5^8 × 11 = 4296875 won't make the cut with an N2D3P9 of 10257.3, but it's still possible there's a result out there.

Your problem may be that you have transcribed it wrongly as 2^(DATE-ATE) when it was actually 2^(ATE-DATE).
Baaah. Yes, that was the problem. Thanks for working it out for me. I'm not sure whether I'm more disappointed in my failure to reason my way out of the transcription error, or in the lack of care that caused it in the first place.
N2D3P9^a + t × 2^ATE + s × 2^AAS
or
lb(N2D3P9) + t × ATE^b + s × AAS^c
or
lb(N2D3P9) + t × ATE^b + s × 2^AAS
or
N2D3P9^a + t × 2^ATE + s × AAS^c
Okay, got it. I did understand that the numbers in the fns for ATE and AAS were independent, but I didn't quite understand that the operations were independent from each other (one could be a power fn, other could be an exponentiation fn) and independent from the inverse version for the N2D3P9. So: we would never in the same equation raise 2 to the AAS and raise AAS to the c; those are two alternative fns. And it'll always be addition separating the three terms. And here's an exhaustive list of the 2^3 possible forms:

lb(N2D3P9) + t × 2^ATE + s × 2^AAS
N2D3P9^a + t × 2^ATE + s × 2^AAS
lb(N2D3P9) + t × ATE^b + s × 2^AAS
N2D3P9^a + t × ATE^b + s × 2^AAS
lb(N2D3P9) + t × 2^ATE + s × AAS^c
N2D3P9^a + t × 2^ATE + s × AAS^c
lb(N2D3P9) + t × ATE^b + s × AAS^c
N2D3P9^a + t × ATE^b + s × AAS^c

And I'll start s and t out in the vicinity of 0.00195 then.
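The count of forms follows from three independent two-way choices, one per term. As a quick sanity check, the list can be generated mechanically; the tuple names below are just labels chosen for this sketch.

```python
from itertools import product

# Two candidate functions for each of the three terms.
COMPRESS = ("lb(N2D3P9)", "N2D3P9^a")
EXPAND_ATE = ("2^ATE", "ATE^b")
EXPAND_AAS = ("2^AAS", "AAS^c")


def all_forms() -> list[str]:
    """Every combination of one compress and two expand choices."""
    return [
        f"{c} + t × {e1} + s × {e2}"
        for c, e1, e2 in product(COMPRESS, EXPAND_ATE, EXPAND_AAS)
    ]


for form in all_forms():
    print(form)
```

Adding a third candidate for any term (say sqrt for compress) only means extending the matching tuple.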
KISS

Maybe start by doing only the Extreme precision.
Word.

That's enough to verify for me that the original plan to go by secondary comma zones should suffice. It's super easy to implement it that way since I already have the secondary comma zones ready to go, because those are the zones I'm searching for competing commas. In fact, now that I articulate it that way, the original plan's logic comes into focus.

Dave Keenan wrote:
Sat Oct 24, 2020 3:22 pm
cmloegcmluin wrote:
Sat Oct 24, 2020 1:06 pm
Although I was actually going to take it even a step further and suggest we use a colon to indicate that it’s undirected, like this:

{5:7}_(2, 3)
Good point about it being undirected. Would we ever want it directed?
I don't think so, not with the way N2D3P9 was designed to always want n ≥ d.
With the colon, I'd have to write "5:1" or "1:5", and I like to be able to write just "5".
Okay, that's a good counter-point. The code does not currently drop denominators of 1 when formatting for output quotients (ratios) such as ones that appear in 2,3-free classes. But it should drop them, and I've taken a note to make it do that. And I'll keep it how it is with respect to using slashes rather than colons for 2,3-free classes.
And I don't think we need two sets of brackets. I think we only need one set (curly), around either the ratio or the list of subscripted primes, not both. We previously agreed to have it only around the subscripts (when it was round brackets). But I don't mind having it around the ratio instead, if that's what you prefer, now that it's curly.
At the time I pushed for round brackets around the subscripted primes, I wasn't thinking of them semantically so much as visually. That is, brackets around the ratio seemed less necessary because everything there was glued together with the vinculum, whereas I found it rather disconcerting to have the list of primes in the subscript glued together by only a comma (you know, the punctuation kind, not the musical kind), or worse, floating apart with a space after the comma. That's really all I was thinking about at that time. I don't think these thoughts are important; I think they're kind of silly. But anyway, those were the extent of my thoughts at that time.

Now that we're getting serious and considering the semantics of the punctuation, such as how curlies suggest classes or sets, I think I do prefer it around the ratio, since that's the 2,3-free class. I suppose the subscripted list of primes is a set, but it doesn't feel like it's the primary set or class of interest, you know?

I know you don't want brackets around both the ratio and the subscript. Between the two, I would be more okay with dropping the round brackets in the subscript, especially if we don't include spaces after the commas there. Like this:

{n/d}_p1,p2...
{7/5}_2,3
{5}_2,3

That said, I feel like I should defer to you on this matter. I can barely follow the expanded form I'm making suggestions for the shorthand for, while you can apparently not only read it, but you actually produced it. So clearly you're a lot more familiar with these types of expressions.

Oh yeah, I just remembered how I made it so the code can format them to take advantage of the forum's LaTeX support, like so:

$$\{\frac{n}{d}\}_{\scriptsize{p_{1},p_{2}...}}$$
$$\{\frac{7}{5}\}_{\scriptsize{2,3}}$$
$$\{5\}_{\scriptsize{2,3}}$$

Dave Keenan

### Re: developing a notational comma popularity metric

cmloegcmluin wrote:
Sun Oct 25, 2020 3:07 am
Well, it was barely any work to queue it up with the corrected max N2D3P9 of 5298.2, so I had already kicked it off last night. It crashed just after I woke up, and for a perfectly foreseeable reason: "Error: This integer 8368831 contains primes which are too big." I had to cut off the code's list of primes somewhere, and I happened to cut it off at 8368819. I suppose I could go up to 10 million. Then I could finish the script, which needs to reach 9765625. Anyway, I can pick up from where it crashed overnight tonight.
As I pointed out earlier, you only need to consider prime factors up to 307. Anything with a greater prime factor must have N2D3P9 > 5298.2 because 311/2 * 311/9 = 5373.4.

I suggest you have your prime factoring routine only try primes up to 307 and return the "residue" or "remnant" in addition to a monzo with only 63 exponents. If the residue is greater than 1 then the numerator-finding code can ignore that number.

This will not only fix the bug, but will also speed things up.
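A rough sketch of that suggestion, assuming plain trial division is fast enough: only the 63 primes up to 307 are tried, and whatever is left over is handed back as the residue. The names `primes_up_to`, `PRIMES` and `factor_with_residue` are invented for this example and are not from the actual code.

```python
def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


PRIMES = primes_up_to(307)  # 63 primes, so the monzo has 63 exponents


def factor_with_residue(n: int) -> tuple[list[int], int]:
    """Return (monzo over primes <= 307, residue left after dividing them out)."""
    monzo = [0] * len(PRIMES)
    for i, p in enumerate(PRIMES):
        while n % p == 0:
            n //= p
            monzo[i] += 1
    return monzo, n


# A residue > 1 means some prime factor exceeds 307, so the
# numerator-finding code can skip the number without raising an error.
monzo, residue = factor_with_residue(2734375)  # 5^8 x 7
print(monzo[:4], residue)
```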
By the way, the biggest numerator within the N2D3P9 range it found so far this time was quite similar to last time. Last time it reported 1953125, which is 5^9. This time it reported 2734375, which is 5^8 × 7. It would have reported 2734375 last time had the max N2D3P9 been set properly to 5298.2 rather than 3501. Again, the script will be truly done once it reaches 9765625, which = 5^10. 5^8 × 11 = 4296875 won't make the cut with an N2D3P9 of 10257.3, but it's still possible there's a result out there.
My guess is that there won't be any more, because if there were, they would have to include 5^7 × 7^2 = 3 828 125, but its N2D3P9 is 5815.29.
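The figures quoted in this exchange can be reproduced with a short calculation. The formula used below is an assumption reconstructed from those figures and from the metric's name: each prime factor of the numerator is divided by 2, each prime factor of the denominator by 3, and the greatest prime factor by 9. The function names are hypothetical, and the input is assumed to be 2,3-free already with n ≥ d.

```python
def prime_factors(k: int) -> list[int]:
    """Prime factors of k with multiplicity, by trial division."""
    factors = []
    p = 2
    while p * p <= k:
        while k % p == 0:
            factors.append(p)
            k //= p
        p += 1
    if k > 1:
        factors.append(k)
    return factors


def n2d3p9(n: int, d: int = 1) -> float:
    """Assumed form: prod(p/2 over n) * prod(p/3 over d) * gpf/9."""
    n_factors = prime_factors(n)
    d_factors = prime_factors(d)
    if not n_factors and not d_factors:
        raise ValueError("the 1/1 case is left undefined in this sketch")
    value = 1.0
    for p in n_factors:
        value *= p / 2
    for p in d_factors:
        value *= p / 3
    return value * max(n_factors + d_factors) / 9


for label, n in [("5^10", 5**10), ("5^8 x 11", 5**8 * 11),
                 ("5^7 x 7^2", 5**7 * 7**2), ("311", 311)]:
    print(label, round(n2d3p9(n), 2))
```

Under this assumed formula, 5^10 lands just under the 5298.2 cutoff while the other three exceed it, which matches the reasoning in the posts above.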

And here's an exhaustive list of the 2^3 possible forms:

lb(N2D3P9) + t × 2^ATE + s × 2^AAS
N2D3P9^a + t × 2^ATE + s × 2^AAS
lb(N2D3P9) + t × ATE^b + s × 2^AAS
N2D3P9^a + t × ATE^b + s × 2^AAS
lb(N2D3P9) + t × 2^ATE + s × AAS^c
N2D3P9^a + t × 2^ATE + s × AAS^c
lb(N2D3P9) + t × ATE^b + s × AAS^c
N2D3P9^a + t × ATE^b + s × AAS^c

And I'll start s and t out in the vicinity of 0.00195 then.
Cool.

Now that we're getting serious and considering the semantics of the punctuation, such as how curlies suggest classes or sets, I think I do prefer it around the ratio, since that's the 2,3-free class. I suppose the subscripted list of primes is a set, but it doesn't feel like it's the primary set or class of interest, you know?

I know you don't want brackets around both the ratio and the subscript. Between the two, I would be more okay with dropping the round brackets in the subscript, especially if we don't include spaces after the commas there. Like this:

{n/d}_p1,p2...
{7/5}_2,3
{5}_2,3
That's totally fine with me.

See https://en.wikipedia.org/wiki/Set-builder_notation

cmloegcmluin