
### Re: developing a notational comma popularity metric

Posted: Sat Aug 01, 2020 4:03 am
cmloegcmluin wrote:
Sat Aug 01, 2020 2:01 am
Yes, welcome back @volleo6144! Although I feel you've been here the whole time, offering your silent support.
Mostly. I guess I was just, like, pretty sure I didn't have much to say that you hadn't already figured out, or something.
Dave Keenan wrote:
Fri Jan 27, -292277022657 6:29 pm
As your example shows, c^x with c<1 decreases with increasing x, so it didn't seem like it would be of any use for estimating ratio unpopularity, with x being either the prime p or the repeat-count r. But I suppose it could be, if some other component offset its decrease, to give a net monotonically-increasing convex-upward function.
I was suggesting that c^x with c<1 could be used with a negative coefficient, making 5-10-15-20 map to 40-20-10-5 and then to 10-30-40-45. Of course, this means that there's some power of 5 (and of all other primes considered) that will outrank any other individual prime (in this case, it's 5^5 = 3125), but maybe that's a good thing?
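The arithmetic in this suggestion can be checked with a short Python sketch. It assumes one particular choice of constants that reproduces the quoted numbers, f(x) = 50 − 80·2^(−x/5) (that is, c = 2^(−1/5), coefficient −80, offset 50; these values are inferred, not stated in the post), and it assumes each repeat of a prime adds f(p) to the total.

```python
import math

# Assumed constants, inferred only so that the quoted numbers come out exactly.
COEFF = 80    # scale of the decaying term
OFFSET = 50   # constant the negated term is added to (the supremum of f)


def decaying(x: float) -> float:
    """COEFF * c**x with c = 2**(-1/5) < 1; decreases as x grows."""
    return COEFF * 0.5 ** (x / 5)


def f(x: float) -> float:
    """Negative coefficient plus offset: increasing, but bounded above by OFFSET."""
    return OFFSET - decaying(x)


def smallest_dominating_repeat(p: int) -> int:
    """Smallest repeat count r with r * f(p) >= OFFSET.

    Assuming each repeat of p adds f(p), p**r then scores higher than
    any single prime q, since f(q) < OFFSET for every q.
    """
    return math.ceil(OFFSET / f(p))


print([decaying(x) for x in (5, 10, 15, 20)])  # [40.0, 20.0, 10.0, 5.0]
print([f(x) for x in (5, 10, 15, 20)])         # [10.0, 30.0, 40.0, 45.0]
r = smallest_dominating_repeat(5)
print(r, 5 ** r)                               # 5 3125
```

With these assumed constants, 5^4 scores 40, which some large single primes exceed, so 5^5 is the first power of 5 that beats every individual prime.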

### Re: developing a notational comma popularity metric

Posted: Sat Aug 01, 2020 7:49 am
I'll reply re: the additional exponent / logarithmic operations soon. For now, I'd just like to drop an update that locking a as a base to 2 reduced the time for running the 3-chunk best metric solver from 52 minutes down to 20. Woot! And that was even with re-introducing x (now that the code pulls the plug on searches which are probably just following blue threads of death).

### Re: developing a notational comma popularity metric

Posted: Sat Aug 01, 2020 8:47 am
The universe only needs one log base, because what can be achieved by changing log base can be achieved more simply by multiplying the log-to-the-standard-base by a constant. That constant is the reciprocal of the log-to-the-standard-base of what would otherwise have been the new base. Base e is the "natural" standard in many problem domains, but base 2 is already standard in music theory and makes sense there due to the psychoacoustic phenomenon of octave-equivalence.
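The change-of-base identity described here, log_a(p) = lb(p) / lb(a) = k × lb(p) with k = 1/lb(a), can be checked numerically. A minimal Python sketch using only the standard `math` module; the function names are illustrative, and a = 4/3 is the base value discussed in this thread.

```python
import math


def log_base(p: float, a: float) -> float:
    """Two-argument log: log_a(p)."""
    return math.log(p, a)


def base_to_coefficient(a: float) -> float:
    """Constant k such that log_a(p) == k * lb(p), i.e. k = 1 / lb(a)."""
    return 1 / math.log2(a)


a = 4 / 3
k = base_to_coefficient(a)  # about 2.409
for p in (5, 7, 11, 13):
    # Changing the base is the same as scaling the base-2 log by a constant.
    assert math.isclose(log_base(p, a), k * math.log2(p))
print(round(k, 3))  # 2.409
```

So a fitted log base a can always be replaced by a fitted coefficient k on lb, and vice versa.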

### Re: developing a notational comma popularity metric

Posted: Sat Aug 01, 2020 9:56 am
cmloegcmluin wrote:
Sat Aug 01, 2020 2:01 am
Had you tried gpf(nd) you may've availed. Could you try: ...
Grr. Yes, I can now confirm your numbers, and improve them very slightly. See below.
And that I wouldn't want to artificially constrain my code to be capable of only logarithms of base 2.
That is no constraint at all, as per my previous post.
We could convert log(4/3) to lb by multiplying j and k. But then k will be non-1, and thus we'll have an extra chunk.
No we won't, because we will no longer have the log base as a parameter/chunk. We just trade a for k. I don't understand why you even gave k as a parameter, when it was forced to 1.
We could adjust j and k, maintaining the proportion, until k is 1 again, but then the total value from the sopfr will be different,
What sopfr? I thought this metric only had two soapfrs and gpf.
and we'd need a weight on it (or the other term, the gpf) and thus we'd still have the extra chunk.
What is "it" here? AFAIK you already have weights on all the components that are not gpf.

If you force any variable to be 1, then it isn't a parameter and therefore isn't a chunk.

Given that we've previously standardised on j = 1, where j is the coefficient of some soapfar of the numerator, it would make more sense to me to have k and l as the parameters, and name the metric "kl", pronounced kayel.

$$\operatorname{kl}(n,d) = \sum_{p=5}^{p_{max}}\big(\operatorname{lb}{p} \times (n_p + k \times d_p)\big) + l \times \operatorname{gpf}(nd), \text{ where } n \geq d$$
$$k=0.722866218, l=0.319583198 \text{ gives } SoS=0.006970591, SoS(z=1)=17464.5$$

Excel found this minimum in about 4 seconds, starting from k=1, l=1. But it ran for 30 seconds to give reasonable confidence that there were no lower minima.
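For checking individual values of kl by hand, here is a minimal Python sketch of the formula above with the fitted k and l. Assumptions (not from the post): trial-division factorization, primes 2 and 3 skipped in the sum (it starts at p = 5) but included when taking gpf(nd), gpf(1) treated as 1, and the ratio swapped when n < d. The helper names are hypothetical, not the solver's code.

```python
import math

# Fitted values quoted in the post.
K = 0.722866218
L = 0.319583198


def factorize(m: int) -> dict:
    """Prime factorization by trial division, as {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= m:
        while m % p == 0:
            factors[p] = factors.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:
        factors[m] = factors.get(m, 0) + 1
    return factors


def gpf(m: int) -> int:
    """Greatest prime factor; gpf(1) = 1 is an assumed convention here."""
    return max(factorize(m), default=1)


def kl(n: int, d: int, k: float = K, l: float = L) -> float:
    """kl(n, d) = sum over p >= 5 of lb(p) * (n_p + k * d_p), plus l * gpf(n * d)."""
    if n < d:  # orient the ratio so that n >= d
        n, d = d, n
    fn, fd = factorize(n), factorize(d)
    primes = [p for p in sorted(fn.keys() | fd.keys()) if p >= 5]
    soapfr_part = sum(math.log2(p) * (fn.get(p, 0) + k * fd.get(p, 0)) for p in primes)
    return soapfr_part + l * gpf(n * d)


print(kl(7, 5))
```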

This (and your original formulation of it) looks like 4 chunks to me: lb, gpf, k, l.

### Re: developing a notational comma popularity metric

Posted: Sat Aug 01, 2020 10:36 am
volleo6144 wrote:
Sat Aug 01, 2020 4:03 am
I was suggesting that c^x with c<1 could be used with a negative coefficient,
Ah. Right. So -c^x with c<1. Good point.
making 5-10-15-20 map to 40-20-10-5 and then to 10-30-40-45. Of course, this means that there's some power of 5 (and of all other primes considered) that will outrank any other individual prime (in this case, it's 5^5 = 3125), but maybe that's a good thing?
We won't know until we try. Can you suggest a complete metric involving this?

### Re: developing a notational comma popularity metric

Posted: Sat Aug 01, 2020 11:57 am
Dave Keenan wrote:
Sat Aug 01, 2020 8:47 am
The universe only needs one log base
So I could lock down any parameter which is a logarithmic base to 2 (or e, but I may as well make them all 2 I guess), as long as I allow its coexistence with its variant as a coefficient. Currently the four which can take these forms are a, j, k, and the total weight on a submetric (like how we'd sometimes weight copfr by c). I don't currently allow this, but there's no particular reason not to (it just seemed really unlikely that a winning metric would call for more than one of a base, an exponent, or a coefficient in the same position). The only problem with this, as I wondered aloud w/r/t a, is how it forces some metrics which are possible with n chunks to be expressed with n+1 chunks. So while I agree with this sacrifice being in the spirit of the game w/r/t a, I don't think it's the right move for the other three.
Dave Keenan wrote:
Sat Aug 01, 2020 9:56 am
Grr.
Hopefully that's a playful growl....
But yes, again, I'm very sorry! You might be less grumpy if you knew how similar gpf and sopf look in my code (which of course is my fault too, so maybe you're still just as grumpy). But yes, I do apologize for wasting your time.
And that I wouldn't want to artificially constrain my code to be capable of only logarithms of base 2.
That is no constraint at all, as per my previous post.
Well sure, by that definition of "constraint", in terms of the ultimate human meaning of the thing. But I meant I still wanted to be able to ask questions about a metric in a form with a base not equal to 2 if it was convenient or historical.
We could convert log(4/3) to lb by multiplying j and k. But then k will be non-1, and thus we'll have an extra chunk.
No we won't, because we will no longer have the log base as a parameter/chunk. We just trade a for k. I don't understand why you even gave k as a parameter, when it was forced to 1.
I'm having as much trouble discerning what you mean here as you apparently had discerning what I meant...

Whether a is 4/3 or 2, it's still a chunk to me.

I wouldn't count k as a chunk if it = 1.

Does that resolve any confusion?

I would not characterize "convert log(4/3) to lb by multiplying j and k" as involving "trade a for k". I would characterize it as involving "newly require k".
We could adjust j and k, maintaining the proportion, until k is 1 again, but then the total value from the sopfr will be different,
What sopfr? I thought this metric only had two soapfrs and gpf.
I didn't do that consciously (otherwise I would have more carefully spelled it out exactly), but I was definitely referring to the soapfr for short. Sorry that was confusing. I guess I've become used to thinking of soapfr as a type of sopfr, as that's how it is in my code.

And again, I think of it as one soapfar, not two, and will probably not easily be able to break that conception.
and we'd need a weight on it (or the other term, the gpf) and thus we'd still have the extra chunk.
What is "it" here? AFAIK you already have weights on all the components that are not gpf.
The soapfr. This is clearly another consequence of me thinking of metrics with things in a form like soapfr(n) + k*soapfr(d) as one thing, not two. In my world, the part of a submetric which applies to n is modified by j, the part of a submetric which applies to d is modified by k, and then the submetric as a whole is modified by a parameter I call "weight".
Given that we've previously standardised on j = 1, where j is the coefficient of some soapfar of the numerator, it would make more sense to me to have k and l as the parameters, and name the metric "kl", pronounced kayel.
I agree with preferring k to j when equivalent ways of stating the same metric are available. A change I made to the code this afternoon I think will get the variants with k to be preferred.
$$\text{kl}(n,d) = \sum_{p=5}^{p_{max}}\big(\operatorname{lb}{p} \times (n_p + k \times d_p)\big) + l \times \operatorname{gpf}(nd), \text{ where } n \geq d$$
$$k=0.722866218, l=0.319583198 \text{ gives } SoS=0.006970591$$

Excel found this solution in about 4 seconds, starting from k=1, l=1. But it ran for 30 seconds to give reasonable confidence that there were no better solutions.
I confirm that. Nice!
This (and your original formulation of it) looks like 4 chunks to me: lb, gpf, k, l.
No, my original formulation is 3 chunks: log_a, gpf, and j.

You've restated the concern I articulated which I am still seeking your opinion about. By using an a other than 2, I got away without any l, so I got away with expressing it as a 3-chunk rather than 4-chunk. Of the two techniques I proposed for changing a from 4/3 to 2 which involve increasing the chunk count by 1, you chose the second one: adding a weight to one of the two submetrics (either lb/soapfr or gpf); in particular you added it to gpf. The other technique I proposed was using both k and j.

### Re: developing a notational comma popularity metric

Posted: Sat Aug 01, 2020 6:06 pm
cmloegcmluin wrote:
Sat Aug 01, 2020 11:57 am
So I could lock down any parameter which is a logarithmic base to 2 (or e, but I may as well make them all 2 I guess), as long as I allow its coexistence with its variant as a coefficient.
Agreed.
Currently the four which can take these forms are a, j, k, and the total weight on a submetric (like how we'd sometimes weight copfr by c). I don't currently allow this, but there's no particular reason not to (it just seemed really unlikely that a winning metric would call for more than one of a base, an exponent, or a coefficient in the same position).
You don't currently allow what? A total weight on a submetric? Or locking to 2, any parameter that is a log base?
The only problem with this, as I wondered aloud w/r/t a, is how it forces some metrics which are possible with n chunks to be expressed with n+1 chunks. So while I agree with this sacrifice being in the spirit of the game w/r/t a, I don't think it's the right move for the other three.
I won't bother arguing about j, k, c etc as log bases, since they have not yet appeared in a low-SoS or low-chunk metric, but in the case of a I hope to convince you that it is not a sacrifice at all, because it does not change the chunk count.
Hopefully that's a playful growl....
Sure. I'm just practising at being a curmudgeon, cmloegcmluin. I figure I'm entitled, now my beard is whitened.
Well sure, by that definition of "constraint", in terms of the ultimate human meaning of the thing. But I meant I still wanted to be able to ask questions about a metric in a form with a base not equal to 2 if it was convenient or historical.
I don't understand how, in this application, a base other than 2 might be historical or more convenient. But in any case, as you said, we can change bases after the fact. One of the reasons to standardise the base was so our parameter values would agree, i.e. to facilitate checking each other's results without having to rescale.
I wouldn't count k as a chunk if it = 1.
I'm glad we at least agree on that.
Whether a is 4/3 or 2, it's still a chunk to me.
Why isn't setting a=2 analogous to setting k = 1, and therefore not counting as a chunk? When it was 4/3, that was the value of a variable, the optimal value found by your search.
I would not characterize "convert log(4/3) to lb by multiplying j and k" as involving "trade a for k". I would characterize it as involving "newly require k".
I'm not talking about changing a from 4/3 to 2. I'm talking about eliminating a entirely, by changing from a two-argument log function to a single-argument lb function. You are treating a as a variable, so log_a(p) is a function of two arguments, 'a' and 'p'. In Excel it is even written as log(p,a). I'm using k×lb(p) where lb is a function of a single argument, p, and so overall, k×lb(p) is also a function of two arguments. They both contribute two chunks, one of which is a model parameter, the other of which is some kind of log function.
What sopfr? I thought this metric only had two soapfrs and gpf.
I didn't do that consciously (otherwise I would have more carefully spelled it out exactly), but I was definitely referring to the soapfr for short. Sorry that was confusing. I guess I've become used to thinking of soapfr as a type of sopfr, as that's how it is in my code.
OK. I think the opposite. There's only one sopfr function. I think of sopfr as one kind of soapfr, namely the special case where a_p = p.
And again, I think of it as one soapfar, not two, and will probably not easily be able to break that conception.
I can live with that, provided it doesn't make you count chunks where there aren't any, as you initially did with wyb.
This is clearly another consequence of me thinking of metrics with things in a form like soapfr(n) + k*soapfr(d) as one thing, not two. In my world, the part of a submetric which applies to n is modified by j, the part of a submetric which applies to d is modified by k, and then the submetric as a whole is modified by a parameter I call "weight".
Thanks for explaining. Of course "weight" is completely redundant when j and k are coefficients (i.e. multipliers), but I understand it might not be redundant when j and k have other roles.
I agree with preferring k to j when equivalent ways of stating the same metric are available. A change I made to the code this afternoon I think will get the variants with k to be preferred.
Cool.
This (and your original formulation of it) looks like 4 chunks to me: lb, gpf, k, l.
No, my original formulation is 3 chunks: log_a, gpf, and j.
I'm sorry, but I don't understand how you can claim that log_a is one chunk. To me, every parameter is a chunk and every function other than + - × ÷ is a chunk. log_a is not simply a function. It is a function and one of its arguments, which happens to be a model parameter. Perhaps you are being fooled by its visual similarity to log_2 which is simply a function, as is made clear by writing it as lb.

### Re: developing a notational comma popularity metric

Posted: Sat Aug 01, 2020 6:54 pm
Here's the table updated with "kl".

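| Metric name | Lowest SoS found (z = -1) | SoS with z = 1 | Chunk count |
|---|---|---|---|
| sopfr | 0.014206087 | 19845.0 | 1 |
| k | 0.009491243 | 18757.5 | 2 |
| j | 0.009100365 | 18637.5 | 2 |
| wyk | 0.007460443 | 17077.5 | 5 |
| wb | 0.007345361 | 16520.5 | 3 |
| cwyk | 0.007300195 | 16890.5 | 7 |
| kl | 0.006970591 | 17464.5 | 4 |
| wyks | 0.006406639 | 14125.5 | 6 |
| hyg | 0.006372713 | 17867.5 | 5 |
| wyb | 0.006057649 | 15638.5 | 4 |
| xwyks | 0.00553892 | 14309.5 | 7 |
| cwyks | 0.004059522 | 13440.5 | 8 |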

kl seems to be in competition with wb for the 4-chunk crown. kl has slightly lower SoS, but wb does better at predicting the higher-prime popularities when trained with a weighting to lower primes.

wb looks simpler to me than kl, in either the kl or laj formulation, because it doesn't have the gpf term.

$$\text{wb}(n,d) = \sum_{p=5}^{p_{max}} \big((\operatorname{lb}{p}+w)\,n_p + (\operatorname{lb}{p}+b)\,d_p\big), \text{ where } n \geq d$$
$$w = -1.645808649, b = -2.043765116, \text{ gives } SoS=0.007345361, SoS(1)=16520.5$$
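The wb formula can be sketched the same way for spot checks. This is a minimal Python sketch using the fitted w and b quoted above, with a hypothetical trial-division `factorize` helper; as in the formula, primes below 5 are skipped and the ratio is oriented so that n ≥ d.

```python
import math

# Fitted values quoted in the post.
W = -1.645808649
B = -2.043765116


def factorize(m: int) -> dict:
    """Prime factorization by trial division, as {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= m:
        while m % p == 0:
            factors[p] = factors.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:
        factors[m] = factors.get(m, 0) + 1
    return factors


def wb(n: int, d: int, w: float = W, b: float = B) -> float:
    """wb(n, d) = sum over p >= 5 of (lb(p) + w) * n_p + (lb(p) + b) * d_p."""
    if n < d:  # orient the ratio so that n >= d
        n, d = d, n
    fn, fd = factorize(n), factorize(d)
    total = 0.0
    for p in sorted(fn.keys() | fd.keys()):
        if p < 5:
            continue  # the sum starts at p = 5
        total += (math.log2(p) + w) * fn.get(p, 0) + (math.log2(p) + b) * fd.get(p, 0)
    return total


print(wb(7, 5))
```

Unlike kl, nothing here depends on the largest prime on its own; each prime contributes only through its (shifted) log and its repeat count.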

### Re: developing a notational comma popularity metric

Posted: Sun Aug 02, 2020 2:23 am
Dave Keenan wrote:
Sat Aug 01, 2020 6:06 pm
Currently the four which can take these forms are a, j, k, and the total weight on a submetric (like how we'd sometimes weight copfr by c). I don't currently allow this, but there's no particular reason not to (it just seemed really unlikely that a winning metric would call for more than one of a base, an exponent, or a coefficient in the same position).
You don't currently allow what? A total weight on a submetric? Or locking to 2, any parameter that is a log base?
Sorry for the pronoun abuse. Neither of your two guesses is correct. When I said "I don't currently allow this", the "this" was referring to "allow [a parameter's] coexistence with its variant as a coefficient."
curmudgeon, cmloegcmluin
This juxtaposition is suitable, since apparently the etymology of curmudgeon is unknown! Maybe in 500 years people will call each other cmloegcmluins and there'll be no evidence why.
But I meant I still wanted to be able to ask questions about a metric in a form with a base not equal to 2 if it was convenient or historical.
I don't understand how, in this application, a base other than 2 might be historical or more convenient.
Well, I do keep one file in the repo which is a historical record of every metric of interest we've proposed, and some of them have non-2 bases, and I'm trying to stay backwards compatible with those. And I've got valuable example metrics all over the tests which involve non-2 bases. That's all I mean. It's for my convenience.

Or if someone popped up on the thread with a 0.0039 SoS metric which involved a non-2 base and I wanted to check it immediately without undoing a bunch of changes to the code. That's all I mean: I think it's valuable to maintain the code's ability at all to work with non-2 bases. I think if I could just articulate it well there'd be no argument.

And as I mentioned before, I have now limited the solver to running only with bases equal to 2.
Whether a is 4/3 or 2, it's still a chunk to me.
Why isn't setting a=2 analogous to setting k = 1, and therefore not counting as a chunk? When it was 4/3, that was the value of a variable, the optimal value found by your search.
I'm afraid I don't follow your thought here at all, but maybe that's just a negative result of this inline format. Perhaps we'll resolve our confusion inside another one of these inline sections.
I would not characterize "convert log(4/3) to lb by multiplying j and k" as involving "trade a for k". I would characterize it as involving "newly require k".
I'm not talking about changing a from 4/3 to 2. I'm talking about eliminating a entirely, by changing from a two-argument log function to a single-argument lb function. You are treating a as a variable, so log_a(p) is a function of two arguments, 'a' and 'p'. In Excel it is even written as log(p,a). I'm using k×lb(p) where lb is a function of a single argument, p, and so overall, k×lb(p) is also a function of two arguments. They both contribute two chunks, one of which is a model parameter, the other of which is some kind of log function.
I agree that log_a(p) is a 2-argument function of a and p and that lb(p) is a 1-argument function of p. I also agree that they are both the same number of chunks of complexity to explain, because under the hood lb is of course log_2(p). Where we disagree is on how many chunks each of these functions represents. You think two, I think one.

My case for one chunk is this: while applying a as a logarithmic base takes a bit more ink than applying it as a coefficient or exponent, it is no more complex. For example, we don't count y=0.85 as 2 chunks (one chunk for using the exponentiation function, and one for the argument 0.85). To me, an atomic chunk involves at least two subatomic particles: a value, and an application.
I guess I've become used to thinking of soapfr as a type of sopfr, as that's how it is in my code.
OK. I think the opposite. There's only one sopfr function. I think of sopfr as one kind of soapfr, namely the special case where a_p = p.
I meant to have said more here but forgot to, and what I meant to say more about was exactly this. I completely agree with your reasoning that sopfr is a kind of soapfr.

It will probably not surprise you that often in object-oriented programming, a child type inherits from a parent type and then extends it with new functionality. That's how it is in my code, roughly: soapfr inherits from sopfr and extends it. So in the code it makes more sense to say soapfr is a type of sopfr. Although to us outside the code, it does make more sense to say soapfr is a more general case and sopfr a more specific case, therefore sopfr is a type of soapfr.
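In plain-function terms, the view that sopfr is the special case of soapfr where a_p = p can be sketched as below. The function names and the `ap` argument are hypothetical illustrations, not the repository's classes; the sketch shows only the mathematical relationship, not the inheritance structure described above.

```python
import math
from typing import Callable


def prime_factors_with_repeats(m: int) -> list:
    """Prime factors of m listed with repetition, e.g. 45 -> [3, 3, 5]."""
    out = []
    p = 2
    while p * p <= m:
        while m % p == 0:
            out.append(p)
            m //= p
        p += 1
    if m > 1:
        out.append(m)
    return out


def soapfr(m: int, ap: Callable[[int], float]) -> float:
    """Sum of an arbitrary prime function a_p over the prime factors of m, with repeats."""
    return sum(ap(p) for p in prime_factors_with_repeats(m))


def sopfr(m: int) -> float:
    """sopfr is soapfr with the identity prime function a_p = p."""
    return soapfr(m, lambda p: p)


print(sopfr(45), soapfr(45, math.log2))
```

Written this way the generality runs from soapfr down to sopfr; an inheritance hierarchy that runs the other way can still compute the same values.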
And again, I think of it as one soapfar, not two, and will probably not easily be able to break that conception.
I can live with that, provided it doesn't make you count chunks where there aren't any, as you initially did with wyb.
The extra 2 chunks I initially counted for wyb were both due to my mcompfr vs sompfar (or whatever) confusion (per: viewtopic.php?p=2096#p2096 if we need the reference)

There does still remain the issue where I can only realize wyb as two submetrics, one with j=0 and w and the other with k=0 and b, which forces my code to add an extra chunk, but that's my code which is struggling, not me. I know I need to fix this somehow. And I think just this morning a solution finally came to me, so I'll see if I can get to implementing that today.
This is clearly another consequence of me thinking of metrics with things in a form like soapfr(n) + k*soapfr(d) as one thing, not two. In my world, the part of a submetric which applies to n is modified by j, the part of a submetric which applies to d is modified by k, and then the submetric as a whole is modified by a parameter I call "weight".
Thanks for explaining. Of course "weight" is completely redundant when j and k are coefficients (i.e. multipliers), but I understand it might not be redundant when j and k have other roles.
That's exactly correct. It's designed to support balancing submetrics against each other in all possible ways. It may turn out that the best metric involves j and k as coefficients within a submetric to balance n and d but then the submetric as a whole raised to some power to put it in balance with another submetric.
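To see why the submetric weight is redundant only when j and k are coefficients, here is a small numeric sketch. It assumes a hypothetical submetric built from two stand-in values A and B (for soapfr(n) and soapfr(d)) and compares a weight used as a multiplier with a weight used as an exponent; all numbers are arbitrary.

```python
import math


def weighted(A: float, B: float, j: float, k: float, weight: float) -> float:
    """Weight used as a multiplier on the whole submetric."""
    return weight * (j * A + k * B)


def powered(A: float, B: float, j: float, k: float, weight: float) -> float:
    """Weight used as an exponent on the whole submetric."""
    return (j * A + k * B) ** weight


A, B = 3.0, 2.0               # arbitrary stand-ins for soapfr(n) and soapfr(d)
j, k, weight = 1.0, 0.7, 1.5  # arbitrary illustrative values

# A multiplicative weight can always be folded into j and k, so it adds nothing.
assert math.isclose(weighted(A, B, j, k, weight),
                    weighted(A, B, weight * j, weight * k, 1.0))

# An exponent cannot: j*A + k*B doubles when A and B double, for any j and k,
# but the powered form scales by 2**weight instead.
ratio = powered(2 * A, 2 * B, j, k, weight) / powered(A, B, j, k, weight)
print(ratio)  # about 2.83 (= 2**1.5), not 2
```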
This (and your original formulation of it) looks like 4 chunks to me: lb, gpf, k, l.
No, my original formulation is 3 chunks: log_a, gpf, and j.
I'm sorry, but I don't understand how you can claim that log_a is one chunk. To me, every parameter is a chunk and every function other than + - × ÷ is a chunk. log_a is not simply a function. It is a function and one of its arguments, which happens to be a model parameter. Perhaps you are being fooled by its visual similarity to log_2 which is simply a function, as is made clear by writing it as lb.
Hopefully this is addressed by my above statement about the difference between arguments and chunks, and we can come to some agreement on this.

It seems you think applying a value as a logarithmic base adds a chunk of complexity where applying it as a coefficient does not. I would be curious, if you do maintain that position, whether applying it as an exponent counts as 1 or 2.

If I must go on, + - × ÷ are all functions too. If I'm understanding your perspective correctly, you would say the function +7 has two chunks of complexity: one for the + itself, and one for the 7; however the function called "increment", which is equivalent to +1, would have only one chunk of complexity, since it assumes 1 is the added value. Why couldn't I just name a new function "sevencrement" or whatever which does +7 and then say voila, it's only 1 chunk of complexity? Or, to circle back to the lb vs log(4/3) issue, why couldn't I just name a new function l走 which is equivalent to log(4/3) and say that it's now 1 chunk instead of 2? This seems obviously silly to me, so I'm beginning to worry there's just something I profoundly fail to understand about logarithms in particular which circumvents this sort of reasoning.

Another way I might argue this case is from the other direction. Does "7" by itself count as a chunk of complexity, introduced to a metric, but without any application yet? I don't think so. I don't think these values can stand alone.

Now if there ever was a function which had an optional argument, I would say that providing that optional argument would count as an extra chunk. But the base for a logarithmic function is not optional. A logarithm has no existence without a value for its base. Same thing with the addition function and its addend, or the exponentiation function and its exponent.

### Re: developing a notational comma popularity metric

Posted: Sun Aug 02, 2020 7:32 am
It's hiking day today, so I will have to be brief.

I'm sorry that most of your preceding post is wasted in arguing against a straw-man born of your confusion over what it is that I am claiming. But I don't have time to point out every place where you claim I would assign a certain chunk count when in fact I would not.

Apart from the (unwitting) strawmanning, two things stand out:

1. You are wrong when you claim that under the hood lb(p) is log_2(p), i.e. log(p,2). Maybe you implemented it that way, but under most "hoods" lb(p) is a primitive. Under the hood, log_a(p), i.e. log(p,a), is usually implemented as lb(p)/lb(a). That's almost certainly why your code ran so much faster when you changed from log_a(p) to k×lb(p).

2. You are right when you point out that we are being inconsistent in counting log as a chunk, but not counting the unwritten exp in the case of r^y. I now believe we should count r^y as two chunks, just as I believe we should count log_a(p) as two chunks, and k×lb(p) as two chunks.

You seem to have missed where I said I want to count every model parameter as a chunk (as well as counting every function more complex than + - × ÷ as a chunk). But I don't want to count literal constants as chunks, as you seem to think I do.

You can count + - × ÷ as chunks too if you want, but they seem like they should count for less than higher functions like logs, exponentials, powers, and roots.

So I count log_2(p) as one chunk, just as I would count log_{4/3}(p) as one chunk were it not for the fact that your 4/3 is not a literal constant, but merely one possible value of the parameter a, which might no longer be 4/3 if we were to train the model on a different set of data, or merely weight the existing data differently.

In determining the complexity of a metric, we must count each parameter as a chunk, independent of what function or operator is applied to it; otherwise my extreme metric that has a separate parameter for each prime, but uses no more operations than wyk or wyb, would be the hands-down winner.

That's why I want to count log_2(p) as one chunk, but log_a(p) as two chunks, even if a happens to come out as 2. And thanks to your recent observation, I also want to count r^2 as one chunk, but count r^y as two chunks, even if y happens to come out as 2.

Perhaps you thought I was counting k×lb(p) as two chunks because lb(p) = log_2(p) and I was counting the "2" as a chunk as well as counting the "log" as a chunk. That is not the case. I count k×lb(p) as two chunks because the "k" is one chunk and the "lb" is one chunk.
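The counting rule stated in this post (one chunk per model parameter, one per function more complex than + - × ÷, nothing for literal constants) can be encoded in a few lines. The `Metric` record and the function and parameter sets for kl, laj and wb are hypothetical hand encodings for illustration, not the solver's representation; the last two cases mirror the r^2 versus r^y example.

```python
from dataclasses import dataclass, field

# Basic arithmetic operators, which this rule does not count as chunks.
ARITHMETIC = {"+", "-", "×", "÷"}


@dataclass
class Metric:
    name: str
    functions: set                             # operations used, e.g. "lb", "gpf", "log"
    parameters: set                            # fitted model parameters
    constants: set = field(default_factory=set)  # literal constants: never counted


def chunk_count(m: Metric) -> int:
    """One chunk per fitted parameter plus one per non-arithmetic function."""
    higher = {f for f in m.functions if f not in ARITHMETIC}
    return len(higher) + len(m.parameters)


kl = Metric("kl", {"lb", "gpf", "+", "×"}, {"k", "l"})
laj = Metric("laj", {"log", "gpf", "+", "×"}, {"a", "j"})  # log_a: the log plus parameter a
wb = Metric("wb", {"lb", "+", "×"}, {"w", "b"})
r_squared = Metric("r^2", {"pow"}, set(), {2})             # literal exponent 2 is free
r_to_y = Metric("r^y", {"pow"}, {"y"})                     # fitted exponent y is a chunk

for m in (kl, laj, wb, r_squared, r_to_y):
    print(m.name, chunk_count(m))
```

Under this encoding kl and laj both come out at 4 and wb at 3, matching the counts in the table above, and a fitted log base costs a chunk exactly as a fitted coefficient does.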