developing a notational comma popularity metric

Post by cmloegcmluin »

I think "under the hood" must mean something different to you. To me, in this context, it means "as implemented in library code or hardware".
Sorry I missed this bit last night in my haste before going to bed (also sorry for letting my haste show in the lack of finesse in the presentation on my previous reply... I just really wanted to wake up to a new message from you, rather than delay us another half-day on this).

Yes, I think for me, I had been using the term "under the hood" only to mean "under the icing". :)

I also think implementation in library code or hardware is irrelevant to chunk count. As it turns out, I think anyway, we both agree that icing is irrelevant to chunk count too. I think what we disagreed over was the parameter vs. constant point.
Dave Keenan wrote: Mon Aug 03, 2020 9:39 pm I'm sorry, but I really don't want to spend any more time on this, or soon we'll be trying to come up with a metric for comparing the function-complexity metrics that we're using to compare chunk-counting metrics for comparing ratio popularity metrics. :lol:

That has to ground out somewhere. The justification for any of these chunk-counting schemes is all pretty vague and arbitrary. I'm happy to admit that they are only intended to be quick and dirty. How about we just have two chunk-count columns. "DK chunk count" and "DB chunk count". :)
I understand. That's fine with me. I mean, in the end, I think the plan is to choose one of these metrics, balancing a number of factors including SoS, chunk count, how it looks, etc., which is going to be a pretty subjective decision. So it's probably not that big of a deal if we can't come to an agreement on a single definition of chunk count. Both chunk counts can be presented.
1. To me, for the purpose of crudely measuring the complexity of a model, the functions log2(x), √x, x^2, 2^x (where the "2"s are true constants) are boxes with one input and one output, as are x! (factorial), sin(x), cos(x) and tan(x), along with their inverses and their hyperbolic cousins (not that I see any application of them here). For this (model complexity) purpose, it makes no difference to me, that the former can be drawn as a box with two inputs (with a "2" feeding into one input) while the latter cannot.
I don't feel like you owe me anything, but I do appreciate that you did a little bit of extra explanation. Unfortunately, we may be talking past each other on this particular front. I have no disagreement with the above paragraph. But that's the problem; it fails to address any issue which I thought we were in disagreement about, which means you may not have understood my explanations or questions. I too may have reached my point of exhaustion for explanation and questioning.

For you and me, all eight of those functions would be 1 chunk, and always had been. What I thought our disagreement was over was counting parameters and true constants. That is, log2(x) and loga(x), a = 2.017, where you thought the latter would be 2 chunks while I still considered it to be 1 chunk.

I don't want to drag it out any further, though. I think we understand the nature of the disagreement here, which is enough:
  • Your chunk count will count functions in-and-of-themselves, and then count their arguments as a chunk when they are parameters but not when they are true constants, leading to log2(x) as 1 chunk and loga(x), a = 2.017 as 2 chunks.
  • My chunk count will count parameters as 1 chunk always, where a parameter chunk is comprised of some value and some function application of that value, leading to log2(x) as 1 chunk and loga(x), a = 2.017 as 1 chunk.
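To make the contrast concrete, here's a minimal sketch of the two tallies as just described (purely an illustration; neither of us actually counts chunks with code like this, and the names are mine):

```typescript
// A "term" is a function name plus its numeric arguments,
// each flagged as a true constant or a fitted parameter.
type Arg = { value: number, kind: "constant" | "parameter" }
type Term = { fn: string, args: Arg[] }

// DB: each value, together with the function application of that value, is 1 chunk,
// whether the value is a constant or a parameter. (Zero-argument functions like
// arctan are exactly the edge case debated further down the thread.)
const dbChunks = (term: Term): number => term.args.length

// DK: the function itself is 1 chunk, plus 1 per parameter; true constants are free.
const dkChunks = (term: Term): number =>
    1 + term.args.filter(arg => arg.kind === "parameter").length

const log2OfX: Term = { fn: "log", args: [{ value: 2, kind: "constant" }] }
const logAOfX: Term = { fn: "log", args: [{ value: 2.017, kind: "parameter" }] }

console.log(dbChunks(log2OfX), dkChunks(log2OfX)) // 1 1
console.log(dbChunks(logAOfX), dkChunks(logAOfX)) // 1 2
```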
I might have come around to your point of view on this, but you did not assuage my concerns about what laws of math I'd be violating by cheating in the end and claiming all of the parameters which had resolved to constants as having been constants in the first place. I tried to bring this point up earlier here, but I didn't respond to your response to it because it didn't make any sense to me, or at least seemed to completely miss my point. What I was trying to say was this: as I remember it (and I think if you go back and re-read this entire forum topic from the beginning, you'll see it), in the beginning we were not using log2 or lb at all. One day one of us ran our solver and it came up with stuff with values close to 2 as being good fits for the data. We both said "ah ha, that looks familiar! I think that value is trying to be a 2, and that would make total sense in this domain, so let's just snap it to 2 henceforth", so in that sense we may already have cheated these laws of metric validity by claiming this 2 as a true constant when honestly we found it by solving for it as a parameter. If I got the story wrong, I'm sorry. Hopefully the story is accurate, and hopefully it helps illustrate for you why I don't understand the reasoning underlying your point of view, or if I do understand it, why I disagree with it. But feel no obligation to correct me. We really should move on.
2. It would be perfectly reasonable to treat + and - as 1 chunk, × and ÷ as 2 chunks and power, exponential, root and log as 3 chunks, or some similar scheme. I just don't think such fine resolution is warranted. You might think of me as having taken such a scheme, divided those chunk numbers by 5 and rounded them to the nearest integer.
This is a relief to see explicitly. Thank you. I now understand unequivocally that you disagree with me on the issue of bringing mathematical complexity into the decision space at all. I was just guessing at that being your opinion until now.

As I stated before, that's not a road I want to go down, period. So I certainly don't think such fine resolution is warranted either; I'm more of the mind that it's a slippery slope and we shouldn't even start down it. Not even a gesture toward it. You seem to be confident enough to break mathematical functions into 0- and 1-chunk categories.

So your chunk count can factor that in. Mine will not. This will be our second of two points of disagreement over the implementation of chunks: you will give + - × ÷ for free, while I still count them. Or to be more specific (and preventing confusion with respect to the differences in how we see things via the other point of disagreement addressed above), you would count +1 as 0 chunks, because + is free, and 1 is a constant. While I would count +1 as 1 chunk, because it is an application of a value as a function where that value is 1 and the function is addition. And you would count +w, w=-1.88 as 1 chunk, because + is free, and w is a parameter. While I would count +w as 1 chunk, because it is an application of a value as a function where that value is -1.88 and the function is addition.

Post by cmloegcmluin »

Perhaps we should put more weight (in determining the validity of the metrics) on how they perform on ratios that they haven't seen before, or how they perform with z=1, which is almost the same thing, rather than on their chunk counts.
I can't figure out what you mean by this. How would we evaluate their performance on ratios that they haven't seen before?

Do you mean the data points past the first 80 (which we have information for, but have been choosing not to include)? We don't have information for any other ratios besides the ones out of that Scala stats spreadsheet as far as I know, so I don't know what else you could mean.

Or did you mean we'd try it on the tina candidates and judge the metrics based on how good they make the ones we like look? I wouldn't think so, because I thought the goal of this project was essentially the opposite: to invent a metric independently of the tinas and then impose it on them and use its (perhaps occasionally surprising) results to help decide between the tinas.

Post by cmloegcmluin »

Alright, I've got a variety of updates. You may not have a strong opinion on any of these and there's no obligation to respond. I just wanted to document the large number of calls I've made in the home stretch of the development of my solver.

The only thing I have remaining to finish is a layer where, after the initial pass for a given chunk count finds the best starting point per submetric & parameter combination, it hands that off to the higher-precision recursive perfecter layer automatically, so I don't have to do that pass myself manually. Once I start running this thing on 4-, 5-, and maybe beyond chunk counts, I'll definitely want that in place.

We're about to start getting finalish results on my end! Woo!

1.

I decided not to implement the feature whereby if all submetrics were the same it would still only count as one chunk.

This would have allowed for metrics such as: sopfr(nd) + soapfr(nd) + sopfar(nd) where maybe the ap is one chunk, maybe the ar is one chunk, and sopfr is one chunk.

I decided not to implement this feature for a few reasons.

The first reason came when I realized that I had proposed this feature as a potential solution to the problem of my code being unable to recognize wyb as 5 chunks, but I had come up with a different feature that would also solve this problem. The other feature was introducing the parameter "b", which is the same thing as "w" except it only applies to denominators. Between the two features, I felt the new parameter "b" was the more natural solution; it allowed me to express wyb as a single submetric rather than hacking it together with two where one had j=0 and the other k=0. So I no longer needed to support homogeneous submetric types amounting to one chunk in order to support wyb as 5 chunks.

The second reason is that I don't think homogeneous submetric types amounting to one chunk feel very natural.

The third reason is that I feel like metrics such as the example given above should be able to be achieved more elegantly with the proper weights and/or values of the parameters, rather than with multiple copies of the same submetric.

The fourth reason is that I have to be very careful about what new features I add to keep computation time down.

2.

Per point 1, I added a new parameter "b" to my code. "b" is the same as "w", in the sense that it does the same thing. It's another name for "w" when it has a different value in a different position. Previously, I represented metrics which had more than one value for a w-like parameter by adding additional copies of the same submetric. Now I can represent them within a single submetric. Which solves the problem of representing wyb as a 5-chunk metric. And I think it works nicely because submetrics already support things which vary between the numerator and denominator, so this is just an additional bit of the code which behaves like that.

When I first added "b", I wanted to see what the impact was on runtime of my solver, so I ran it for 2-chunk metrics. It didn't make too bad an impact on runtime, but interestingly, it turned up as the new best 2-chunk metric! If you do sopfr but where for the denominator only you add -1.474 to each prime, you can get an SoS* of 0.008957947. For comparison, the previous best 2-chunk metric we'd found gave SoS of 0.009100971, and that was a sopfr but with j as a power exponent of 1.095.
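To make that concrete, here's roughly how I read that best-so-far 2-chunk metric in code (a sketch only: names like `adjustedSopfr` are mine, and I'm glossing over exactly which primes we include and how the ratio gets split into its numerator and denominator factorizations):

```typescript
// Sketch of "sopfr, but with b = -1.474 added to each prime on the denominator side".
// A factorization is a list of [prime, count] pairs.
type Factorization = [prime: number, count: number][]

// plain sopfr: sum of prime factors with repetition
const sopfr = (factors: Factorization): number =>
    factors.reduce((total, [prime, count]) => total + prime * count, 0)

// denominator-only adjustment: each denominator prime gets b added before summing
const adjustedSopfr = (
    numerator: Factorization,
    denominator: Factorization,
    b: number,
): number =>
    sopfr(numerator) +
    denominator.reduce((total, [prime, count]) => total + (prime + b) * count, 0)

// e.g. for 35/11: numerator [[5,1],[7,1]], denominator [[11,1]]
console.log(adjustedSopfr([[5, 1], [7, 1]], [[11, 1]], -1.474)) // 5 + 7 + (11 - 1.474) = 21.526
```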

I found another metric better than this 2-chunk j metric as well. This one used "u" which, like "b", is a denominator-specific variant of another parameter, but in this case "u" is a variant of "x". I can dig up the details on this if you'd like, but I'm about to make the point that we shouldn't in fact accept these metrics (as 2-chunkers).

The reason is this: I think these parameters are strictly more complex than their non-denominator-only counterparts. In order to explain them, you need to explain the concept of the adjustment, and then you need to additionally explain the adjustment being applied only to the denominator.

What I decided to do is only allow b when w is also already provided. In this sense, b is an additional chunk of complexity atop w. Or in other words, the metrics I was getting for chunk count 2 would still exist but actually as chunk count 3, where w happened to be equal to 1.
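In code terms the constraint is just something like this (a sketch; the actual field names in my solver may differ):

```typescript
// b is only allowed on a submetric that already has w
const submetricIsValid = (submetric: { w?: number, b?: number }): boolean =>
    submetric.b === undefined || submetric.w !== undefined
```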

3.

In an earlier post, I had suggested that while I agreed with locking a down to 2 when used as a base, I did not agree with that decision with respect to j, k, or submetric weight.

What I meant was that while adjusting primes by log base 2 made total psychoacoustic sense to me, adjusting the entire numerator result, denominator result, or submetric result by log base 2 made only a more distant, indirect sense of the same type.

And my concern was especially this: that we'd no longer find metrics at chunk count n that we should be finding, because they'd be demoted to chunk count n + 1. If I locked any of these parameters down to one value, then we'd require an additional parameter to achieve the same result, costing a net one additional chunk (by my and my code's definition of the situation, since constant-izing the base parameter would not eliminate a chunk to counteract the new parameter).

I decided on my own to lock all four of these parameters down to the constant 2 when used as bases, against how I originally said I felt about it. It was your "2.017 is more complex than 2" idea that convinced me to do this.

4.

From the beginning, my code did not allow the coexistence of j and k as coefficients on the same submetric, for reasons that I feel should be fairly clear.

It took a little bit of manual checking and experimentation on my end, but I determined that combinations of j and k where either one is an exponent or base instead of a coefficient are meaningful and should not be excluded. I can go into details if you're interested. Perhaps this fact is obvious to you already.

5.

Subtopic #5 here is about the other operations in that hexagon of inverses and reciprocals you made for bases, powers, logarithms, and exponents.

I apologize because I have not yet invested the time and attention to review the boundary algebra stuff on this thread and over email (nor that email a while back where you grappled with questions of terminology in this domain), and that may show here. So my terminology here is not going to be super precise, nor is my result super confident.

My initial understanding was that what I had implemented in the code was the ability for a parameter to be applied as a basic logarithm base (take the log base a of p), and as a basic power exponent function (raise p to the a). That is, I had implemented only 2 of the total 6 possibilities. And you said I only needed 4 of them at the end of the day to be comprehensive; 2 of the 6 could be eliminated. So I would need to add 2 more to reach 4. And the other two I needed to add could be described as the ability for a parameter to be applied as a power base (raise a to the p) and as a logarithmic power (take the log base p of a).

Because of @volleo6144's point here, I realized that I only needed one of those two new functions: the power base. The logarithm power is redundant with the logarithm base because I can achieve a logarithm power with a weight on the submetric as a power of -1. I didn't previously search negative powers, but I changed the solver to do so.
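For the record, the identity I'm leaning on there is just the reciprocal relationship between the two:

logp(a) = 1 / loga(p)

so a weight applied to the loga(p) submetric as a power of -1 recovers the take-the-log-base-p-of-a behavior.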

* I wonder if you have an opinion on "an SoS" vs. "a SoS"? It depends on whether you want the article to fit the pronunciation of the letter "s" or of what it stands for.

Post by Dave Keenan »

Sigh.
cmloegcmluin wrote: Mon Aug 03, 2020 4:52 pm You say we need to punish ourselves for using parameters where we could use a constant; what's to stop us from cheating to evade the punishment? It feels like we could just lie at the end and make up some explanation for the constant the solver spits out and to justify it as if we had locked it down as a constant before running the solver. I don't mean that I desire to be evil here. I just mean that I don't comprehend the essence of the existence of any math/psych/phil/musical police here enforcing this sort of distinction, or perhaps rather than police I should characterize it as the laws themselves; I don't get what intellectual laws inform this sort of constraint. And as long as no one has been able to explain this law to me, I can hardly be expected to follow it if I would in the natural course of my behavior break it. Perhaps there's some extra insight you have on metric invention which you could share to assuage this concern of mine.
I really don't know where to begin with this. I feel like, if I have to explain this to you, you're never going to understand my explanation. Deliberately lying about a parameter being a constant just seems such an obviously bad thing to do, or at least pointless. Besides, how would either of us get away with it, since all the painful details are here in this thread for anyone to read. And one of us should call it out if the other tries to do it, deliberately or not.
So yes, I would see a constant e that I needed to count, had we used ln instead of lb.
Are you aware that there is no constant e involved in the definition of natural log? It is simply the area under the curve y = 1/x to the right of x=1 (with area to the left counting as negative). The constant e is defined as that number whose natural log (the area to the right) is 1.
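Restating that in symbols, for the record:

ln(x) = ∫ from 1 to x of (1/t) dt

and e is then simply whatever number satisfies ln(e) = 1.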
Here's how I would approach the introduction of a chunk.

"Ah, I want to put something else into my metric!

Let's put in some number... most math things need numbers of some kind!

Ah, but how will I use this number? I can't simply drop a number into my metric, or it won't mean anything! Numbers can't stand alone! I know that when I drop numbers in next to things, math conventions say that means to multiply them, but that's not really what I'm talking about... that'd be an implicit function. Whether or not orthographically I need to set down any extra ink to use this number, I need to apply it by way of some function!

Perhaps my function will be "as a coefficient / multiplying"! Or perhaps it will be "as a logarithmic base"! Or perhaps it will be "as a power exponent!"

In any case I need to use a function to use it somehow!"
That approach is completely foreign to me, as is your recent locution "an application of a value as a function".

Here's how I approach it:

I start with the idea that we want to transform the prime-factorisation of a ratio into a number that will rank its popularity relative to other ratios. We already have such a transformation that we call sopfr but we want to improve it. I then visualise the prime factorisation, and the components of sopfr, on a dataflow diagram (like a "flow" in node-red, or the circuit diagram for an analog computer).

Then I think about where I can break into the existing flow to insert some further (sub)transformations. For example I can transform the primes p, or their repeat-counts r, before they feed into the multiplier.

Such a transformation need not involve any number (i.e. any constant). For example I might choose to compress r using an arctan function. I trust that not even you can find a secret hidden constant inside arctan, as you manage to do with natural-log.

I want to estimate, in a quick'n'dirty way, the potential for a metric to overfit the data. I figure that every additional transformation that I insert gives the model greater potential to fit the data. So I count one unit of fitability* for each transformation box I add. *Call the unit a "harlow". ;) You can have the term "chunk" for whatever you are doing.

I would still only count one harlow if instead of arctan my compression function was natural-log, square-root or cube-root.

However if I put a knob on my transformation box, that lets me vary its function continuously from square-root to cube-root, or to any power, (i.e. I parameterise my transformation) then that gives the model even more potential to fit the data. So I count another harlow for the knob.
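Here's that picture sketched as code, in case that's easier to line up against your solver (purely an illustration of the counting: the names are made up, and it's not how my spreadsheet actually does it):

```typescript
// sopfr as a dataflow over [prime, repeat-count] pairs, with one insertable
// transformation box on the repeat counts.
type Factorization = [prime: number, count: number][]

// base flow: p and r feed straight into the multiplier, then everything is summed
const sopfr = (factors: Factorization): number =>
    factors.reduce((total, [p, r]) => total + p * r, 0)

// break into the flow: transform r before it reaches the multiplier
const sopfrWithTransformedR = (
    factors: Factorization,
    transformR: (r: number) => number,
): number =>
    factors.reduce((total, [p, r]) => total + p * transformR(r), 0)

// 1 harlow: a fixed, knob-less compression box (arctan, natural log, square root, ...)
const compressedWithArctan = (factors: Factorization): number =>
    sopfrWithTransformedR(factors, Math.atan)

// 2 harlows: the same box with a knob on it, a power y that can vary continuously
// from square root (0.5) to cube root (1/3) to anything else
const compressedWithPowerKnob = (factors: Factorization, y: number): number =>
    sopfrWithTransformedR(factors, r => Math.pow(r, y))
```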
In many cases we already have a coefficient for the log, and in that case, standardising the base is simply removing a redundant parameter. In the cases where standardising the base requires introducing a new coefficient as a parameter, I don't find that any more complex. There's still one parameter and one log function.
I agree completely with the first statement. The second statement would make sense to me if I accepted that coefficients don't count as chunks. I have not been convinced of that yet.
Sigh. I never said that coefficients don't count as chunks. Whether a coefficient counts as a harlow depends on whether it is a true constant or a parameter.
Standardizing to base e changes nothing for me. k times ln(p) has 2 chunks of complexity - one for the log_e, and one for the k. log_a(p) has 1 chunk of complexity - one for the log_a, whatever that a turns out to be.
I can (just barely) understand you seeing it that way for explainability-chunks, but not fitability-chunks (harlows). They are exactly as fitable as each other. You will get exactly the same minimum SoS.

Post by Dave Keenan »

cmloegcmluin wrote: Tue Aug 04, 2020 6:18 am
Perhaps we should put more weight (in determining the validity of the metrics) on how they perform on ratios that they haven't seen before, or how they perform with z=1, which is almost the same thing, rather than on their chunk counts.
I can't figure out what you mean by this. How would we evaluate their performance on ratios that they haven't seen before?

Do you mean the data points past the first 80 (which we have information for, but have been choosing not to include)? We don't have information for any other ratios besides the ones out of that Scala stats spreadsheet as far as I know, so I don't know what else you could mean.
Yes. That's exactly what I mean.
Or did you mean we'd try it on the tina candidates and judge the metrics based on how good they make the ones we like look? I wouldn't think so, because I thought the goal of this project was essentially the opposite: to invent a metric independently of the tinas and then impose it on them and use its (perhaps occasionally surprising) results to help decide between the tinas.
No. I did not mean that. That would be pointless and ridiculous.

Post by Dave Keenan »

cmloegcmluin wrote: Tue Aug 04, 2020 2:08 am What I was trying to say was this: as I remember it (and I think if you go back and re-read this entire forum topic from the beginning, you'll see it), in the beginning we were not using log2 or lb at all. One day one of us ran our solver and it came up with stuff with values close to 2 as being good fits for the data. We both said "ah ha, that looks familiar! I think that value is trying to be a 2, and that would make total sense in this domain, so let's just snap it to 2 henceforth", so in that sense we may already have cheated these laws of metric validity by claiming this 2 as a true constant when honestly we found it by solving for it as a parameter. If I got the story wrong, I'm sorry. Hopefully the story is accurate, and hopefully it helps illustrate for you why I don't understand the reasoning underlying your point of view, or if I do understand it, why I disagree with it. But feel no obligation to correct me. We really should move on.
I'm sorry, but we can't just "move on" from that! You've essentially made a claim of mathematical or intellectual dishonesty. I say you are mistaken. It never happened. Or at least I never pretended a model parameter was a true constant, and nor did I wittingly allow you to do so. Clearly the burden of proof is now on you, to search the thread to find where it occurred and link to it, or to say that you are then satisfied it never occurred. It is impossible for me to prove its non-existence. I could search and not find it and still you could think I didn't look hard enough.

Post by cmloegcmluin »

In many cases we already have a coefficient for the log, and in that case, standardising the base is simply removing a redundant parameter. In the cases where standardising the base requires introducing a new coefficient as a parameter, I don't find that any more complex. There's still one parameter and one log function.
I agree completely with the first statement. The second statement would make sense to me if I accepted that coefficients don't count as chunks. I have not been convinced of that yet.
Sigh. I never said that coefficients don't count as chunks. Whether a coefficient counts as a harlow depends on whether it is a true constant or a parameter.
This one's easy: I'm very, very sorry for the confusion. I definitely meant to write "constant" there, not "coefficient".
Dave Keenan wrote: Tue Aug 04, 2020 10:30 am I really don't know where to begin with this. I feel like, if I have to explain this to you, you're never going to understand my explanation. Deliberately lying about a parameter being a constant just seems such an obviously bad thing to do, or at least pointless. Besides, how would either of us get away with it, since all the painful details are here in this thread for anyone to read. And one of us should call it out if the other tries to do it, deliberately or not.
Note: it wasn't until I was in the middle of articulating my response to other stuff you said that I came to some breakthroughs about your thinking. Therefore much of the following stretch of my response is rendered moot by those breakthroughs. However, I felt it was valuable to keep it, to help you understand my thinking.

I think I may have taken a subpar approach to engaging on this issue. Sorry about that.

So... I agree that in a world where there is a meaningful difference between a parameter and a constant, that lying about it would be bad/pointless and that we wouldn't, couldn't, and shouldn't let each other get away with it.

The issue is that I don't yet believe we live in a world where there is such a meaningful difference.

A couple days ago, that idea was completely alien to me. I didn't even fathom that anyone could possibly think such a thing, and that's why I had such trouble following your statements; I kept seeing what I wanted to see, because I wasn't even capable of imagining such a perspective on the world.

Today, the notion of a difference between a parameter and a constant has been illuminated for me. I understand that such an idea exists. But I still seek to be convinced that this idea is true.

But let's assume you have convinced me. Then, please understand that one of the things I've been trying to do is call us out for lying about a parameter being a constant. We found the base 2 by running the solver; therefore it is a parameter, not a true constant.

I do not understand how you could reassure me that we are not lying about it. And this doubt that you could reassure me about this reinforces my doubt that any difference between a constant and a parameter could or should matter.

That's what I'm trying to say.

And before I could post this, you added another reply, wherein you recognize that I was indeed making this claim of impropriety. I had been putting off validating my suspicions. But you're right that the onus is on me to prove it. So here's what I found:

If you go to page 7 of this thread, you can see that I'm throwing out log base a's with a equal to assorted things like 0.62, 1.501, 0.872, 0.517. And then you throw out one with a log base 3. Then you continue on page 8 looking into log base a where a is 3.956349187, 3.018652175, or e. Then, and this is the critical moment I was thinking of: in this post, I come back with an SoS of 0.004250806 I found using an a of 1.994, to which I react: "so basically log2, which is certainly psychologically motivated!!" This was before I started building the automated solver, but it's basically the same thing: I was manually fiddling with knobs stabbing around at different values for parameters, trying to hone in on good local minima, and the best one I found before I realized I really needed to automate this process, took me down a path toward 2. I didn't set out with the realization that it made sense. The machine led me there, and only once I arrived did I realize what it was trying to tell me.
Are you aware that there is no constant e involved in the definition of natural log? It is simply the area under the curve y = 1/x to the right of x=1 (with area to the left counting as negative). The constant e is defined as that number whose natural log (the area to the right) is 1.
I am. I don't know what you want me to do with that knowledge here.

...at least I didn't until I read this:
For example I might choose to compress r using an arctan function. I trust that not even you can find a secret hidden constant inside arctan, as you manage to do with natural-log.
I think this may be coming together into what I needed to hear when I pressed you earlier to explain what you meant by "ln" being a primitive.

Now I am thinking that there is an even deeper layer to this struggle: not only are you saying that when we lock the base of a logarithm to something conventional like e, or 2, we're not even merely changing from a parameter to a constant, we are eliminating the constant altogether.

I have suspected this, and I seem to be having it confirmed now, that on some profound level I don't quite "get" logarithms like you do. I think they may be superbly unfortunately confusing for me in the context of this conversation. It seems like the base of a logarithm is in some sense an optional argument, and that the real core of what logarithms do (which is special to logarithms, and which no other functions can do) is irrespective of what number you put in as the base. Could you please confirm or disconfirm that this is true for me?

In either case, I'm sorry for leading you to believe that I grokked logarithms on as deep a level as you do. I guess I had always thought of them as basically the opposite, in some essential way, of exponentiation. But I cannot find any evidence of a "natural exponent" anywhere. I mean of course you could raise something to the e'th power, but a cursory examination doesn't turn that up as being a common practice or being of much use. Therefore by your "harlows" a logarithm could be one harlow, but if you ever needed to use an exponent or raise something to a power, it'd have to be two harlows, because there is no such thing as a constant exponent (unless there is a difference between a constant and a primitive to you, which I don't understand yet).

I wish there was some other example of a function with such an optional argument that I could compare them with. arctan doesn't fit the bill since as far as I know you can't modify it with an optional number.

In any case, it seems it might be fair to say that the reason you don't want to count the base 2 while you would count the base 2.017 could be defended by saying that the argument to the function "logarithm" is optional, or said another way, that writing "logarithm" with any base at all is actually the syntactic sugar version, or the for-convenience version anyway, because anything you could achieve with base-changing we can already achieve with functions that existed in math before we invented logarithm.

By locking my code down to a single base, 2, then, I believe I have achieved what you would want in terms of measuring chunks/harlows for logarithms.

I will not, however, add a layer to my code to make it capable of building metrics where exponents count for 2 chunks/harlows and + - × ÷ count for 0. That would involve a total overhaul of my solver. We'll just have to look at what my solver spits out and manually sort the best results out with respect to your definition of harlows. As I described earlier, now that my code spits out all best metrics for a given chunk count, I think that will be a tractable task.

Post by Dave Keenan »

cmloegcmluin wrote: Tue Aug 04, 2020 2:08 am I understand. That's fine with me. I mean, in the end, I think the plan is to choose one of these metrics, balancing a number of factors including SoS, chunk count, how it looks, etc., which is going to be a pretty subjective decision.
Agreed.
So it's probably not that big of a deal if we can't come to an agreement on a single definition of chunk count. Both chunk counts can be presented.
Agreed.
1. To me, for the purpose of crudely measuring the complexity of a model, the functions log2(x), √x, x^2, 2^x (where the "2"s are true constants) are boxes with one input and one output, as are x! (factorial), sin(x), cos(x) and tan(x), along with their inverses and their hyperbolic cousins (not that I see any application of them here). For this (model complexity) purpose, it makes no difference to me, that the former can be drawn as a box with two inputs (with a "2" feeding into one input) while the latter cannot.
... the above paragraph ... fails to address any issue which I thought we were in disagreement about, which means you may not have understood my explanations or questions. I too may have reached my point of exhaustion for explanation and questioning.
I thought we were still in disagreement about whether or not to count true-constants (as opposed to parameters). I say no, you say yes. The relevance of my above paragraph is that I don't count the true constants because, if they have any real existence at all, they are simply hidden inside the box. Whether they are actually there inside the box, is an implementation detail that shouldn't matter.

For example, if we look inside an x^2 box, we may find there is no number 2 in there at all. It may simply be doing x*x. My resolution of this is to never count the constant, so it doesn't matter if it's really there or not. But your resolution seems to be to claim that it is always possible to view a unary (one-argument) function as a binary (two-argument) function with a constant for one argument. And then I am unclear whether you are counting the function or the constant, since you seem to think they must always go together. But arctan has no constant.
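In code, such a box might literally be nothing more than:

```typescript
// no literal 2 anywhere inside the box
const square = (x: number): number => x * x
```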
For you and me, all eight of those functions would be 1 chunk, and always had been.
OK. So you count arctan as 1 chunk. How is arctan not a function standing alone (in your terms)? There is no parameter or constant associated with it. It just takes its true-variable input and gives its output. e.g. as part of a soapfar where ap = arctan(p) or ar = arctan(r). I can't see how you could be counting anything but the function in this case.

This suggests that when you count log2 as 1 chunk you're counting the function, not the constant, as I do. So it would then appear that we agree on counting higher functions and not counting constant inputs to them. But you tell me otherwise below, so I can find no solution in terms of chunks per function, chunks per true constant and chunks per parameter, that satisfies the "simultaneous equations" of your various statements about your chunk counting.
What I thought our disagreement was over was counting parameters and true constants. That is, log2(x) and loga(x), a = 2.017, where you thought the latter would be 2 chunks while I still considered it to be 1 chunk.
I do consider the latter to be 2 chunks. I thought you had functions 0, parameters 1, constants 1, where I have higher-functions 1, parameters 1, constants 0.

But then you count arctan as 1 chunk. I'm mystified.

Is this correct, where 2 is a true constant while `a` and `k` are model parameters?

Function of p      Chunks (DB)      Harlows (DK)
arctan(p)          1                1
ln(p)              1                1
log2(p)            1                1
loga(p)            1                2
k×ln(p)            2                2
I don't want to drag it out any further, though. I think we understand the nature of the disagreement here, which is enough:
  • Your chunk count will count functions in-and-of-themselves, and then count their arguments as a chunk when they are parameters but not when they are true constants, leading to log2(x) as 1 chunk and loga(x), a = 2.017 as 2 chunks.
  • My chunk count will count parameters as 1 chunk always, where a parameter chunk is comprised of some value and some function application of that value, leading to log2(x) as 1 chunk and loga(x), a = 2.017 as 1 chunk.
It seems to me, you would have to count arctan(x) as 0 chunks to be consistent with that.
This will be our second of two points of disagreement over the implementation of chunks: you will give + - × ÷ for free, while I still count them.
I haven't noticed you doing that so far. Is this a new thing?
Or to be more specific (and preventing confusion with respect to the differences in how we see things via the other point of disagreement addressed above), you would count +1 as 0 chunks, because + is free, and 1 is a constant. While I would count +1 as 1 chunk, because it is an application of a value as a function where that value is 1 and the function is addition. And you would count +w, w=-1.88 as 1 chunk, because + is free, and w is a parameter. While I would count +w as 1 chunk, because it is an application of a value as a function where that value is -1.88 and the function is addition.
Although I think I know what you mean, I think "application of a value as a function" is an abuse of terminology. But how is arctan the application of a value as a function?

To me, the distinction that matters for fitability is "the application of a function" (e.g. arctan(x) or log2(x)) versus "the application of a parameterised function" (e.g. loga(x) or k×log2(x)). The last two examples are exactly equally fitable, as they will give exactly the same rankings for all ratios, when trained on the same data. i.e. they will give the same metric.
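The equivalence there is just the change-of-base identity:

loga(x) = log2(x) / log2(a)

so any optimised a corresponds to k = 1/log2(a) and vice versa, and the two parameterisations can fit the data exactly equally well.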

Post by Dave Keenan »

cmloegcmluin wrote: Tue Aug 04, 2020 12:17 pm And before I could post this, you added another reply, wherein you recognize that I was indeed making this claim of impropriety. I had been putting off validating my suspicions. But you're right that the onus is on me to prove it. So here's what I found:

... Then, and this is the critical moment I was thinking of: in this post, I come back with an SoS of 0.004250806 I found using an a of 1.994, to which I react: "so basically log2, which is certainly psychologically motivated!!" This was before I started building the automated solver, but it's basically the same thing: I was manually fiddling with knobs stabbing around at different values for parameters, trying to hone in on good local minima, and the best one I found before I realized I really needed to automate this process, took me down a path toward 2. I didn't set out with the realization that it made sense. The machine led me there, and only once I arrived did I realize what it was trying to tell me.
But it turned out that the machine's choice of 1.994 was completely arbitrary — an accident of your choice of initial values and the order in which you adjusted the parameters. This is because the parameters were not independent of each other. I didn't realise that until 3 pages later, and I didn't do a very good job of explaining it. See:
viewtopic.php?p=1965#p1965
viewtopic.php?p=1982#p1982

However I was surprised to read this, in my response to the post you linked above (bold added):
Dave Keenan wrote: Tue Jun 30, 2020 1:07 pm [Your "kaycw" metric] seems fairly insensitive to the value of alpha (the log base), so we might claim that as a constant rather than a parameter.
Thanks for finding that. It looks pretty damning. What was I thinking!?

I can't honestly say I remember. I humbly apologise for having said it. At the very least it misled you about what was acceptable. But rest assured we never actually claimed a parameter as a constant.

If I'm allowed to weasel ;), I note that I only said "maybe", and that was only because, when I played around with the parameters, I found the metric was "fairly insensitive" to the log base.

As it turned out, the metric was completely insensitive to the log base, because the parameters were not independent of each other. And so it was totally fine to set the log base `a` to any constant value that was convenient. We only needed one of `a` or `k`, not both.
I think this may be coming together into what I needed to hear when I pressed you earlier to explain what you meant by "ln" being a primitive.

Now I am thinking that there is an even deeper layer to this struggle: not only are you saying that when we lock the base of a logarithm to something conventional like e, or 2, we're not even merely changing from a parameter to a constant, we are eliminating the constant altogether.

I have suspected this, and I seem to be having it confirmed now, that on some profound level I don't quite "get" logarithms like you do. I think they may be superbly unfortunately confusing for me in the context of this conversation. It seems like the base of a logarithm is in some sense an optional argument, and that the real core of what logarithms do (which is special to logarithms, and which no other functions can do) is irrespective of what number you put in as the base. Could you please confirm or disconfirm that this is true for me?
Yes, I believe that is true. The boundary notation makes that point even more strongly. It does an enormous amount of work with a kind of log function (and its inverse exponential function) that do not even have a determinate base. All that matters is that the two functions are inverses.

But it is not necessary to understand or accept this to follow my argument for my counting of "harlows". I mistakenly thought that if you already understood about natural logs not depending on the constant e (but rather the other way round) then I could use that as an example of a function that did not require an associated constant or parameter. But now that I have the example of arctan, that doesn't even have an associated constant or parameter, I don't need to use that fact about natural logs.
In either case, I'm sorry for leading you to believe that I grokked logarithms on as deep a level as you do. I guess I had always thought of them as basically the opposite, in some essential way, of exponentiation. But I cannot find any evidence of a "natural exponent" anywhere. I mean of course you could raise something to the e'th power, but a cursory examination doesn't turn that up as being a common practice or being of much use.
Yes. Good observation. There is of course the natural-exponential function, the inverse of the natural-log, sometimes written exp(x) instead of e^x. But I understand you are saying, correctly, that there is no such thing as a "natural-power" function, or a "natural-root" function, from which any other root (or power) could be obtained by simply multiplying the result (or the input) by a constant.
Therefore by your "harlows" a logarithm could be one harlow, but if you ever needed to use an exponent or raise something to a power, it'd have to be two harlows, because there is no such thing as a constant exponent (unless there is a difference between a constant and a primitive to you, which I don't understand yet).
I have no idea why you say there is no such thing as a constant exponent. Don't x^2 and x^3 and x^(1/2) have constant exponents? I would only count them as 1 harlow (assuming these constants were put in from the start, not obtained as optimised parameters).
I wish there was some other example of a function with such an optional argument that I could compare them with. arctan doesn't fit the bill since as far as I know you can't modify it with an optional number.
I can't think of any other function with an "optional argument" in that sense. But I also don't see the relevance to harlows.
In any case, it seems it might be fair to say that the reason you don't want to count the base 2 while you would count the base 2.017 could be defended by saying that the argument to the function "logarithm" is optional,
No. The only reason I would not want to count a base of 2 while counting a base of 2.017 is if the 2 was a true constant and the 2.017 was the result of optimising a parameter.
or said another way, that writing "logarithm" with any base at all is actually the syntactic sugar version, or the for-convenience version anyway, because anything you could achieve with base-changing we can already achieve with functions that existed in math before we invented logarithm.
I agree with that part. But it's not required to explain harlows.
By locking my code down to a single base, 2, then, I believe I have achieved what you would want in terms of measuring chunks/harlows for logarithms.
Thanks.
I will not, however, add a layer to my code to make it capable of building metrics where exponents count for 2 chunks/harlows and + - × ÷ count for 0. That would involve a total overhaul of my solver. We'll just have to look at what my solver spits out and manually sort the best results out with respect to your definition of harlows. As I described earlier, now that my code spits out all best metrics for a given chunk count, I think that will be a tractable task.
That's absolutely fine with me. I don't want you to spend any more time rewriting your code. I'm keen to see the results of running it.

Something I meant to say earlier: I too use the expedient, in my spreadsheet, of implementing some true constants by simply locking the value of what is (in other metrics) a parameter. I assume we both understand that is irrelevant to this discussion. For the purpose of this discussion, if it is locked, not optimised, it is a true constant not a parameter.

Post by cmloegcmluin »

Dave Keenan wrote: Tue Aug 04, 2020 9:05 pm That's absolutely fine with me. I don't want you to spend any more time rewriting your code. I'm keen to see the results of running it.
I intend to respond to the rest of what you've said soon, but I just wanted to drop a quick note to say that it's running as we speak!

I kicked it off for 4-chunk metrics a little before I went off to bed, came back to check on it 8 hours later, and it's chugging away at about 11.1% done. So we can roughly estimate that it's going to take a total of about 9 × 8 ≈ 72 hours, or 3 days (those 8 hours covered roughly a ninth of the search). So I'll have plenty of time to respond to what you've said :)

The pattern so far of how long it takes per chunk count is a bit irregular for whatever reason, but it's looking like it should take somewhere in the ballpark of 1 year to run for 5-chunk metrics. Obviously we can't wait that long, so we'll need to reconfigure it based on what we learn from the best 4-chunk metrics, in order to dramatically reduce the search space.

It's currently taking a "leave basically no stone unturned" approach, where the ranges it searches for each parameter are appreciably outside the ranges we've been finding good results in, just in case something great shows up surprisingly outside those ranges. I mean, as long as I spent this long building such a thing, I feel compelled to leverage it for what it's good at but I'm bad at! But it looks like if my code is going to try to run for 6- or 7-chunk metrics, we are just going to have to impose our guesses about what's reasonable a bit more firmly before running it if we ever want any help from it.

Another thing we might consider doing is eliminating some parameters altogether. In the past week, the parameter count has skyrocketed from 14 to 25, which exponentially slows the thing down. First I added `x` back (it was temporarily disabled only as an attempt to prevent blue threads of death, before I realized that that problem was much deeper, and that I could solve it by just adding a timeout – more on blue threads of death in a minute). Then I realized that weight had just been straight up missing from the list (due to an unfortunate confluence of two implementation errors of mine), so that added 3 (as coefficient, logarithm base, and power exponent). Then I added the power base operation options to 4 different parameters (including weight), which added another 4. Then I added those 3 denominator-specific alternatives to certain parameters (`w`, `x`, and `y`). So that's a total of 1+3+4+3=11 new parameters. That bumped 3-chunk's run back up from 20 minutes to more like an hour.

So I'm thinking that if some of these parameters never appear in a single metric which beats SoPF>3, we could eliminate them from the running.

The third option for getting it to run faster is to finally actually consider the validity of every combination of parameters in terms of whether it poses a blue thread of death risk. There may be very many of them. And if so, then it could make a big impact to spend more like 0 ms on each of them rather than the currently allotted maximum of ~10000 ms on each of them. As long as I built the timeout, we may as well keep it (so that if the validity check misses even a single risky possibility, that alone isn't enough to prevent the code from ever completing).
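For reference, the shape of what I mean is roughly this (names invented for the sketch; the real solver's internals differ):

```typescript
// give each parameter combination at most ~10000 ms, and move on if it doesn't finish
const withTimeout = <T>(work: Promise<T>, ms: number): Promise<T | undefined> =>
    Promise.race([
        work,
        new Promise<undefined>(resolve => setTimeout(() => resolve(undefined), ms)),
    ])

// the third option: a cheap up-front validity check (~0 ms) before falling back to
// the existing timeout safety net (~10000 ms worst case)
const searchCombination = async (
    combination: unknown,
    posesBlueThreadOfDeathRisk: (combination: unknown) => boolean,
    solve: (combination: unknown) => Promise<number>,
): Promise<number | undefined> => {
    if (posesBlueThreadOfDeathRisk(combination)) return undefined
    return withTimeout(solve(combination), 10000)
}
```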

Well, I guess that wasn't exactly a quick note...