Interesting. I'll need to reconcile this bit with the later stuff about there not being a natural exponent.Dave Keenan wrote: ↑Tue Aug 04, 2020 1:25 pm I don't count the true constants because, if they have any real existence at all, they are simply hidden inside the box. Whether they are actually there inside the box, is an implementation detail that shouldn't matter.
For example, if we look inside an x2 box, we may find there is no number 2 in there at all. It may simply be doing x*x. My resolution of this is to never count the constant, so it doesn't matter if it's really there or not. But your resolution seems to be to claim that it is always possible to view a unary (one-argument) function as a binary (two-argument) function with a constant for one argument. And then I am unclear whether you are counting the function or the constant, since you seem to think they must always go together. But arctan has no constant.
I completely sympathize with your frustration over this particular matter. I've done a bad job with this one, so I apologize. I'm trying to figure out the least confusing way to dig myself out of it and achieve shared clarity with you here...So you count arctan as 1 chunk. How is arctan not a function standing alone (in your terms)? There is no parameter or constant associated with it. It just takes its true-variable input and gives its output. e.g. as part of a soapfar where ap = acrtan(p) or ar = arctan(r). I can't see how you could be counting anything but the function in this case.
Let's start with this: there was an important but unspoken condition on my claim that functions shouldn't stand alone. When I've been tossing around the word "function" the last couple of days, I haven't meant function in the general textbook sense, which would include things like arctan. I've implicitly been using that word to refer to "within our current domain, the things which are helpful to be referred to as functions". That is, I needed a handy way to refer to logarithms, exponents (and roots), and + - × ÷ that I thought would help us speak about the problem space. By my incorrect understanding at that time (read: before I arrived at a deeper level of insight about the nature of the logarithm function) all of those functions had 1 argument which was not optional. And that's all I meant by that they couldn't stand alone.
No, as I've tried to say many times now, I did not think of things in terms of counting the function and number separately. I did not think they could stand alone. I did not break log2 down. I considered function and constant/argument/parameter/base/number/2/whatever-you-want-to-call-that-other-thing to be two subatomic particles of an atomic unit which was worth 1 chunk, because the argument was, by my incorrect understanding then, required. And I kept emphasizing this atomic structure with 2 subatomic parts view because as of yet could not believe that any function we cared about in the domain was anything other than a function with one required argument.This suggests that when you count log2 as 1 chunk you're counting the function, not the constant, as I do. So it would then appear that we agree on counting higher functions and not counting constant inputs to them. But you tell me otherwise below, so I can find no solution in terms of chunks per function, chunks per true constant and chunks per parameter, that satisfies the "simultaneous equations" of your various statements about your chunk counting.
I now see that logarithm is special in the sense of "being a primitive" like arctan. If we were to for some reason throw arctan into the mix here, I would count it as 1 chunk. So, faced with the primitiveness of logarithm, I ultimately I will revise my perspective on atoms and subatomic particles to this: it is the function which I count as a chunk, and for those functions with required arguments, I could be said either to give them the arguments for free, or even to consider them part of the function. That I've been emphasizing the numbers first and then their applications via some function has just been an unfortunate result of my certainty that we were only dealing with functions of one type: those with one required numeric argument.
Per the above, I have functions 1, optional parameters 1, constants 1.I thought you had functions 0, parameters 1, constants 1, where I have higher-functions 1, parameters 1, constants 0
But then you count arctan as 1 chunk. I'm mystified.
It would have been, before I attained the insight that the argument to a logarithm is optional on some crazy deep mathematical level. Now that I understand that, my chunks column completely agrees with yours for this particular set of entries.Is this correct, where 2 is a true constant while `a` and `k` are model parameters?
I agree. But I think it should be clear from above that had I known we were working with some functions that were not types which took a single numeric argument, I would have written: "My chunk count will count parameters as 1 chunk always, where a parameter chunk is comprised of some function and any of its required arguments, leading to log2(x) as 1 chunk and loga(x), a = 2.017 as 1 chunk." Although logarithm is a bad example now, because it was the function which I came to see broke from that function type. So in the end, my bullet will look almost exactly like yours: "My chunk count will count functions in-and-of-themselves, and then count their arguments as a chunk when they are parameters but only when they are optional, leading to log2(x) as 1 chunk and loga(x), a = 2.017 as 2 chunks." Literally the only difference is that you say "not when they are true constants" and I say "only when they are optional".It seems to me, you would have to count arctan(x) as 0 chunks to be consistent with that.My chunk count will count parameters as 1 chunk always, where a parameter chunk is comprised of some value and some function application of that value, leading to log2(x) as 1 chunk and loga(x), a = 2.017 as 1 chunk.
No, it's not new. Consider `k` applied as a coefficient. This is a classic one we've been using since nearly the beginning. It's the function × which has one required argument: what to multiply by. So the function × counts as one chunk. The value multiplied by is either considered to be part of the function (in that sense ×3 is a different function from ×4) or it's considered to be given for free.I haven't noticed you doing that so far. Is this a new thing?This will be our second of two points of disagreement over the implementation of chunks: you will give + - × ÷ for free, while I still count them.
I expect I've gone over this enough in this post already, but it may be helpful to respond in detail anyway.Although I think I know what you mean, I think "application of a value as a function" is an abuse of terminology. But how is arctan the application of a value as a function?Or to be more specific (and preventing confusion with respect to the differences in how we see things via the other point of disagreement addressed above), you would count +1 as 0 chunks, because + is free, and 1 is a constant. While I would count +1 as 1 chunk, because it is an application of a value as a function where that value is 1 and the function is addition. And you would count +w, w=-1.88 as 1 chunk, because + is free, and w is a parameter. While I would +w as 1 chunk, because it is an application of a value as a function where that value is -1.88 and the function is addition.
I agree that "application of a value as a function" is an abuse of terminology, especially as soon as we expand the scope of our discussion beyond the domain wherein we're only dealing with functions with one required numeric argument (which I hadn't realized we already had, by including logarithm).
Arctan is definitely not the application of a value as a function.
I would rewrite my above paragraph this way now: "you would count +1 as 0 chunks, because + is free, and 1 is a constant. While I would count +1 as 1 chunk, because it is a function with one required argument which is therefore in some sense part of the function, and that function is addition and the value of the required argument is 1. And you would count +w, w=-1.88 as 1 chunk, because + is free, and w is a parameter. While I would +w as 1 chunk, because it is a function with one required argument and the function is addition and the required argument in this case is -1.88."
Awesome. I recognize that you've tried to say this thought at least a couple times, but this is the first time it clicked for me. Perhaps I could speak it back to you to confirm my understanding. You attempt to measure in harlows the fitability of a metric in the sense of their ability to change the SoS. I think that's a really cool idea. In fact, I think it's better than attempting to measure the complexity of explaining the metric, which is how I've been approaching it. This is the "model validity" vs "model simplicity" dichotomy again, but now you're getting even more precise about what you mean by validity, upgrading it to the word "fitability".To me, the distinction that matters for fitability is "the application of a function" (e.g. arctan(x) or log2(x)) versus "the application of a parameterised function" (e.g. loga(x) or k×log2(x)). The last two examples are exactly equally fitable, as they will give exactly the same rankings for all ratios, when trained on the same data. i.e. they will give the same metric.
I tried to disclaim that whether or not we ever claimed a parameter as a constant was moot in my previous post, but I might have undermined that by adding this evidence on top of the disclaimer. Sorry!Dave Keenan wrote: ↑Tue Aug 04, 2020 9:05 pm Thanks for finding that. It looks pretty damning. What was I thinking!?
It seems to me — now that you've confirmed my understanding of the logarithm function — that 2 in some sense is not even a constant. It's more like: if you want to use logarithms in some project, you should pick one base and stick with it, and it doesn't matter what that base is (other than that you should pick something conventional, like 2, e, or 10). Whatever your base is more like which logarithmic world you want to live in for now.
So I don't think it matters how we arrived at 2, whether it occurred to us or the code helped us. No need to weasel or anything. Because we could literally pick anything. Maybe that's what you're trying to say about the parameters being "not independent of each other" (I still don't exactly get that point).
I should definitely have immersed myself in it before continuing on this project. Sorry.The boundary notation makes that point even more strongly.
Okay, great, I'm glad I got that part right.Yes. Good observation. There is of course the natural-exponential function, the inverse of the natural-log, sometimes written exp(x) instead of ex. But I understand you are saying, correctly, that there is no such thing as a "natural-power" function, or a "natural-root" function, from which any other root (or power) could be obtained by simply multiplying the result (or the input) by a constant.In either case, I'm sorry for leading you to believe that I grokked logarithms on as deep a level as you do. I guess I had always thought of them as basically the opposite in some essential way as exponentiations. But I cannot find any evidence of a "natural exponent" anywhere. I mean of course you could raise something to the e'th power, but a cursory examination doesn't turn that up as being a common practice or being of much use.
So then how could there be a constant when used in an exponent or root?
I might agree that in the sum-of-squares formula, the exponent 2 is a constant, because it's part of Pythagoras's theorem about hypoteneuses and distances and such. That makes sense to me.
So if we had some math/phil/psych/musical justification for picking a particular exponent in a particular context, then perhaps it could count as a constant?
As far I'm aware, we don't have any such constants going on for us right now.
You say we would need to have put them in there "from the start". I think what bothers me about this idea may be a bit superficial. I would excuse us if we had simply failed to recall some important math/phil/psych/musical concept involving a constant until the code reminded us of it, and then once it had, starting using that constant and not counting against the fitability of the metric. How do you feel about that (it may not even come up, so maybe you don't need to answer).
Right, so what I was flailing about in trying to get across there was the difference between the e in loge and the e in xe. I was trying to capture the same exact idea that you just confirmed for me above: that there is no equivalent of a natural exponent or natural root in the way that there is a natural logarithm. I suppose the distinction is not super important to you: if it's a constant as it is in xe, it counts for 0 harlows. If it's not even there in some sense since `text{ln}` is a "primitive" or whatever you want to call it and the `e` doesn't even exist, then it still counts for 0 harlows.I have no idea why you say there is no such thing as a constant exponent. Don't x2 and x3 and x1/2 have constant exponents? I would only count them as 1 harlow (assuming these constants were put in from the start, not obtained as optimised parameters).Therefore by your "harlows" a logarithm could be one harlow, but if you ever needed to use an exponent or raise something to a power, it'd have to be two harlows, because there is no such thing as a constant exponent (unless there is a difference between a constant and a primitive to you, which I don't understand yet).
-------
Alright, so I think that while we disagree conceptually a bit still, that our chunks and harlows almost completely agree now with respect to the searching my code is doing. The fact is that the only true constant we've really looked at is this base to the logarithm, and as I've mentioned, my code treats that situation how you'd want it to. We don't have any true constants that are being applied with + - × ÷, which would be 0 harlows but 1 chunk. The one place I can think of where we'd disagree is when raising thing to parameters which are exponents, since you'd count that as 2 harlows where I'd count it as 1 chunk. But we can deal with those metrics when they come up.