FYI, I fat-fingered something somehow and logged myself out the first time I wrote this, and not even ChromeCacheView saved me. So I apologize if I'm a bit short. I'm mad at myself for not being more careful and methodical when posting here (I guess I got swept into a frenzy of writing for an hour or whatever and lost track of not having saved anything to my paste buffer or soft-saved with "preview").
Dave Keenan wrote: ↑Sun Aug 02, 2020 7:32 am
1. You are wrong when you claim that under the hood lb(p) is log_2(p), i.e. log(p,2). Maybe you implemented it that way, but under most "hoods" lb(p) is a primitive.
I didn't know an operation called "lb" existed until you shared it in this post, where you said:
Dave Keenan wrote: ↑Tue Jul 07, 2020 3:23 pm
Don't even call it log_2(). Call it lb() (for log-binary, by analogy with ln() for log-natural). This is ISO standard notation.
I don't think we have redefined lb since then. And I am having trouble understanding this to mean anything other than "under the hood lb(p) is log_2(p), i.e. log(p,2)". Can you explain the distinction? What do you mean by "primitive" in this case?
Under the hood, log_a(p), i.e. log(p,a), is usually implemented as lb(p)/lb(a).
I am familiar with this logarithmic identity. In fact, that is how it is implemented in my own code.
That's almost certainly why your code ran so much faster when you changed from log_a(p) to k×lb(p).
I have so little understanding of what you're trying to get across that I can't figure out what you're saying.
I would not describe myself as having "changed from log_a(p) to k×lb(p)". I would describe myself as having "locked a down to 2 when used as a base, and only when used by the solver".
I can at least blindly respond, confident in my own terms, that the reason my code ran so much faster was that it was no longer checking hundreds or thousands of different possibilities whenever it was asked to use a as a base, but only one.
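As an aside, the identity and the speedup under discussion can be sketched like this (a minimal sketch; the function names are mine, not from either of our codebases):

```python
import math

# Change-of-base identity: log_a(p) = lb(p) / lb(a).
# math.log2 maps to a fast primitive on most platforms, while a
# variable base forces an extra binary log and a division per call.

def log_base(p, a):
    """General log with a parameter base `a` (hypothetical helper)."""
    return math.log2(p) / math.log2(a)

def lb(p):
    """Binary log: the base is locked to 2, so no division is needed."""
    return math.log2(p)

# With a locked down to 2, the general form collapses to lb:
assert abs(log_base(8, 2) - 3.0) < 1e-12
assert abs(lb(8) - 3.0) < 1e-12
```

The solver-side speedup is the same idea one level up: with the base fixed, there is one value to evaluate rather than hundreds of candidate bases.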
2. You are right when you point out that we are being inconsistent in counting log as a chunk, but not counting the unwritten exp in the case of r^y.
At least I got one thing right!
I now believe we should count r^y as two chunks, just as I believe we should count log_a(p) as two chunks,
Whoops! Well, I guess we're going to have to continue the debate. I still think they should be counted as 1 chunk. To say that + - × ÷ are 1 chunk while log and exp are 2 chunks feels arbitrary to me. I don't see a strong argument for it.
Moreover, it would be quite painful to make this work in my code without the ground-up rewrite I alluded to earlier, one that approaches the problem with chunks as its primary elements, not submetrics and parameters.
and k×lb(p) as two chunks
Yes, I do agree that's two chunks, because k is 1 chunk and lb is 1 chunk.
You seem to have missed where I said I want to count every model parameter as a chunk (as well as counting every function more complex than + - × ÷ as a chunk).
Ah okay, I do see that you described + - × ÷ as functions. I shouldn't have stated it as if you hadn't.
But I don't want to count literal constants as chunks, as you seem to think I do.
The short paragraph where I say something like that was intended as a sort of reductio ad raspberries w/r/t treating these things as 2 chunks. To me, any chunk should be able to stand alone. If you say something like log_a is 2 chunks, you seem to be saying that log is a chunk and the a is a chunk, which I think you are, but if either of those chunks is forced to stand alone (as I think they should be able to), it can't. "log" usually implicitly means log_10 or log_n, but that's not what I mean here, any more than an a floating in an equation usually implies a coefficient. Implications aside, a logarithm without a base doesn't stand alone, and a number (or "literal constant") without an operation doesn't stand alone.
You can count + - × ÷ as chunks too if you want, but they seem like they should count for less than higher-order functions like logs, exponentials, powers, and roots.
Ok, here it is. I take this as at least the beginning of your argument for + - × ÷ as 1 chunk while exp and log are 2 chunks: because they are "higher order". I don't disagree that they are more complex, in the sense that they're taught later in school, they're harder to understand, and they build on each other, or in some cases are even hierarchically higher hyperoperations of each other. But by that reasoning, why wouldn't × ÷ be more chunks than + -? I don't want to start down the road of weighting chunks by arbitrary definitions of mathematical complexity. Soon enough we'll have some things weighing 1.5 chunks. That is not something I want to try to bite off here. Sorry for mixing metaphors.
The conception of chunks I've been working with has been more in terms of how complex it is to state/describe/explain these metrics, in more of a natural language sense, like how many clauses would be required to speak aloud a metric. How hard is it to understand what it is? It isn't any easier or harder to understand the fact that we're using a as a logarithmic base than it is if we used it as a coefficient or an exponent.
If you disagree with the above statement, we might be at a bit of an impasse.
For another example, this is why I suggested earlier that if all submetrics were of the same type, even if there were three or four copies of it totaled up, it would still only count as one chunk. Because we could explain it to people starting with the clause, "It's three different sofpr's totaled." I'm not counting functions or arguments or mathematical challenge levels here. I'm counting complexity in terms of articulating the idea of the metric.
So I count log_2(p) as one chunk, just as I would count log_{4/3}(p) as one chunk were it not for the fact that your 4/3 is not a literal constant, but merely one possible value of the parameter `a`, which might no longer be 4/3 if we were to train the model on a different set of data, or merely weight the existing data differently.
You've used this word "train" a couple times lately but I have no idea what you're talking about. I'm not training anything on my end. I am training a neural net on a separate project I recently started with a friend who is a professional musician but actually the project doesn't even have anything to do with music. That aside...
I'm so completely confused by what you're trying to say here...
If you want to get philosophical, we do use these parameters such as "k" in different ways at different times. Sometimes we use one to represent an "unresolved" variable, like one in what my code calls a "scope", where it ranges maybe somewhere from 0 to 2 and we're going to try a bunch of different values for it and see which one is best. And other times we use it to represent a "resolved" variable, like when we spell out these metrics with the fancy new double-dollar-sign bbCodes; the k has definitely been found to be something ideal-ish, but we still write the formula with k so that in one place we can see the overall shape and structure of the metric, and in another place, on the next line, we can see the actual final values for k and the other parameters. Right?
If you imagine we soon find and settle on a final metric and share it out with the wider world, it might include the parameter k, but it will have been resolved to some value, like 0.84 or something. In no case can I imagine or understand what it would mean to share out to the world a metric which still has an unresolved variable in it, like an `a` which could be 4/3 or it could be anything else. How would anyone use that? And why would that be any different than a metric in which `a` could be 2 or it could be anything else?
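To make the resolved-vs-unresolved distinction concrete, here's a hypothetical sketch of "resolving" a parameter: sweep k over its scope and keep the value that minimizes the objective. Everything here (the objective, the resolve helper, the numbers) is an invented stand-in, not actual solver code:

```python
def objective(k):
    # Stand-in for SoS: a simple bowl with its minimum at k = 0.84.
    return (k - 0.84) ** 2

def resolve(scope, steps=201):
    """Try evenly spaced candidate values across the scope; return the best."""
    lo, hi = scope
    candidates = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(candidates, key=objective)

# k starts "unresolved", ranging over its scope of 0 to 2...
k_resolved = resolve((0.0, 2.0))
# ...and ends "resolved" to a single value, which is what gets shared.
assert abs(k_resolved - 0.84) < 0.01
```

The metric shared with the wider world would then inline the winning value rather than the unresolved letter k.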
Sorry if that seems a bit desperate but I really have no idea what your point is.
In determining the complexity of a metric, we must count each parameter as a chunk, independent of what function or operator is applied to it; otherwise my extreme metric that has a separate parameter for each prime, but uses no more operations than wyk or wyb, would be the hands-down winner.
I agree and hope I haven't somehow given the impression that I think otherwise.
That's why I want to count log_2(p) as one chunk, but log_a(p) as two chunks, even if `a` happens to come out as 2. And thanks to your recent observation, I also want to count r^2 as one chunk, but count r^y as two chunks, even if `y` happens to come out as 2.
Okay, let me just get this unambiguously clear. Changing from:
$$\text{f}(n,d) = \sum_{p=5}^{p_{max}} \big((\log_a{p})\,{n_p d_p}\big)$$
$$a = 2, \text{ gives } SoS=..., SoS(1)=...$$
to:
$$\text{f}(n,d) = \sum_{p=5}^{p_{max}} \big((\log_2{p})\,{n_p d_p}\big)$$
$$\text{ gives } SoS=..., SoS(1)=...$$
is a reduction of 1 chunk? Just because the 2 is inlined? I can't agree with that. It's the same thing. In the first way of writing it, using a in the formula with the value of a on the second line is just an act of convenience, to aid our ability to spot patterns between the metrics. Maybe when we share the metric out, no one will ever know we used letters like a, y, w, b, c, etc., because they'll only ever be exposed to the final resolved values.
Perhaps you thought I was counting k×lb(p) as two chunks because lb(p) = log_2(p) and I was counting the "2" as a chunk as well as counting the "log" as a chunk. That is not the case. I count k×lb(p) as two chunks because the "k" is one chunk and the "lb" is one chunk.
No, I agree with your breakdown of these two chunks, per the above. I would never have counted k as free. I did understand that you thought lb was 1 chunk while `log_2` was 2 chunks, but I didn't understand why you would think that.
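For what it's worth, the counting rule as I now understand your position can be codified in a quick sketch (my framing, not actual project code, assuming parameters always count, literal constants never do, and only functions beyond + - × ÷ count):

```python
# Chunk-counting rule under discussion: every parameter is a chunk,
# literal constants are free, and only functions beyond + - × ÷ count.

FREE_OPS = {"+", "-", "×", "÷"}

def count_chunks(functions, parameters):
    """Chunks = non-free functions + parameters; literals simply aren't passed."""
    return sum(1 for f in functions if f not in FREE_OPS) + len(parameters)

# log_2(p): "log" is a function, the base 2 is a literal -> 1 chunk
assert count_chunks(["log"], []) == 1
# log_a(p): "log" plus the parameter a -> 2 chunks
assert count_chunks(["log"], ["a"]) == 2
# k×lb(p): "lb" plus the parameter k; × is free -> 2 chunks
assert count_chunks(["lb", "×"], ["k"]) == 2
```

Under this sketch the disagreement is only about whether log_a should still pay for its base once `a` resolves, not about how k×lb(p) is counted.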
I just realised we're arguing over a metric for comparing metrics. And these metrics are for comparing ratios for notational commas that are so small that almost no-one will ever use them for notation.
We really have to wind this up soon.
I am ultimately in this to help humanity make music, and I agree that this particular path we've gone down for the past couple of months has not been the most direct or efficient route there.
Nonetheless — as exasperated as I may get here sometimes — it is a genuine pleasure to learn and exercise my mind on these problems, and I'm proud of what we're accomplishing here.