It's not about intuition. I computed it directly from the integral definition fo...

sobellian · on May 15, 2023

I agree that E[r() * r() * ...] = 2^-n but that's not the final word. E[log(r() * r() * ...)] = -n and Pr(r() * r() * ... > 2^-n) is very small. That's the intuition part.

The two distributions are identical, I checked numerically and they have the same summary statistics. Furthermore, r() * n is a uniform distribution from 0 to n so by induction we can prove the two are identical.

sobellian · on May 16, 2023

For anyone who cares and read this far (and if so, I'm honestly surprised) I made a colab that demonstrates all this: https://colab.research.google.com/drive/1xCZ24cLPNOqyJD0kUTO...

To wrap it up:

- X == r() * r() * ... == r(r(...))

- E[X] = 2^-n

- median[X] ~ 1/2 exp(1 - n) ?!

Anyway, I thought it was interesting - especially how the median never quite "catches up" to the value one might naively compute from the CLT. I think there's some subtlety I'm missing there.

maxbond · on May 15, 2023

random() * 5 is equivalent to random(5) (the former being the classic way to implement the latter). So why wouldn't random() * random() be equivalent to random(random())?

SideQuark · on May 15, 2023

Multiplying by a constant commutes with expectation. Multiplying by a random variable does not. It may in some cases, but intuition is not the way to check it.

anderskaseorg · on May 16, 2023

The claim here is not about expectations. random(X) is identically distributed to random() * X, even if X itself is a random variable, because the randomness in random() is independent from the randomness in X. Indeed, the way you would implement random(X), if you only had random() and X available, would be to compute random() * X.

So in particular, random(random()) is identically distributed to random() * random(). (To be clear, this is different from random()^2, as pointed out elsewhere.)

jasomill · on May 16, 2023

As a quick sanity check,

  Mathematica 13.2.1 Kernel for Linux ARM (64-bit)
  Copyright 1988-2023 Wolfram Research, Inc.

  In[1]:= DI = Distributed;

  In[2]:= EV = Expectation;

  In[3]:= SS = Subscript;

  In[4]:= UD = UniformDistribution;

  In[5]:= f[1] :=
    EV[x, DI[x, UD[{0, 1}]]]

  In[6]:= f[n_Integer] :=
    EV[x, DI[x, UD[{0, f[n - 1]}]]]

  In[7]:= Block[{n = 42},
   EV[Product[SS[x, k], {k, 1, n}],
     Map[DI[SS[x, #], UD[{0, 1}]] &,
      Range[1, n]]]
    == Product[EV[x, DI[x, UD[{0, 1}]]], {k, 1, n}]
    == f[n]]

  Out[7]= True

where n = 42 ∈ ℕ is arbitrary.

Alas, Mathematica doesn't appear to support actual nested distributions of the form UD[{1,UD[{1,...}]}], so, e.g.,

  f[2]
  == EV[x_2,
       DI[x_2,
         UD[{0,
           EV[x_1,
             DI[x_1, UD[{0,1}]]]}]]]  
  == EV[x_2,
       DI[x_2,
         UD[{0,1/2}]
  == 1/4

which is trivial.

jasomill · on May 16, 2023

(As is the expectation of a uniform distribution in the first place. From the PDF for the uniform distribution (1/(b - a) for x ∈ [a,b], zero elsewhere) and the definition of EV as the integral of the product of the identity map and the PDF over the reals,

     EV[x, DI[x, UD[{a,b}]]]
  == Integrate[x * Piecewise[{
         {1/(b - a), a <= x <= b},
         {0, x < a || x > b}}],
       {x, -∞, ∞}]
  == Integrate[x * 1/(b - a), {x, a, b}]
  == ((b^2/2) - (a^2/2)) / (b - a)
  == (a + b)/2

so for a = 0,

  == b/2

no matter how many nested integrals it takes to define the real number b.)

SideQuark · on May 16, 2023

Agreed, in this case they agree, but it's completely a numerical coincidence and not general. So people with a rough feeling that they compute rand * 5 should equate to rand(rand())=rand * rand should be admonished to check it extremely carefully. And the result is not true at the level of code.

In practice it's not true, unless extreme care is taken to ensure perfect uniformity, which is rare due to nonuniformity of IEEE floats. I've not seen a codebase in the wild that makes all the required pieces work. It's doable, I've written articles on it, but it's costly to do.

Another error in practice is that the method of taking a rand * 5 is not uniform due to numerical representations. So these are not identical in practice.

This the best way to analyze, which is trivially analyzable, is the nested form.

It's also why I wrote "may" in the previous comment.

anderskaseorg · on May 16, 2023

It is true, both in the ideal mathematical sense, and also in the IEEE sense under the totally obvious implementation that most programmers would write without any special care:

    function rand(endpoint = 1) {
        return endpoint * Math.random();
    }

Here rand(rand()) would evaluate to literally the same number as rand() * rand() given the same state of the underlying generator. Someone who follows this chain of reasoning is perfectly correct to consider it intuitively obvious.

And rand() * rand() is a great form for analysis, since it allows us to take advantage of E[XY] = E[X]E[Y] for independent X, Y to compute the mean very quickly.

Clearly you followed a different chain of reasoning, and that’s fine; maybe you have a much more complicated generator in mind that takes more care to ensure ideal rounding at the limits of floating-point precision, and that’s fine too. But let’s not admonish anyone for making correct statements.

SideQuark · on May 17, 2023

>It is true, both in the ideal mathematical sense, and also in the IEEE sense under the totally obvious implementation .....

That implementation is not uniform under IEEE 754. Here [1] is but one of hundreds of papers on the subject. To quote "Drawing a floating-point number uniformly at random from an interval [a, b) is usually performed by a location-scale transformation of some floating-point number drawn uniformly from [0, 1). Due to the weak properties of floating-point arithmetic, such a transformation cannot ensure respect of the bounds, uniformity or spatial equidistributivity.".

This is trivial to understand. Suppose you truly had a IEEE uniform [0,1) in the sense every possible float representation could occur and an interval of [a,b) (for representable a,b) is represented with probability b-a. Then if you multiply, say, by 0.99, then by the pigenhole principle some of the original values would coalesce, and some would not, so now you are no longer uniform. Some values are more likely to occur. This also happens when you multiply by 5, or 7, or any number, except the ranges, as they stretch, due to finite precision, do not map to the correct length ranges. The new range [0,5) has a different set of representable floats, and the precision of them moves in predictable but non-uniform ways, so the result is no longer uniform. So it is not true that rand() * N is uniform for IEEE 754 numbers.

Another simple way to see it: when you multiply by floats > 1, then some of the low bits get lost for some floats, so you cannot begin to hope the results are uniform. You're truncating/rounding, which loses information, so you by necessity only have an approximation.

And note that very few (if any) popular languages even manage to get [0,1) uniform (more details on 15 languages are in [1]). Their conclusion, Table II, to the question of whether the language rng provides uniformity (and spatial equidistributivity, needed to scale as the above), is no for all 15 languages. And this is because most do what you claim works. Even getting the [0,1) uniform as a start is extremely tricky and nuanced. The usual "uniform random int in [0,2^p-1] then divide by 2^p (or 2^p-1)" fails to be uniform most of the time and (as the paper shows) for pretty much every language implementation out there.

This issue is important for places needing super careful accuracy like physics sims, weather sims, many monte carlo algorithms where you don't want bad numerics screwing with your results, and in all of those places, they use custom, properly written methods to create such numbers. The general idea is that for getting a uniform in [a,b) you need to sample enough bits for the range, then carefully bit construct the floating point value - it cannot be transformed (easily) from smaller or larger ranges.

Table V lists various methods to draw floats in [a,b). Your method is listed (as I claimed above) as not being uniform (in fact, it fails all 5 criteria in the table, the only method of the 7 listed to score so poorly).

Here's [2] a paper showing the common method of using a PRNG then dividing by some number to get "uniform" in [0,1) is not uniform, which is well known [2]. So most implementations don't even start with uniform. This is a well known problem in random number generation. And the rand(rand()) == rand() * rand() relies on uniform distributions.

So please stop claiming things until you have checked them carefully. Fuzzy feelings about what you guess is true is no substitute for actually doing the work to check, and doubling down when someone points you on the right path is bizarre to me.

[1] https://hal.science/hal-03282794v4/document

[2] https://hal.science/hal-02427338/document\\*

anderskaseorg · on May 17, 2023

I’m fully aware of how IEEE 754 works, and that there is—to quote my own comment—“a much more complicated generator that takes more care to ensure ideal rounding at the limits of floating-point precision”. If you’re looking for somebody clueless to argue at, you’ll need to go find someone else.

But these details aren’t relevant to this conversation:

• We’re not simulating the weather, we’re drawing dots in a circle—and anyone who cares about the dots being shifted from true uniformity by tiny fractions of the radius of an electron is going to have objections to any conceivable computerized representation of the numbers involved.

• It’s quite reasonable to implicitly treat this discussion as a question within the domain of true mathematical reals, and quite unreasonable to chastise someone as incorrect and careless for doing so.

• It’s still false that “rand(rand()) == rand() * rand() relies on uniform distributions”. This identity between distributions holds with the naïve multiplicative implementation of rand() and any underlying random generator, even a blatantly skewed one, in either the mathematical or IEEE reals, exactly. (And with a more precise IEEE generator, it will hold in as close of an approximate sense as one can expect any mathematical identity to hold when translated to IEEE, which from my perspective is just fine in this context. My only real interest in even acknowledging IEEE here is to support the point that commenters above were correct to consider the identity intuitively obvious, even when you try to throw in that well-actually.)

SideQuark · on May 17, 2023

> We’re not simulating the weather, we’re drawing dots in a circle

Ah, so we don't care about mathematical rigor, yet we're discussing a proof of the pure mathematical behavior?

> It’s quite reasonable to implicitly treat this discussion as a question within the domain of true mathematical reals,

So we do care about mathematical rigor?

> It’s still false that “rand(rand()) == rand() * rand() relies on uniform distributions”.

Consider a distribution A(r) uniform on [0,r) except it never returns 1/4. This is a perfectly valid distribution, extremely close to being the same (in the mathematical sense) as the uniform distribution itself.

A(A(1)) will never be 1/4. A(1) * A(1) may be 1/4. Thus the claim needing uniform (or a proof for whatever distribution you want) requires careful checking. And if you don't like this distribution, you can fiddle with ever larger sets (in cardinality, then lebesgue measure zero) where you tweak things, and you'll soon discover that without uniform, you cannot make the simplification. And I demonstrated that the actual PRNGs are not uniform, so much care is needed to analyze. (They're not even continuous domain or range)

Yes, if you define the rng circularly, then you can make the assumption, but then you cannot make the claim of 1/2 per term, since they are not uniform. If you define the rng more carefully, then they often don't commute (maybe never?).

For example, defining the rng in another reasonable (yet still naive) fashion to keep more bits:

    float rnd(float max)
        i = int_rand(N) // 0 to N-1, try weird N like N = 12345678
        f = i*max
        return f/N

this does not satisfy rnd(rnd()) == rnd()*rnd() in general, as you can check easily in code.

Thus, to make the claim you can swap them requires for the mathematical part true uniformity (my claim way up above) and for the algorithm sense it requires knowing details about the underlying algorithm. Simply having a rnd(max) blackbox algorithm is not enough.

anderskaseorg · on May 17, 2023

Of course we care about mathematical rigor.

> Yes, if you define the rng circularly, then you can make the assumption,

We’re agreed then.

> but then you cannot make the claim of 1/2 per term, since they are not uniform.

As mathematicians, we’re allowed to make multi-step arguments. The first step rand(rand()) = rand() * rand() holds for any distribution that scales multiplicatively with the argument, as we’ve agreed. The second step E[rand() * rand()] = E[rand()] * E[rand()] holds for any two independent distributions with finite expectation. The third step E[rand()] = 1/2 holds for this particular distribution. An ideal mathematical rand() satisfies all three of these properties, so this argument that E[rand(rand())] = 1/4 is valid in the mathematical reals. (And it’s approximately valid in the IEEE reals, which is all one typically asks of the IEEE reals.)