Average Distance between random points

SLD · Dec 17, 2020

Ok, take a disk of radius 10. Drop 10 random points throughout the disk. Just close your eyes and do it. No bias.

What’s the average distance between the 10 points?

And how do you do this?

barbos · Dec 17, 2020

Well, I have an integral:
10./(2*Pi)*integ(sqrt(r1^2 + r2^2 -2*r1*r2*cos(phi)))*d(r1^2)*d(r2^2)*d(phi))

r1=[0,1]
r2=[0.1]
phi=[0,2*Pi]

Number of point is irrelevant, 2 points is enough to have distance between 2 random points. The rest is just geometry and integration.

Bomb#20 · Dec 17, 2020

SLD said:
Ok, take a disk of radius 10. Drop 10 random points throughout the disk. Just close your eyes and do it. No bias.

What’s the average distance between the 10 points?

And how do you do this?

Good question. Assuming "No bias" means equal probability density at every point, the simplest way is to drop about 14 random points throughout the circumscribed square and discard the ones that land outside the circle.

barbos said:
r1=[0,1]
r2=[0.1]
phi=[0,2*Pi]

That's going to bias the distribution, with the center region of the circle more densely packed with points than near the periphery.

repoman · Dec 17, 2020

What if you did it through rectilinear coordinates or polar coordinates, will it be different?

barbos · Dec 17, 2020

Bomb#20 said:
barbos said:

r1=[0,1]
r2=[0.1]
phi=[0,2*Pi]

Click to expand...

That's going to bias the distribution, with the center region of the circle more densely packed with points than near the periphery.

No, it's uniformly distributed in my integral

lpetrich · Dec 18, 2020

I did this problem with Mathematica. For radius = 1, the solution is (128)/(45*Pi)

I did Integrate three times, one for each variable, then I checked what I did with NIntegrate. The numerical value of this quantity is 0.905415

SLD · Dec 18, 2020

Obviously this question is more complicated than I originally envisioned. If it’s easier on a square, that’s fine. My first thought was to simply divide the area by ten, but that’s not right. With a radius of 20, the average distance is greater than the diameter. Nope.

I was doing this to figure out the average distance to our closest technological civilization in the galaxy. Assuming that there are 10 such stars in our galaxy of 185,000 light years diameter, where’s the closest?

In our thread discussing this I came up with 29,059. But I don’t think that’s right.

barbos · Dec 18, 2020

SLD said:
Obviously this question is more complicated than I originally envisioned. If it’s easier on a square, that’s fine. My first thought was to simply divide the area by ten, but that’s not right. With a radius of 20, the average distance is greater than the diameter. Nope.

Oh, you need a number, not an expression

I was doing this to figure out the average distance to our closest technological civilization in the galaxy. Assuming that there are 10 such stars in our galaxy of 185,000 light years diameter, where’s the closest?

In our thread discussing this I came up with 29,059. But I don’t think that’s right.

That's a different problem. Average distance between civilizations is much higher than average distance to the closest one.
30,000 l.y. is about right

lpetrich · Dec 18, 2020

SLD said:
I was doing this to figure out the average distance to our closest technological civilization in the galaxy. Assuming that there are 10 such stars in our galaxy of 185,000 light years diameter, where’s the closest?

A naive way of doing such calculations: find the number of civs per unit disk area, then find the distance as (civs/area)^(-1/2). Likewise for volumes: (civs/volume)^(1/3)

All that a more precise calculation will do is multiply that number by some number near 1, likely in range 1/2 to 2.

This is a "back of the envelope" sort of calculation, and one can go remarkably far with such calculations.

SLD · Dec 18, 2020

barbos said:
SLD said:

Obviously this question is more complicated than I originally envisioned. If it’s easier on a square, that’s fine. My first thought was to simply divide the area by ten, but that’s not right. With a radius of 20, the average distance is greater than the diameter. Nope.

Click to expand...

Oh, you need a number, not an expression

I was doing this to figure out the average distance to our closest technological civilization in the galaxy. Assuming that there are 10 such stars in our galaxy of 185,000 light years diameter, where’s the closest?

In our thread discussing this I came up with 29,059. But I don’t think that’s right.

Click to expand...

That's a different problem. Average distance between civilizations is much higher than average distance to the closest one.
30,000 l.y. is about right

I am interested in the mathematics of it. But you’re right it’s two different problems.

barbos · Dec 18, 2020

lpetrich said:
SLD said:

I was doing this to figure out the average distance to our closest technological civilization in the galaxy. Assuming that there are 10 such stars in our galaxy of 185,000 light years diameter, where’s the closest?

Click to expand...

A naive way of doing such calculations: find the number of civs per unit disk area, then find the distance as (civs/area)^(-1/2). Likewise for volumes: (civs/volume)^(1/3)

All that a more precise calculation will do is multiply that number by some number near 1, likely in range 1/2 to 2.

This is a "back of the envelope" sort of calculation, and one can go remarkably far with such calculations.

It's 2
average distance to the closest star is 2*sqrt(1./denisty)
this is for 2 dimensions.
for 3 it will be a different constant.

barbos · Dec 18, 2020

SLD said:
I am interested in the mathematics of it. But you’re right it’s two different problems.

Do you want exact solution?
or numerical (Monte-Carlo) would suffice?

Swammerdami · Dec 18, 2020

Bomb#20 said:
barbos said:

r1=[0,1]
r2=[0.1]
phi=[0,2*Pi]

Click to expand...

That's going to bias the distribution, with the center region of the circle more densely packed with points than near the periphery.

Look again. Tricky Mr. Barbos wrote "d(r1^2)*d(r2^2)" where most of us would have formulated with the more prosaic "d(r1)*d(r2)."

Bomb#20 said:
SLD said:

Ok, take a disk of radius 10. Drop 10 random points throughout the disk. Just close your eyes and do it. No bias.

What’s the average distance between the 10 points?

And how do you do this?

Click to expand...

Good question. Assuming "No bias" means equal probability density at every point, the simplest way is to drop about 14 random points throughout the circumscribed square and discard the ones that land outside the circle.

You'll only need 12.7 points on average to get 10 points on the disk this way. (Though you might get unlucky and need many more.) This 21.2% wastage of variates is trivial.

For the same task on a 3-ball, that method will waste 47.6%, 5-ball 83.6%, 9-ball 99.4%; for a 15-ball 99.999% of your variates will be discarded.

Volume_of_an_n-ball

For a 59-ball, which may arise in some applications — the dimensions need not be spatial — a whopping 99.9999999999999999999999999999999983% of points in a 59-D hypercube lie outside the inscribed 59-D hyperball. An alternative approach is needed! I do it with 59 Gaussian variates and a 60th random number (p^(1/59)) to normalize distance to origin.

Bomb#20 · Dec 18, 2020

barbos said:
SLD said:

I was doing this to figure out the average distance to our closest technological civilization in the galaxy. Assuming that there are 10 such stars in our galaxy of 185,000 light years diameter, where’s the closest?

In our thread discussing this I came up with 29,059. But I don’t think that’s right.

Click to expand...

That's a different problem. Average distance between civilizations is much higher than average distance to the closest one.
30,000 l.y. is about right

The average distance from a civilization to the closest other one is 29K and change, yes; but that's not the same as the average distance to our closest technological civilization, because we aren't at a random unknown position in the galaxy. We're 27K l.y. from the center. Assuming uniform probability distribution of the other nine, my program gets a 26,250 l.y. average distance to the closest one to us. (Of course we shouldn't assume uniform probability distribution of the other nine; it would be better to plug in the actual density of stars and the actual thickness of the galactic disk as a function of radius. Then again, maybe we should assume everybody is at roughly 27,000 l.y. radius, since further out the stars are scattered thinly, and closer in there's so much dangerous astrophysics going on nearby that the mass extinctions come too often for anyone to develop technological civilization...)

Bomb#20 · Dec 18, 2020

Swammerdami said:
Look again. Tricky Mr. Barbos wrote "d(r1^2)*d(r2^2)" where most of us would have formulated with the more prosaic "d(r1)*d(r2)."

So he did. Good catch.

Bomb#20 said:
Bomb#20 said:

Good question. Assuming "No bias" means equal probability density at every point, the simplest way is to drop about 14 random points throughout the circumscribed square and discard the ones that land outside the circle.

Click to expand...

You'll only need 12.7 points on average to get 10 points on the disk this way.

12.7 is about 14, for sufficiently large values of "about".

lpetrich · Dec 19, 2020

barbos said:
It's 2
average distance to the closest star is 2*sqrt(1./denisty)
this is for 2 dimensions.
for 3 it will be a different constant.

How does that come about?

The average nearest-neighbor distance is the average of (NNDF)*(density)^(-1/dimension) where NNDF is the nearest-neighbor distribution function.

So to find NNDF, decided to do some random-number simulations. Mean and median:

1: 0.50, 0.34
2: 0.50, 0.46
3: 0.55, 0.55
4: 0.61, 0.61

In these simulations, I used periodic boundary conditions, to avoid boundary effects.

lpetrich · Dec 19, 2020

The nearest neighbor probability distribution and Lecture 19 - Nearest neighbor distribution

It does the 2D case, but it should be easy to generalize to different numbers of dimensions. Probability distribution w(r) for distance r with number density N:
\( w(r) \, dr = \left( 1 - \int_0^r w(r') \, dr' \right) N \cdot 2\pi r \, dr \)

Take d/dr
\( \frac{dw(r)}{dr} = - N \cdot 2\pi r w(r) + \left( 1 - \int_0^r w(r') \, dr' \right) N \cdot 2\pi = - N \cdot 2\pi r w(r) + \frac{1}{r} w(r) \)
This is easy to integrate:
\( w(r) = w_0 r e^{-N \pi r^2} \)
Making it add up to 1 over all space gives:
\( w(r) = N\cdot 2\pi r e^{-N\pi r^2} \)

For the 1D case,
\( w(r) \, dr = \left( 1 - \int_0^r w(r') \, dr' \right) N \, dr \)

With d/dr,
\( \frac{dw(r)}{dr} = - w(r) N \)

giving
\( w(r) = N e^{-N r} \)

lpetrich · Dec 19, 2020

Extending to n dimensions, the defining equation becomes
\( w(r) = \left( 1 - \int_0^r w(r') \, dr' \right) N \cdot n V_n r^{n-1} \)

where V

is the angular volume factor: V(1) = 1, V(2) = pi, V(3) = 4*pi/3, etc.
\( V_n = \frac{1}{n} \cdot \frac{2\pi^{n/2}}{\Gamma(n/2)} = \frac{\pi^{n/2}}{\Gamma(n/2+1)} \)

This expression gives V(1) = 2 instead of 1, but that's for extending in both directions instead of in only one direction from zero.

One can solve this equation in the same way as the others, and one finds
\( w(r) = N \cdot n V_n r^{n-1} e^{- N \cdot V_n r^n} \)

I will now try to calculate the mean and median.

The median is easy.
\( \int_r^{\infty} w(r) \, dr = e^{- N \cdot V_n r^n} \)

\( r_{median} = \left( \frac{\ln 2}{N \cdot V_n} \right)^{1/n} \)

The mean is more difficult. One must average r over the distribution:
\( r_{mean} = \int_0^{\infty} N \cdot n V_n r^n e^{- N \cdot V_n r^n} \, dr \)

Change variables from r to u:
\( r = \left( \frac{u}{N \cdot V_n} \right)^{1/n} \)

That gives us
\( r_{mean} = \left( \frac{1}{N \cdot V_n} \right)^{1/n} \int_0^{\infty} u^{1/n} e^{- u} \, du = \left( \frac{1}{N \cdot V_n} \right)^{1/n} \Gamma(1+1/n)\)

lpetrich · Dec 19, 2020

The numbers for my analytic expressions for the mean and median:

1: 1, 0.693
2: 0.5, 0.470
3: 0.554, 0.549
4: 0.608 0.612

I estimate an asymptotic limit for number of dimensions n:
\( \sqrt{ \frac{n}{2 \pi e} }\)

Evaluated for n = 1 to 4: 0.242, 0.342, 0.419, 0.483

barbos · Dec 19, 2020

lpetrich said:
barbos said:

It's 2
average distance to the closest star is 2*sqrt(1./denisty)
this is for 2 dimensions.
for 3 it will be a different constant.

Click to expand...

How does that come about?

Code:

double min_dist(int nstar)
{
  nstar--;
  double mdist=10000;
  while(nstar--){
    double x=drand48()-0.5;
    double y=drand48()-0.5;
    double dist=sqrt(x*x+y*y);
    if(dist<mdist)mdist=dist;
  }
  return mdist;
}
double aver_min_dist(int tries, int nstars)
{
  double dt=(double)tries;
  double dist=0;
  while(tries--)dist+=min_dist(nstars);
  dist/=dt;
  double dd2=pow(1./(double)nstars,1./2);
  
  
  printf("Stars=%d aver_min=%g  sqrt(1./density)*%g\n",nstars,dist,dd2/dist);
  return dist;
}
int main()
{
  aver_min_dist(1000000,1000);
  for(int n=2;n <= 50;n++)aver_min_dist(10000000,n);
}

Average Distance between random points

Contributor

Contributor

Contributor

Contributor

Contributor

Contributor

Contributor

Contributor

Contributor

Contributor

Contributor

Contributor

Squadron Leader

Contributor

Contributor

Contributor

Contributor

Contributor

Contributor

Contributor