Math Models
Spring 2000
Kim Dressel
Angie Heimkes
Eric Larson
Kyle Pinion
Jason Rebhahn
Stanley Milgram, a social psychologist at Harvard University, first studied the idea that
people are connected through indirect networks. In 1967, Milgram performed his first
real-world study of the Small World Phenomenon by giving letters to various people in
Kansas and Nebraska with the goal of eventually getting the letter to a certain individual
in Boston. The participants were instructed to pass the letter on to whomever they knew
on a first name basis who would be the most likely to know the target recipient in Boston.
Milgram found that it took an average of about five intermediaries, or six total transitions, to pass
the letter on from the sources in Kansas and Nebraska to the target in Boston. The
findings would also hold when going from a white sender to a black recipient, as it only
requires a couple of people to bridge the divide between the races. With the average of
six total transitions, the idea of six degrees of separation came about.
Milgram's finding sparked interest in the phenomenon among other scientists. For a long
time, the Small World Phenomenon was mostly studied by sociologists and
The Watts-Strogatz Model was the most refined model at this time and it provided
compelling evidence that the Small-World Phenomenon is pervasive in a wide range of
networks arising in nature and technology. Some examples of networks that the
Small-World Model has been applied to include the grid of power stations in the western
United States, the neural network of the worm C. elegans, and the Six Degrees of Kevin
Bacon, which states that every actor and actress, past and present, can be connected to
Kevin Bacon through other actors and actresses with whom they have performed in a movie.
Other areas of interest to which the Small-World Model has spread, and can hopefully be
applied, include economics, physics, biochemistry, and neurophysiology. Just as a message
may be passed from person to person in the world, a disease may be passed from person to
person, so a better understanding of this using the Small-World Model may help prevent
the spread of disease throughout the world. Applying the Small-World Model to
neurophysiology may also help address certain brain disorders, such as epileptic seizures.
The Small-World Model also applies very well to the World Wide Web. Just as there are
connections between people in the world, there are connections, or links, between web
sites on the World Wide Web. The estimated size of the WWW is 800 million documents,
and it is continuously and rapidly growing. The Northern Light search engine has the
largest coverage, yet it indexes only 38% of the WWW. Since the Small-World
Network applies so well to the WWW, it would be great if search engines could make use
of this to create more efficient searches over a larger amount of the web.
Figure 1
Short-range contacts: for a constant p > 0, the node u has a directed edge to every
other node within lattice distance p.
Long-range contacts: for constants $q \ge 0$ and $r \ge 0$, a directed edge is made from u
to q other nodes using independent random trials. As seen in figure 2, p = 1 and q = 2.
The value of r is not directly seen in this diagram and will be defined shortly. Also, u
would be the current message holder and v and w would be the two long-range contacts.
Figure 2
The decentralized algorithm A is the method by which a message is transmitted from one
message holder to the next. First, the long-range contact(s) are determined. Second, the
message is transferred to the closest available node to the target node. That is, if one of the
long-range contacts is closer to the target than the current message holder, the message is transferred
to that long-range contact. However, if no long-range contacts are closer, then the
message is transferred to one of the short-range contacts.
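The forwarding rule above can be sketched in a few lines of Python. This is only a minimal sketch under our own assumptions: nodes are (row, column) pairs on an n x n grid, the function names are hypothetical, and the rule is implemented as "pick whichever known contact is closest to the target", which for p = 1 agrees with the description above except in how ties are broken.

```python
def lattice_distance(a, b):
    """Lattice (Manhattan) distance between grid nodes a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def short_range_contacts(u, n, p=1):
    """All nodes of the n x n grid within lattice distance p of u, excluding u."""
    i, j = u
    return [(k, l)
            for k in range(max(0, i - p), min(n, i + p + 1))
            for l in range(max(0, j - p), min(n, j + p + 1))
            if 0 < lattice_distance(u, (k, l)) <= p]


def next_holder(u, target, long_range, n, p=1):
    """Forward the message to the known contact that is closest to the target."""
    contacts = short_range_contacts(u, n, p) + list(long_range)
    return min(contacts, key=lambda v: lattice_distance(v, target))
```

A long-range contact is used only when it is at least as close to the target as the best short-range contact; otherwise the message moves one lattice step.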
The inverse rth-power distribution is defined as follows: the ith directed edge from u has endpoint v
with probability proportional to $[D(u,v)]^{-r}$. To obtain a probability distribution, this is
divided by an appropriate normalizing constant. Finally, the inverse rth-power distribution
is
$$\frac{[D(u,v)]^{-r}}{\sum_{v \ne u} [D(u,v)]^{-r}},$$
where the exponent r controls how quickly the probability of a long-range contact falls
off with distance. That is, if we
let r = 0, it would be similar to saying that the probability that you know somebody one
hundred miles away is equally likely as knowing someone that is ten thousand miles
away. As the value of r increases, the likelihood that you know someone one hundred
miles away becomes greater than the likelihood that you know someone ten thousand miles
away. For our model we have selected a value of r = 2.
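To make the distribution concrete, here is a small Python sketch that builds it for one node of an n x n grid and draws long-range contacts from it. The function names and the (row, column) representation of nodes are our own assumptions for illustration, not part of the model's definition.

```python
import random


def long_range_distribution(u, n, r=2.0):
    """Return (nodes, probabilities) of the inverse r-th power distribution for u."""
    nodes, weights = [], []
    for i in range(n):
        for j in range(n):
            if (i, j) != u:
                d = abs(u[0] - i) + abs(u[1] - j)   # lattice distance D(u, v)
                nodes.append((i, j))
                weights.append(d ** (-r))           # proportional to D(u, v)^(-r)
    total = sum(weights)                            # normalizing constant
    return nodes, [w / total for w in weights]


def sample_long_range(u, n, r=2.0, q=1, rng=random):
    """Draw q long-range contacts for u by independent random trials."""
    nodes, probs = long_range_distribution(u, n, r)
    return rng.choices(nodes, weights=probs, k=q)
```

With r = 0 every other node is equally likely; with r = 2 a node at distance 1 is four times as likely as a node at distance 2.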
Performance in this system is measured by the average number of steps it takes to get
from the source to the target. Let the number of steps be X; its average is the
expectation of X, where
$$E(X) = \sum_{i=1}^{\infty} \Pr(X \ge i).$$
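The tail-sum formula for the expectation can be checked on a small example. The sketch below uses a fair six-sided die, which is our own choice of example and not part of the model, and compares the usual definition of the mean with the sum of Pr(X >= i).

```python
from fractions import Fraction

# Probability mass function of a fair die (illustrative example only).
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

# Usual definition: sum of value times probability.
mean_direct = sum(k * p for k, p in pmf.items())

# Tail-sum formula: sum over i of Pr(X >= i).
mean_tail = sum(sum(p for k, p in pmf.items() if k >= i) for i in range(1, 7))
```

Both computations give the same value, which is the fact the proof relies on later.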
For j > 0, phase j is defined as $\{x : 2^j < D(x,t) \le 2^{j+1}\}$, where t is the target. These are essentially bands
where a particular message holder can be located. The further the message is away from
the target, the wider the band is for each particular phase. Similarly, for j > 0, the Ball j is defined as
$\{x : D(x,t) \le 2^j\}$. This contains all the phases back to the target. Thus the Ball j
only has an outer limit. With both of these terms, once a message enters a particular
phase or ball, it cannot return to a phase or ball that is further removed from the target in
lattice distance.
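As an illustration, the phase of a message holder can be computed from its lattice distance to the target. The sketch below is hedged: the function name is hypothetical, and treating every distance of at most 2 as phase 0 is an assumption taken from Kleinberg's paper, since the definition above only covers j > 0.

```python
def phase(distance):
    """Phase j of a node at the given lattice distance from the target.

    For j > 0 this is the j with 2**j < distance <= 2**(j + 1).
    Distances of at most 2 are assigned to phase 0 (assumption, see lead-in).
    """
    if distance <= 2:
        return 0
    j = 1
    while distance > 2 ** (j + 1):
        j += 1
    return j
```

Because the bands double in width, a message starting at distance d passes through only about log d phases.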
Now that we have defined X, $X_j$ (the number of steps spent in phase j), and phase j, we can go on to explain the proof of the
theorem. The theorem behind the model states that there is a decentralized algorithm A,
and a constant c, independent of n, so that when r = 2 and p = q = 1, the expected delivery
time of A is at most $c(\log n)^2$.
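The theorem can be explored numerically with a small simulation. The following is a rough sketch under stated assumptions, not the algorithm as analysed in the proof: the function names are ours, p = q = 1 and r = 2 by default, and each node's long-range contact is drawn only when the message arrives there. That shortcut is harmless here because the greedy rule moves strictly closer to the target at every step and so never visits a node twice.

```python
import random


def dist(a, b):
    """Lattice distance between grid nodes a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def sample_long_range(u, n, rng, r=2.0):
    """One long-range contact of u, drawn with probability proportional to d^(-r)."""
    nodes = [(i, j) for i in range(n) for j in range(n) if (i, j) != u]
    weights = [dist(u, v) ** (-r) for v in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def delivery_time(source, target, n, rng, r=2.0):
    """Number of greedy forwarding steps from source to target on an n x n grid."""
    u, steps = source, 0
    while u != target:
        i, j = u
        neighbours = [(i + di, j + dj)
                      for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= i + di < n and 0 <= j + dj < n]
        contacts = neighbours + [sample_long_range(u, n, rng, r)]
        u = min(contacts, key=lambda v: dist(v, target))
        steps += 1
    return steps


def mean_delivery_time(n, trials, seed=0, r=2.0):
    """Average delivery time over random source/target pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        s = (rng.randrange(n), rng.randrange(n))
        t = (rng.randrange(n), rng.randrange(n))
        total += delivery_time(s, t, n, rng, r)
    return total / trials
```

Calling mean_delivery_time for a few grid sizes and comparing the results with $(\log n)^2$ gives a feel for the bound, although small grids cannot confirm the asymptotic statement.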
First of all, we will find upper and lower bounds related to the probability that u, the
message holder, chooses v as its long-range contact. Keep in mind, long-range contacts
are determined at random and only one is allowed. To find bounds on the probability we
need to know that the probability that u chooses v as its long-range contact is
$$\frac{d(u,v)^{-2}}{\sum_{v \ne u} d(u,v)^{-2}}.$$
This equation is used to find upper and lower bounds. To find the upper
bound on the sum in the denominator we have:
$$\sum_{v \ne u} d(u,v)^{-2} \le \sum_{j=1}^{2n-2} (4j)(j^{-2}).$$
The sum stops at 2n - 2 because we
are dealing with a finite lattice structure with the long-range contact at a distance of at
most (n - 1) + (n - 1) = 2n - 2 nodes away from the message holder. We get 4j from the
number of possible long-range contacts at lattice distance j. This is the number of nodes on the
outer edge of the "diamond" structure of radius j (at most 4j, since part of the diamond
may fall outside the lattice). And $j^{-2}$ is the weight $d(u,v)^{-2}$ of a node at distance j from the center of
the diamond.
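$$\sum_{j=1}^{2n-2} (4j)(j^{-2}) = 4\sum_{j=1}^{2n-2} \frac{1}{j} \le 4 + 4\int_{1}^{2n-2} \frac{1}{x}\,dx = 4 + 4\ln(2n-2).$$
Since $4 = 4\ln e$ and $e(2n-2) < 6n$, this is at most $4\ln(6n)$, which
grows like ln (n). We can omit the constants because constant multiples do not change the order of growth.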
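The chain of bounds on the normalizing sum can be checked numerically. The sketch below uses hypothetical function names of our own; it computes the exact sum over an n x n grid and the sum of (4j)(j^-2), so that both can be compared with 4 + 4 ln(2n - 2) and 4 ln(6n).

```python
import math


def normalizer(u, n):
    """Exact sum of d(u, v)^(-2) over all nodes v != u of the n x n grid."""
    return sum((abs(u[0] - i) + abs(u[1] - j)) ** -2
               for i in range(n) for j in range(n) if (i, j) != u)


def harmonic_bound(n):
    """Sum over j = 1 .. 2n - 2 of (4j) * j^(-2), the bound used in the proof."""
    return sum(4 * j * j ** -2 for j in range(1, 2 * n - 1))


# Compare the four quantities for one grid size (n = 20 chosen arbitrarily).
n = 20
u = (n // 2, n // 2)
values = (normalizer(u, n), harmonic_bound(n),
          4 + 4 * math.log(2 * n - 2), 4 * math.log(6 * n))
```

The four entries of values should appear in increasing order, matching the inequalities in the text.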
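Next, to find the lower bound, we simply put this bound back into our original equation to get:
$$\frac{d(u,v)^{-2}}{\sum_{v \ne u} d(u,v)^{-2}} \ge \frac{d(u,v)^{-2}}{4\ln(6n)}.$$
This can be written as $\frac{1}{4\ln(6n)\, d(u,v)^{2}}$, or $\frac{1}{\ln(n)\, d(u,v)^{2}}$ up to constants, which is the
lower bound on the probability that u chooses v as its long-range contact.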
Mathematical Background
In the next segment, I will give a mathematical background for some of
the terminology that we use in our proofs. I will define a
geometric series, probability, a discrete random variable, and logarithms. First of all, a
series is called geometric if each term in the series is obtained from the preceding one
by multiplying it by a common ratio r. For |r| < 1,
$$a + ar + ar^2 + ar^3 + \dots = \sum_{n=1}^{\infty} ar^{n-1} = \frac{a}{1-r}.$$
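As a quick illustration of the closed form, a partial sum with many terms can be compared with a/(1 - r). The values a = 3 and r = 0.5 below are arbitrary choices of ours.

```python
def geometric_partial_sum(a, r, terms):
    """Sum of the first `terms` terms a + a*r + a*r**2 + ..."""
    return sum(a * r ** k for k in range(terms))


a, r = 3, 0.5
partial = geometric_partial_sum(a, r, 60)   # 60 terms of the series
closed_form = a / (1 - r)                   # limit of the series for |r| < 1
```

For |r| < 1 the partial sums approach the closed form as more terms are added.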
Probability is used to mean the chance that a particular event (or set of events) will occur,
expressed on a linear scale from 0 (impossibility) to 1 (certainty). A discrete random variable
assumes each of its values with a certain probability. The probability of each
outcome must be between 0 and 1, and the probabilities must sum to 1. An example would be the number
of heads that occur in a given number of coin tosses. Lastly, log n denotes the logarithm
base 2, while ln n denotes the natural logarithm, base e.
Number of Nodes in Bj
As explained before, Bj is the set of all nodes
that fall within lattice distance $2^j$ of the target t. This set is defined as
$$B_j = \{x : D(x,t) \le 2^j\}$$
and, when the whole diamond lies inside the lattice, it contains
$$\#(B_j) = 1 + \sum_{i=1}^{2^j} 4i$$
nodes. The 1 represents the target itself and is shown as being the red dot in the picture. The i
is the lattice distance from the target to the boundary of a diamond. The 4i denotes the number of nodes on the
boundary of the diamond at distance i. For example, at distance 1 you would
have 4*1=4 nodes and at distance 2 you would have 4*2=8 nodes. This is shown
in the picture above because the higher phases are shown as being substantially thicker.
For a bound that also holds when the target is near the edge of the lattice, where part of
each diamond is cut off, we count only i of the 4i nodes at each distance (at least this
many remain as long as $2^j < n$). The above statement can then be rewritten as
$$\#(B_j) \ge 1 + \sum_{i=1}^{2^j} i = 1 + \frac{2^j(2^j+1)}{2}$$
This statement can in turn be simplified to the following expression
$$\#(B_j) \ge 1 + 2^{j-1}(2^j+1)$$
$$\#(B_j) \ge 1 + 2^{2j-1} + 2^{j-1} > 2^{2j-1}$$
Therefore the number of nodes contained in Bj satisfies
$$\#(B_j) > 2^{2j-1}$$
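The count can be verified by brute force on a small grid. This is a sketch with a hypothetical helper name; it counts the nodes within lattice distance $2^j$ of a target placed either in the interior or in a corner of an n x n grid.

```python
def ball_size(t, j, n):
    """Number of nodes of the n x n grid within lattice distance 2**j of t."""
    radius = 2 ** j
    return sum(1 for x in range(n) for y in range(n)
               if abs(x - t[0]) + abs(y - t[1]) <= radius)


n, j = 64, 3
interior = ball_size((32, 32), j, n)   # whole diamond fits inside the grid
corner = ball_size((0, 0), j, n)       # only a quarter of the diamond remains
lower_bound = 2 ** (2 * j - 1)         # the bound 2^(2j-1) from the text
```

Both counts exceed the lower bound, and the interior count matches the full-diamond formula 1 + 4(1 + 2 + ... + 2^j).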
Probability that a Node will be in Bj
The next proof gives the probability that the message will enter Bj in a given step. This probability can
be bounded by the following expression
$$P(\text{enter } B_j) \ge \frac{2^{2j-1}}{4\ln(6n)\, 2^{2j+4}}$$
The value of $2^{2j-1}$ represents the number of nodes in Bj. That number was obtained from
the last proof. The $4\ln(6n)$ is the bound on the normalizing constant in the probability that v is chosen as a long range contact.
That was also proven earlier in the paper. The $2^{2j+4}$ comes from the distance formula
$d(u,v)^{-2}$: in phase j the message holder u is within lattice distance $2^{j+1}$ of the
target, so every node v in Bj satisfies $d(u,v) \le 2^{j+1} + 2^j < 2^{j+2}$, and therefore
$d(u,v)^2 < 2^{2j+4}$. That statement can be
rewritten and simplified as
$$P(\text{enter } B_j) \ge \frac{1}{2^7 \ln(6n)}$$
$$P(\text{enter } B_j) \ge \frac{1}{128 \ln(6n)}$$
Therefore we can conclude that the probability that the message will enter Bj is at least
$1/(128\ln(6n))$.
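The simplification to 1/128 does not depend on j, which can be confirmed with exact arithmetic. The short sketch below uses variable names of our own and leaves the common factor ln(6n) out of the computation.

```python
from fractions import Fraction

# Ratio 2^(2j-1) / (4 * 2^(2j+4)) for several phases j; ln(6n) is factored out.
ratios = [Fraction(2 ** (2 * j - 1), 4 * 2 ** (2 * j + 4)) for j in range(1, 11)]
```

Every entry of ratios is the same fraction, so the bound on entering Bj is the same in every phase.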
Proof of Expectation
Xj represents the number of steps spent in phase j. The expectation can be
expressed by the following expression
$$EX = E\left(\sum_{j=0}^{\log n} X_j\right) = \sum_{j=0}^{\log n} EX_j$$
We will have to break this equation apart piece by piece in order to obtain our desired
result. The first part that we will be looking at will be the expectation of Xj. It can be
written as.
$$EX_j = \sum_{i=1}^{\infty} \Pr[X_j \ge i]$$
$\Pr[X_j \ge i]$ denotes the probability that the message spends at least i steps in phase j. This
statement can again be broken down by use of the law of total probability as.
This will be 0 because we are already assuming that j>0. Pr [Xj>0] will also be less than
or equal to 1. Our expression can be simplified to the following.
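$$EX_j \le \sum_{i=1}^{\infty} \left(1 - \frac{1}{128\ln(6n)}\right)^{i-1}$$
Each term bounds $\Pr[X_j \ge i]$: for the message to spend at least i steps in phase j, it
must fail to enter Bj on each of the first i - 1 steps, and each step enters Bj with
probability at least $1/(128\ln(6n))$.
This value is a geometric series. We know how to find a value for a series like this by
using the definition that was given before of (a/(1-r)), here with a = 1 and
$r = 1 - 1/(128\ln(6n))$, which gives $EX_j \le 128\ln(6n)$. Once we substitute all of the given
values back into our original equation, a sum over the $1 + \log n$ phases, we get
$$EX \le (1 + \log n)\, 128\ln(6n) \le c(\log n)^2$$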
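To see that the final bound really is of order $(\log n)^2$, it can be evaluated for a few values of n. This is a sketch with a hypothetical function name; it assumes the per-phase bound 128 ln(6n) from the previous proof and log n + 1 phases, with log taken base 2 as defined in the Mathematical Background.

```python
import math


def delivery_time_bound(n):
    """(1 + log2 n) phases times the per-phase bound 128 * ln(6n)."""
    phases = 1 + math.log2(n)
    per_phase = 128 * math.log(6 * n)
    return phases * per_phase


# Ratio of the bound to (log2 n)^2; it stays bounded, so a constant c exists.
ratios = [delivery_time_bound(n) / math.log2(n) ** 2 for n in (2, 10 ** 3, 10 ** 6)]
```

The ratios shrink as n grows, so the largest of them can serve as the constant c over this range of n.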
References
1) L. Adamic, "The Small World Web" manuscript available at
http://www.parx.xerox.com/istl/groups/iea/www/smallworld.html
2) Sandra Blakeslee. "Mathematics Prove That it is a Small World"
3) Dr. Steve Deckleman. His Mathematical Knowledge
4) Jon Kleinberg. "The Small World Phenomenon: An Algorithmic Perspective".
5) Stanley Milgram. "The Small World Problem" Psychology Today 1, 61 (1967)
6) Beth Salnier. "Small World"
7) R. Albert, H. Jeong, A.-L. Barabási. "Diameter of the World-Wide Web" Nature 401,
130 (1999)