You should submit a write-up of your assignment
both in pdf form and in hardcopy. The pdf is due by
electronic submission before the due date and time. The
electronic submission timestamp is what counts for determining
whether your assignment was submitted on-time. You may
turn the hardcopy printout in up to three days later, but it
must be identical to the electronic submission. You will
also submit your code electronically.
To submit your assignment, save your assignment as cs380hw4_lastname.pdf
(substituting in your last name), and e-mail it to me with the
subject of "CS380 Assignment 4 Lastname". Make sure you
give me a hard-copy of your write-up for the assignment.
1.) A real-world power law
network (60 pts)
Download the file gnutella.net,
a snapshot of a Gnutella peer-to-peer file-sharing network in
the year 2000.
Compute and plot the degree distribution of the network
(using your favorite plotting software, e.g. Matlab, Excel,
etc.). You can use Pajek to calculate the degree of
each individual node (Network > Create Partition >
Degree > All). Then export the partition as a '.clu' file
by clicking on the save icon to the left of the Partitions
box on the main screen. Now you can import it into Excel or
another program and compute a histogram of it. Try plotting
it on both linear axes and log-log axes.
Compute the average shortest path (Network > Create
Vector > Distribution of Distances).
Construct an Erdos-Renyi random graph with the same number
of vertices and same average degree of each vertex
(Network>Create Random Network>Small World). Check
that the random graph you have obtained has the same number
of vertices and a similar numbers of edges as the gnutella
network. Calculate and export the degrees of all vertices.
Turn in a plot that shows the two-degree distributions
superimposed. In Matlab, you can plot two data sets together
as follows: plot(x1,y1,'r-',x2,y2,'b:'). This will
plot y1 vs. x1 as a red solid line, and y2 vs. x2 as a blue
dotted line.
Comment on the difference in degree distribution between
the gnutella network and the equivalent Erdos-Renyi random
graph? How does this explain the differences you observed in
average shortest path in a previous lab (or you can
recalculate the average shortest paths now)?
2.) Hierarchical clustering (40 pts)
Use Pajek to hierarchically cluster the Gnutella network.
Visualize the permuted adjacency matrix as a spy plot (see
examples on the course slides)
Visualize the dendrogram
Briefly describe what the hierarchical clustering reveals about
the network.