### Network Analysis - Assignment 4

Due May 1, 2013 by 11:59pm

You should submit a write-up of your assignment both in pdf form and in hardcopy.  The pdf is due by electronic submission before the due date and time.  The electronic submission timestamp is what counts for determining whether your assignment was submitted on-time.  You may turn the hardcopy printout in up to three days later, but it must be identical to the electronic submission.  You will also submit your code electronically.

To submit your assignment, save your assignment as cs380hw4_lastname.pdf (substituting in your last name), and e-mail it to me with the subject of "CS380 Assignment 4 Lastname".  Make sure you give me a hard-copy of your write-up for the assignment.

1.)  A real-world power law network (60 pts)

Download the file gnutella.net, a snapshot of a Gnutella peer-to-peer file-sharing network in the year 2000.
• Compute and plot the degree distribution of the network (using your favorite plotting software, e.g. Matlab, Excel, etc.).  You can use Pajek to calculate the degree of each individual node (Network > Create Partition > Degree > All). Then export the partition as a '.clu' file by clicking on the save icon to the left of the Partitions box on the main screen. Now you can import it into Excel or another program and compute a histogram of it. Try plotting it on both linear axes and log-log axes.
• Compute the average shortest path (Network > Create Vector > Distribution of Distances).
• Construct an Erdos-Renyi random graph with the same number of vertices and same average degree of each vertex (Network>Create Random Network>Small World). Check that the random graph you have obtained has the same number of vertices and a similar numbers of edges as the gnutella network. Calculate and export the degrees of all vertices.
• Turn in a plot that shows the two-degree distributions superimposed. In Matlab, you can plot two data sets together as follows:  plot(x1,y1,'r-',x2,y2,'b:'). This will plot y1 vs. x1 as a red solid line, and y2 vs. x2 as a blue dotted line.
• Comment on the difference in degree distribution between the gnutella network and the equivalent Erdos-Renyi random graph? How does this explain the differences you observed in average shortest path in a previous lab (or you can recalculate the average shortest paths now)?

2.)  Hierarchical clustering (40 pts)

Use Pajek to hierarchically cluster the Gnutella network.
• Visualize the permuted adjacency matrix as a spy plot (see examples on the course slides)
• Visualize the dendrogram

Briefly describe what the hierarchical clustering reveals about the network.