Network Analysis - Assignment 1

~~Due February 13, 2013 by 11:59pm~~
Due February 15, 2013 by 11:59pm

You should submit a write-up of your assignment both in pdf form and in hardcopy. The pdf is due by electronic submission before the due date and time. The electronic submission timestamp is what counts for determining whether your assignment was submitted on-time. You may turn the hardcopy printout in up to three days later, but it must be identical to the electronic submission. You will also submit your code electronically. Details on the submission process will be posted here shortly.

To submit your assignment, place a copy of your write-up (pdf preferred) and implementation into a directory called cs380hw1_lastname (substituting in your last name), then tar and gz this directory. Name the file cs380hw1_lastname.tar.gz. Submit it to me by attaching it to an e-mail with the subject of "CS380 Assignment 1 Lastname". (If you've already submitted it using a different subject line or file name, please don't resend it.) Make sure you give me a hard-copy of your write-up for the assignment.

1.) [30 points] Bipartite Graphs

Start by reading over Lada Adamic's Pajek tutorial (it is a bit out-dated, but will help you get started). Pajek is installed on the lab systems. You can also install it on your own computer (http://pajek.imfm.si/doku.php?id=download); it is runs natively on Windows and also on Linux / OS X under wine. Pajek is widely used by sociologists, and provides a large number of network analysis methods. However, it is not extensible.

Download the file actorsandmovies.net and open it in Pajek (File > Network > Read). This is a 2-mode network containing two classes of nodes, actors and movies.

(a) Create a 2-mode partition (Net > 2-Mode > Partition). Now draw the network (Draw > Network+FirstPartition). The two classes of nodes should be colored differently. If labels are not shown, add them (Options>Mark Vertices Using>Labels). Apply your favorite layout algorithm via the Layout menu. Export your layout as a 2D image (Export > 2D > ...) and include it in your write-up, labeled with the layout algorithm you used.

(b) Transform the network into a one-mode network (Net > 2-Mode > 2-Mode to 1-Mode > Rows). Draw the network (Draw > Network). Qualitatively compare the structure of the 2-Mode to the 1-Mode network. Is there a loss of information?

(c) Show the weights on each edge using (Options > Lines > Mark Lines > with Values). What do the values represent?

(d) Compute the unweighted degree of each node (Net > Vector > Centrality > Degree > All). Next draw the network using (Draw > Network+FirstVector). How is the degree represented?

(e) Add the vector value to each vertex (it will be the degree/(max possible degree)) (Options > Mark Vertices Using > Vector Values). Who are the most important actors using this measure?

(f) How does the boundary of the network (i.e. who is included) affect who is found to be most central?

(g) Load the file actorsandmoviesWithGere.net. It contains one extra actor, Richard Gere. Repeat the above procedure. In the 1-mode network of actors, is there a change in who is most central? What does this tell you about biases and boundaries in sample selection?

(h) Remove all edges between actors who have co-starred in fewer than 3 movies (Net > Create New Network > Transform > Remove > Lines with value > Lower than). Which actors comprise the central core of this network?

2.) [10 points] Social Network Analysis

Download and open the file dining‐table_partners.net in Pajek.

(a) Snowball Sampling. Suppose that you are a newcomer to the dinner party, and know only Ella. You'd like to meet everyone, and so your plan is to have Ella introduce you to her closest friends. Once you know those friends, you can ask them to introduce you to their closest friends, and so on. This is a snowball sampling technique.

Highlight the vertices that you will reach using snowball sampling (Net > Create Partition > K‐Neighbors > Output). Who will you not meet using snowball sampling starting with Ella?

(b) Suppose each guest will share her dish (and any dish that is shared with her), with only her 1st and 2nd closest friends. Find groups of guests who can all sample each other's dishes (Net > Partition > Components > Strong, and Draw > Network+FirstPartition).

Which guests will get to sample no other dishes but their own? (Net > Partitions > Degree > Input)

Display the network of strongly connected components, and include a printout in your writeup. (Operations > Network+Partition > Shrink Network > Partition, and Draw > Network+FirstPartition)

3.) [60 points] Graph Search

We commonly use mapping applications (Google / Yahoo maps, GPS units, etc.) to find the shortest route between two locations. In a programming language of your choice, implement breadth-first, depth-first and Dijkstra's graph search algorithms for finding single-source shortest paths in weighted and unweighted graphs. You should implement these algorithms by first implementing the general graph search algorithm, and then substituting in various queuing functions to create the different searches. Each search algorithm should take as input the name of a city on the map, and search for the shortest path to a target city.

In this case, we will be using a map of Romania and searching for the shortest distance from every city to Bucharest. The data is available in the following files:

cities.txt : This file contains a series of lines, one for each city. Each line lists a city's name, followed by the city's straight-line distance to Bucharest. We won't be using the straight-line distance in this assignment.
neighbors.txt : This file contains a series of lines, one for each neighbor/neighbor pair. Each line lists the first city's name, the second city's name, and the distance between the two.

For each of the searches, you should track and report the following:

the number of nodes expanded
the cost of the path found
the running time in milliseconds for the search (i.e., the difference between the computer's clock time in milliseconds before and after each search is run)
the path from the start to the goal as a list of city names

Have your program complete the following table. In the case where a search method fails to find a path, you should indicate the lack of a result (e.g., with "--" , NIL, or "FAILURE") in the corresponding table entry.

City name	#nodes / BFS	# nodes / DFS	# nodes / Dijkstra's	Path cost / BFS	Path cost / DFS	Path cost / Dijkstra's	Runtime / BFS	Runtime / DFS	Runtime / Dijkstra's
Arad
Bucharest
...
Vaslui
Zerind

You should also write a brief summary of your findings, comparing the performance of the algorithms, and discussing particular cases where one of the algorithms performs better than the others. (This summary can be anywhere from a few short paragraphs to one or two pages.) This should be a well-written analysis of your results in the form of a mini-report. Also include a printout of your code (formatted as neatly as possible) in the report and the results table of the three search functions applied to each city in Romania (as the first argument) for a path to Bucharest (the goal city), using the table format above as a guideline. To save paper, I recommend that you print your code with two logical pages per sheet (e.g., using mpage -2ol).

Hints: (you can follow these, or choose to ignore them)

While you are free to use any programming language that you like, I recommend that you strongly consider Java, Python, or Matlab due to the availability of matrix and graph packages for these languages.
You should use some graph data structure to store the graph. You can either create your own, or use an existing graph library. Keep the graph structure simple; all you really need are the cities, their neighbors, and the distance to those neighbors. You should then read the graph into this data structure. My recommendation is to read in the cities first, then read in the neighbors file.
Implement the node and problem data structures as described in class.
Implement the general search algorithm, then create the different searches by substituting in different queuing functions.
It isn't necessary to implement stacks, queues, or priority queues. All you really need to do is simply maintain a list and then have the queuing function insert nodes at the tail (for a queue), at the head (for a stack) or in sorted order (for a priority queue -- don't worry about using a proper priority queue implementation with O(lg n) time, unless you really want to). You are also welcome to use implementations of these data structures provided as external libraries for your chosen language.
Write a main method that loads the graph, and generates the table. The bulk of this function will likely be a loop that runs all of the algorithms on all of the cities and gathers the results in some central place, then outputs the tabular data.

Exercises 1 and 2 were adapted from assignments by Lada Adamic, UMichigan. Exercise 3 was adapted from assignments by Tim Finin and Marie desJardins.