Due February 13, 2013 by 11:59pm
Due February 15, 2013 by 11:59pm
You should submit a write-up of your assignment
both in pdf form and in hardcopy. The pdf is due by
electronic submission before the due date and time. The
electronic submission timestamp is what counts for determining
whether your assignment was submitted on-time. You may
turn the hardcopy printout in up to three days later, but it
must be identical to the electronic submission. You will
also submit your code electronically. Details on the
submission process will be posted here shortly.
To submit your assignment, place a copy of
your write-up (pdf preferred) and implementation into a
directory called cs380hw1_lastname (substituting in
your last name), then tar and gz this directory. Name
the file cs380hw1_lastname.tar.gz. Submit it to
me by attaching it to an e-mail with the subject of "CS380
Assignment 1 Lastname". (If you've already submitted it
using a different subject line or file name, please don't
resend it.) Make sure you give me a hard-copy of your
write-up for the assignment.
1.) [30 points] Bipartite Graphs
Start by reading over Lada Adamic's Pajek
tutorial (it is a bit out-dated, but will help you
get started). Pajek is installed on the lab systems.
You can also install it on your own computer (http://pajek.imfm.si/doku.php?id=download);
it is runs natively on Windows and also on Linux / OS X under wine. Pajek is widely
used by sociologists, and provides a large number of network
analysis methods. However, it is not extensible.
Download the file actorsandmovies.net
and open it in Pajek (File > Network > Read). This
is a 2-mode network containing two classes of nodes, actors and
movies.
(a) Create a 2-mode partition (Net > 2-Mode >
Partition). Now draw the network (Draw >
Network+FirstPartition). The two classes of nodes should
be colored differently. If labels are not shown, add them
(Options>Mark Vertices Using>Labels). Apply your favorite
layout algorithm via the Layout menu. Export your layout
as a 2D image (Export > 2D > ...) and include it in your
write-up, labeled with the layout algorithm you used.
(b) Transform the network into a one-mode network (Net >
2-Mode > 2-Mode to 1-Mode > Rows). Draw the network (Draw
> Network). Qualitatively compare the structure of the 2-Mode
to the 1-Mode network. Is there a loss of information?
(c) Show the weights on each edge using (Options > Lines >
Mark Lines > with Values). What do the values represent?
(d) Compute the unweighted degree of each node (Net > Vector
> Centrality > Degree > All). Next draw the network
using (Draw > Network+FirstVector). How is the degree
represented?
(e) Add the vector value to each vertex (it will be the
degree/(max possible degree)) (Options > Mark Vertices Using
> Vector Values). Who are the most important actors
using this measure?
(f) How does the boundary of the network (i.e. who is included)
affect who is found to be most central?
(g) Load the file actorsandmoviesWithGere.net.
It contains one extra actor, Richard Gere. Repeat the above
procedure. In the 1-mode network of actors, is there a change in
who is most central? What does this tell you about biases and
boundaries in sample selection?
(h) Remove all edges between actors who have co-starred in fewer
than 3 movies (Net > Create New Network > Transform >
Remove > Lines with value > Lower than). Which actors
comprise the central core of this network?
(a) Snowball Sampling. Suppose that you are
a newcomer to the dinner party, and know only Ella. You'd
like to meet everyone, and so your plan is to have Ella
introduce you to her closest friends. Once you know those
friends, you can ask them to introduce you to their closest
friends, and so on.
This is a snowball sampling technique.
Highlight the vertices that you will reach using snowball sampling (Net >
Create Partition > K‐Neighbors > Output). Who
will you not meet
using snowball sampling starting with Ella?
(b) Suppose each guest will share her dish (and any dish that is
shared with her), with only her 1st and 2nd closest
friends. Find groups of guests who can all sample each
other's dishes (Net > Partition > Components > Strong,
and Draw > Network+FirstPartition).
Which guests will get to sample no other dishes but their
own? (Net > Partitions > Degree > Input)
Display the network of strongly connected components, and
include a printout in your writeup. (Operations >
Network+Partition > Shrink Network > Partition, and Draw
> Network+FirstPartition)
3.) [60 points] Graph Search
We commonly use mapping applications (Google / Yahoo maps, GPS
units, etc.) to find the shortest route between two
locations. In a programming language of your choice,
implement breadth-first, depth-first and Dijkstra's graph search
algorithms for finding single-source shortest paths in weighted
and unweighted graphs. You should implement these
algorithms by first implementing the general graph search
algorithm, and then substituting in various queuing functions to
create the different searches. Each search algorithm
should take as input the name of a city on the map, and search
for the shortest path to a target city.
In this case, we will be using a map of Romania and searching
for the shortest distance from every city to Bucharest.
The data is available in the following files:
cities.txt: This
file contains a series of lines, one for each city. Each
line lists a city's name, followed by the city's
straight-line distance to Bucharest. We won't be using
the straight-line distance in this assignment.
neighbors.txt :
This file contains a series of lines, one for each
neighbor/neighbor pair. Each line lists the first city's
name, the second city's name, and the distance between the
two.
For each of the searches, you should track and report the
following:
the number of nodes expanded
the cost of the path found
the running time in milliseconds for the search
(i.e., the difference between the computer's clock time in
milliseconds before and after each search is run)
the path from the start to the goal as a list of city
names
Have your program complete the following table. In the
case where a search method fails to find a path, you should
indicate the lack of a result (e.g., with "--" , NIL, or
"FAILURE") in the corresponding table entry.
City name
#nodes / BFS
# nodes / DFS
# nodes / Dijkstra's
Path cost / BFS
Path cost / DFS
Path cost / Dijkstra's
Runtime / BFS
Runtime / DFS
Runtime / Dijkstra's
Arad
Bucharest
...
Vaslui
Zerind
You should also write a brief summary of your findings,
comparing the performance of the algorithms, and discussing
particular cases where one of the algorithms performs better
than the others. (This summary can be anywhere from a few
short paragraphs to one or two pages.) This should be a
well-written analysis of your results in the form of a
mini-report. Also include a printout of your code (formatted
as neatly as possible) in the report and the results table of
the three search functions applied to each city in Romania (as
the first argument) for a path to Bucharest (the goal city),
using the table format above as a guideline. To save paper, I
recommend that you print your code with two logical pages per
sheet (e.g., using mpage -2ol).
Hints: (you can follow these, or choose to
ignore them)
While you are free to use any programming language that
you like, I recommend that you strongly consider Java,
Python, or Matlab due to the availability of matrix and
graph packages for these languages.
You should use some graph data structure to store the
graph. You can either create your own, or use an
existing graph library. Keep the graph structure
simple; all you really need are the cities, their neighbors,
and the distance to those neighbors. You should then
read the graph into this data structure. My
recommendation is to read in the cities first, then read in
the neighbors file.
Implement the node and problem data structures as
described in class.
Implement the general search algorithm, then create the
different searches by substituting in different queuing
functions.
It isn't necessary to implement stacks, queues, or
priority queues. All you really need to do is simply
maintain a list and then have the queuing function insert
nodes at the tail (for a queue), at the head (for a stack)
or in sorted order (for a priority queue -- don't worry
about using a proper priority queue implementation with O(lg
n) time, unless you really want to). You are also
welcome to use implementations of these data structures
provided as external libraries for your chosen language.
Write a main method that loads the graph, and generates
the table. The bulk of this function will likely be a
loop that runs all of the algorithms on all of the cities
and gathers the results in some central place, then outputs
the tabular data.
Exercises 1 and 2 were adapted from assignments by Lada
Adamic, UMichigan. Exercise 3 was adapted from
assignments by Tim Finin and Marie desJardins.