You may work with one partner on this assignment. Anyone from the class is fine.
Your WebBrowser program takes two or three command-line arguments: the first is a start_url, the second is link_depth (how many links deep you will follow from the start_url), and the optional third is an html_ignore file. For example, you may run your program like this:
% java WebBrowser www.brynmawr.edu 5 html_ignore_file
Starting at the initial URL, it will crawl the websites following href links to a depth of 5, yielding a set of vertices for the hyperlink graph and connections between them. The list of vertices will then be used to create all the data structures needed by your part 3 solution, taking the place of the urlListFile command line argument from previous parts. For each url, you will create a URLContent object by parsing its file and create its WordFrequency tree. You should be able to just make a few modifications to your ProcessQueries class to get this to work.
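The command-line handling described above could be sketched roughly as follows. The CrawlArgs class and its field names are hypothetical and are not part of the provided code; the sketch also assumes that link_depth must be a non-negative integer.

```java
import java.util.Optional;

/** Hypothetical holder for the parsed command-line arguments. */
final class CrawlArgs {
    final String startUrl;
    final int linkDepth;
    final Optional<String> htmlIgnoreFile; // empty when the third argument is omitted

    private CrawlArgs(String startUrl, int linkDepth, Optional<String> htmlIgnoreFile) {
        this.startUrl = startUrl;
        this.linkDepth = linkDepth;
        this.htmlIgnoreFile = htmlIgnoreFile;
    }

    /** Accepts exactly two or three arguments: start_url link_depth [html_ignore_file]. */
    static CrawlArgs parse(String[] args) {
        if (args.length < 2 || args.length > 3) {
            throw new IllegalArgumentException(
                "usage: java WebBrowser start_url link_depth [html_ignore_file]");
        }
        int depth;
        try {
            depth = Integer.parseInt(args[1]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("link_depth must be an integer: " + args[1]);
        }
        if (depth < 0) {
            // Assumption: a negative depth is treated as an error.
            throw new IllegalArgumentException("link_depth must be non-negative");
        }
        Optional<String> ignore =
            args.length == 3 ? Optional.of(args[2]) : Optional.empty();
        return new CrawlArgs(args[0], depth, ignore);
    }
}
```

Your main method could call CrawlArgs.parse(args) first and print the usage message if it throws.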
You will use the graph in two ways:
The Graph Window GUI has already been implemented for you; you just need to add a button to your WebBrowser that pops up the Graph Window.
You have been given an HREFScanner class that will take care of all the ugly parsing of url links. You first initialize it with a url; it then opens the url's file and scans its webpage for href tags. Each call to its getNextToken() method returns the next valid url link from the scanned page.
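One way to picture the depth-limited crawl is a breadth-first traversal, sketched below. This is only an illustration under stated assumptions: the LinkCrawler class is hypothetical, and the linksOf function is a stand-in for whatever loop you write around HREFScanner and getNextToken(). The sketch also assumes that pages sitting exactly at the depth limit become vertices but are not scanned for further links.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

/** Hypothetical helper: builds adjacency lists (url -> urls it links to). */
final class LinkCrawler {
    static Map<String, List<String>> crawl(String startUrl, int linkDepth,
                                           Function<String, List<String>> linksOf) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        Map<String, Integer> depthOf = new LinkedHashMap<>(); // also marks visited urls
        Queue<String> frontier = new ArrayDeque<>();

        graph.put(startUrl, new ArrayList<>());
        depthOf.put(startUrl, 0);
        frontier.add(startUrl);

        while (!frontier.isEmpty()) {
            String url = frontier.remove();
            int depth = depthOf.get(url);
            if (depth == linkDepth) {
                continue; // at the limit: keep the vertex, do not follow its links
            }
            for (String target : linksOf.apply(url)) {
                if (!depthOf.containsKey(target)) {
                    // First time this url is seen: it becomes a new vertex.
                    depthOf.put(target, depth + 1);
                    graph.put(target, new ArrayList<>());
                    frontier.add(target);
                }
                if (!graph.get(url).contains(target)) {
                    graph.get(url).add(target); // record the edge url -> target once
                }
            }
        }
        return graph;
    }
}
```

The key set of the returned map plays the role of the vertex list, and each value lists the outgoing edges of that vertex.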
Change ProcessQueries
---------------------
(*) add a graph data member
(*) add a constructor that takes a start url and link_depth
    limit (and optionally an html_ignore_list) as input, and
    (1) creates a link graph (following links only link_depth deep from the
        start url)
    (2) creates URLContent list and cache as before
    (3) incorporates linked-to information into determining a URL's priority
        when ordering query results
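As an illustration of step (3), the sketch below counts how many distinct pages link to each vertex and folds that count into a score. The LinkPriority class, the choice to ignore self-links, and the scoring formula (word frequency plus a weighted in-link count) are all assumptions made for this example; the assignment does not prescribe a particular formula.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for using linked-to information in result ordering. */
final class LinkPriority {
    /** For every vertex, counts the distinct other pages that link to it. */
    static Map<String, Integer> inLinkCounts(Map<String, List<String>> graph) {
        Map<String, Integer> counts = new HashMap<>();
        for (String url : graph.keySet()) {
            counts.put(url, 0); // pages nobody links to still get an entry
        }
        for (Map.Entry<String, List<String>> entry : graph.entrySet()) {
            // A HashSet removes duplicate links from the same source page.
            for (String target : new HashSet<>(entry.getValue())) {
                if (!target.equals(entry.getKey())) { // assumption: ignore self-links
                    counts.merge(target, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Assumed scoring rule: word frequency plus linkWeight times the in-link count. */
    static int priority(int wordFrequency, int inLinks, int linkWeight) {
        return wordFrequency + linkWeight * inLinks;
    }
}
```

How much weight the in-link count deserves relative to word frequency is a design decision you should make and document yourself.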
Graph
-----
(*) implement the shortestPath method (this is now an extra-credit option).
    You can test your shortestPath method before you have other parts of the
    program working. Just create a weighted directed graph in the main
    method of TryGraph.java and call your shortestPath method on different
    start vertices. Even though all the edges in the link graph will have a
    weight value of one, your implementation should be more general and should
    work correctly on any weighted directed graph.
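For orientation, here is a minimal sketch of one standard approach, Dijkstra's algorithm, on a weighted directed graph. It is not the provided Graph class: the ShortestPathSketch name and the nested-map graph representation are hypothetical, and the sketch assumes all edge weights are non-negative. It returns distances only; recording a predecessor for each vertex would let you recover the actual paths.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch of Dijkstra's algorithm; graph maps vertex -> (neighbor -> weight). */
final class ShortestPathSketch {
    /** Returns the distance from start to every reachable vertex. */
    static Map<String, Integer> shortestDistances(
            Map<String, Map<String, Integer>> graph, String start) {
        Map<String, Integer> dist = new HashMap<>();
        // Queue entries are (vertex, tentative distance), smallest distance first.
        PriorityQueue<Map.Entry<String, Integer>> queue =
            new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());

        dist.put(start, 0);
        queue.add(new AbstractMap.SimpleEntry<>(start, 0));

        while (!queue.isEmpty()) {
            Map.Entry<String, Integer> top = queue.remove();
            String u = top.getKey();
            int d = top.getValue();
            if (d > dist.get(u)) {
                continue; // stale entry: a shorter route to u was already found
            }
            Map<String, Integer> edges =
                graph.getOrDefault(u, Collections.<String, Integer>emptyMap());
            for (Map.Entry<String, Integer> edge : edges.entrySet()) {
                int candidate = d + edge.getValue();
                Integer known = dist.get(edge.getKey());
                if (known == null || candidate < known) {
                    dist.put(edge.getKey(), candidate);
                    queue.add(new AbstractMap.SimpleEntry<>(edge.getKey(), candidate));
                }
            }
        }
        return dist; // unreachable vertices are simply absent
    }
}
```

A test in the spirit of TryGraph.java would build a small graph with unequal weights and check the distances from several start vertices, including one that cannot reach the rest of the graph.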
WebBrowser
----------
(*) add a Graphics button that pops up a graphics window
    (makes a GraphGui object visible)
If you implement one or more extra credit features, be sure to include a description of the feature and how to test it in the README file.
html_ignore: a file containing tokens that should be ignored when reading an html input file.
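A minimal sketch of loading and applying such an ignore list is shown below. The IgnoreList class is hypothetical, and the sketch assumes the file holds whitespace-separated tokens that are matched exactly (case-sensitively); check the provided html_ignore file to confirm its actual layout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for reading and applying an html_ignore file. */
final class IgnoreList {
    /** Reads whitespace-separated tokens from source (assumed file layout). */
    static Set<String> load(Reader source) {
        Set<String> ignored = new HashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) { // blank lines yield one empty token
                        ignored.add(token);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ignored;
    }

    /** Returns the tokens that are not in the ignore set, in their original order. */
    static List<String> filter(List<String> tokens, Set<String> ignored) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!ignored.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

In a real run you would pass a java.io.FileReader opened on the html_ignore file name from the command line; when no file is given, an empty set leaves every token in place.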