Finding Paths
The final part of this lab explains how to find paths between any two actors in your Graph created from IMDB data.
Breadth First Search
Since we have an unweighted graph, breadth first search (BFS) is a great choice for finding the shortest paths between any two vertices (i.e., actors). We will call one actor the center (e.g., ”Kevin Bacon”) and the other actor the goal (e.g., ”Christina Ricci”). You should implement BFS in your BaconNumber class in a method:
public List<String> findPath(Vertex center, Vertex goal)
Recall from lecture that the pseudocode for BFS is:
- Create an empty
frontierQueue - Create an empty
exploredSet - Enqueue the
centerin thefrontier - While the frontier is not empty:
- Dequeue
vertexfrom thefrontier - Add
vertextoexplored - Loop over each
neighborof the currentvertex- If
neighborisgoal, return a path fromcentertoneighbor - Else if
neighboris not inexploredandneighboris not infrontier, then enqueueneighborinfrontier
- If
- Return no path exists (e.g., an empty path)
Hint
In Java, the LinkedList<T> class implements the Queue<T> interface (see Lab 5 for a reminder of important methods in the Queue interface).
Also, the HashSet<T> class implements the Set<T> interface, which has methods add(T item) to add an item and remove(T item) to remove an item from the set.
If you use a HashSet, it would be helpful for your Vertex class to implement the
public int hashCode()
and
public boolean equals(Object obj)
methods that are used by HashSet collections to achieve constant time add and remove operations.
Visual Studio Code can create these methods for you if you right click anywhere in the Vertex.java file and select “Source Action” from the right click menu. Then, select “Generate hashCode() and equals()”. Check only the box next to “name: String” and VSC will calculate hashcodes and equality between two Vertex objects based on their name instance variables (i.e., actor names).
Hint
Checking whether a Vertex is already in the frontier Queue is a linear time operation, which will slow down your program for large input files.
To speed this up, we can create a second HashSet called frontierSet in Step 2 that will only store the Vertex objects that are currently on the frontier. Each time you enqueue a Vertex into frontier, also add the same Vertex to frontierSet. Then, each time you dequeue a Vertex from frontier, you can also remove the same Vertex from frontierSet.
Creating Paths
You can create the path from the Vertex center to the Vertex goal however you want in your findPath method that implements the BFS algorithm. How you do so will depend on how you implemented your Vertex class and how you represent edges in the Graph.
Hint
If you have an Edge class to store the label of an edge between two Vertex objects, then one approach is to create a HashMap<Vertex, Edge> at the start of your findPath method, then when you loop over each neighbor of vertex, you can save the edge between vertex and neighbor into this HashMap<Vertex, Edge> with neighbor as the key.
To create the path from goal back to center, you can then follow the edges from goal until you reach the center.
The path itself should be a sequence of Strings starting with goal.name and ending with center.name. The Strings should alternate between actor names and a TV show or movie title that connects that actor with the next. Each transition between name and title or title and name should print an arrow ->.
For example, using the input file imdb_small.txt, one shortest path between goal = “Christina Ricci” and center = “Kevin Bacon”) is:
Christina Ricci -> Man Who Cried, The (2000) -> Cate Blanchett -> Life Aquatic with Steve Zissou, The (2004) -> Bill Murray -> Wild Things (1998) -> Kevin Bacon
README
Because two actors might have acted in multiple TV shows or movies together, there might be multiple equivalent shortest paths between two actors, meaning that the specific actor names and specific TV show or movie titles in the middle of your path could differ from someone else’s, depending on how you each implemented your Graph representations. However, the length of the shortest path between any two actors should be consistent across all representations for a given input file of IMDB data.
Interacting with the Program
Finally, to run your program to be able to find different paths between different actors, your BaconNumber class should have a method:
public static void main(String[] args)
where the first command line argument is the name of the text file to use as input to construct your Graph, and the second command line argument is the name of the goal actor for your search (using ”Kevin Bacon” as your center actor). Your program should also take an optional third command line argument that allows the user to specify a different center actor.
The program should then:
- Read in the IMDB data from the input file and create a
Graphobject (following the process described in Part 2) - Find a shortest path between the desired
centerandgoal - Print the shortest path if one was found, otherwise print
”No path found".
We have provided you with five input files with different sizes of graphs. They are:
flights.txt: a connected graph with 6 vertices and 10 edges, representing the flights from the warmupimdb_small.txt: a connected graph with 161 vertices and 3,396 edgesimdb_movies_medium.txt: a graph with 17,017 vertices and 71,383 edges, created using all movies in IMDB with at least 10,000 ratingsimdb_withtv_medium.txt: a graph with 25,300 vertices and 143,757 edges, created using all movies and TV shows in IMDB with at least 10,000 ratingsimdb_movies_large.txt: a graph with 199,406 vertices and 1,064,902 edges, created using all movies in IMDB with at least 100 ratingsimdb_withtv_large.txt: a graph with 292,411 vertices and 2,511,899 edges, created using all movies and TV shows in IMDB with at least 100 ratings
These are the edge counts of one implementation of a graph. Your edge counts may differ a bit depending on your implementation.
README
Because actor names are often multiple words, we need to enclose them in quotation marks so that Java interprets an entire name as a single command line argument. For example, to produce the path example above, I ran the program as:
java BaconNumber imdb_small.txt "Christina Ricci" "Kevin Bacon"
Experimenting!
Once you have all of the functionality implemented, try experimenting with different parameters for the three command line arguments
Some questions to ponder (but you do not have to submit answers):
- Can you find any actors who are not connected so that there is no path between them?
- What two actors have the longest length of shortest path between them? What is the length of that path?
- How does the stopwatch runtime of your program change as you use smaller or larger input files?
Committing Your Progress
Don’t forget to frequently save your progress to GitHub using the add/commit/push commands with git.