IMDB Data

Application Overview

In this application portion of this lab, we will be implementing a program that can find the shortest paths between two actors (used as a gender-neutral term) based on data from IMDB, where each vertex represents an actor and each edge represents a movie or TV show in which two actors acted.

In class, we have been discussing how graph structures can be used to represent relationships between groups of objects. For this assignment, our program will allow us to play the Kevin Bacon Game. A person’s Bacon Number is computed based on the number of movies of separation between that person and the actor Kevin Bacon. For example, if an actor is Kevin Bacon, then their Bacon Number is 0. If an actor was in a movie with Kevin Bacon, then their Bacon Number is 1. If an actor was not in a movie with Kevin Bacon, but they were in a movie with someone who was, then their Bacon Number would be 2. In short, an actor’s Bacon Number is one greater than the smallest Bacon Number of any of their co-stars.

For fun and some additional background, you can try out the Oracle of Bacon at the University of Virginia.

Data Format

We have provided you with several text files containing data about actors and the movies and TV shows in which they have acted from IMDB. These files are in the format:

<Actor Name>	<Title>

where the actor’s name and the title of a TV show or movie that they acted in is separated by a tab character \t. For example, the line:

Kevin Bacon	Apollo 13 (1995)

indicates that actor ”Kevin Bacon” was in the movie ”Apollo 13 (1995)”.

Hint

To parse this line, refer back to how we parsed the book information in Lab 7. To split a line by the tab character \t, we can use the following code:

String[] split = line.split("\t");
String name = split[0];
String title = split[1];

Application Instructions

In Visual Studio Code, create a new class BaconNumber in a file BaconNumber.java. Once again, you are free to implement this class however you like, but a good suggestion would be to have a private Graph instance variable.

Creating Vertices and Edges

To create the contents a Graph object using the data from an input file, the following process might be useful to implement in a private helper method of your BaconNumber class:

  1. Create a Set<String> to store the unique actor names
  2. Create a Set<String> to store unique TV show and movie titles
  3. Create a Map<String, List<String> that will take an actor’s name as the key and stores a list of TV show and movie titles for that actor
  4. Create a Map<String, List<String> that will take a TV show or movie title as the key and stores a list of actors’ names that were in that TV show or movie
  5. Read the input file line by line, parsing the name and title on each line
  • add the name and title to the Set<String>s from Steps 1 and 2
  • add the title to the list of TV shows and movies for the name key in the Map<String, List<String>> from Step 3
  • add the name to the list of actors for the title key in the Map<String, List<String>> from Step 4
  1. After completing the loop in Step 5, create a Vertex object for each unique actor and add it to the Graph using the addVertex method from Part 1
  2. Create an edge for every two actors that acted in the same TV show or movie.
  • Loop over each actor Vertex and get the list of TV show and movie titles from the Map<String, List<String>> created in Step 3
  • Use an nested loop to loop over the TV show and movie titles in the list from the previous bullet point. Get the list of actor names for the current title from the Map<String, List<String>> created in Step 4.
  • Use a doubly nested loop to loop over the actor names in the list from the previous bullet point. Add an edge between the current vertex.name (from the outer loop) and the current name (from the doubly nested loop), using the the title of the nested loop as the edge label. You can add such an edge to the Graph using the addEdge method from Part 1

Testing Our Progress

If your tests in Part 1 were a success, then your addVertex and addEdge methods should be working well here.

You might want to add a public static void main(String[] args) method to your BaconNumber class and try reading in some test data, such as the imdb_small.txt text file provided in your GitHub repository. This file contains 3,396 edges between 161 vertices/actors to create a connected graph. To verify your implementation so far, you can try reading in this input data, then count whether the number of vertices and edges match the expected counts. You could also use the Debugger to verify that some of your edge representations were created correctly.

Committing Your Progress

Don’t forget to frequently save your progress to GitHub using the add/commit/push commands with git.