These are the project tasks affiliated with Harvard CS50. Check it out.
Degrees
Task description: Write a program that determines how many
“degrees of separation” apart two actors are.
# for example, when gets these inputs below, results would be like that
$ python degrees.py large
Loading data...
Data loaded.
Name: Emma Watson
Name: Jennifer Lawrence
3 degrees of separation.
1: Emma Watson and Brendan Gleeson starred in Harry Potter and the Order of the Phoenix
2: Brendan Gleeson and Michael Fassbender starred in Trespass Against Us
3: Michael Fassbender and Jennifer Lawrence starred in X-Men: First Class
Background
This problem is originated from Six Degrees of Kevin Bacon game, saying that anyone in Hollywood film industry can be connected to Kevin Bacon within 6 steps, where each step consists of finding a film that two actors both starred in.
Actually, there maybe exist multiple ways to connect two actors, but in this problem, it just considers the shortest path. In detail, its purpose is to choose a sequence of movies with least length to connect two stars. For example, if the shortest path between Jennifer Lawrence and Tom Hanks is 2: Jennifer Lawrence is connected to Kevin Bacon by both starring in “X-Men: First Class”, and Kevin Bacon is connected to Tom Hanks by both starring in “Apollo 13”.
What is the State and the Action?
Here, each person can be regarded as a state, and a movie can be regarded as action. In other words, a movie could connect one person to another one. And the problem is to find the shortest path, so breadth-first search is the best option.
Understanding
Directory: Large.people.csv
Each person with a unique id
has his name
and birth
. stars.csv
Each row is a pair of a person_id
value and movie_id
value, indicating person with person_id
stars in the movie with movie_id
. movies.csv
It contains movie_id
, titlt
and the year
that the movies was released.
Directory: Small. The same as Large
, but with small database.
degrees.py
Variables:
- names(dict) | name -> person_ids {maybe people have the same name}
- people(dict) | person_id -> name, birth, movies(set)
- movies(dict) | movie_id -> title, year, stars(set)
Functions:
load_data: loads data from
people.csv
,stars.csv
andmovies.csv
main: print the result of one of the shortest path (if it has), or return None when there exists no shortest path.
shortest_path: returns the shortest list of (movies, person_id) pairs
person_id_for_name: returns person_id of the person’s name
neighbor_for_person: returns (movies, person_id) pairs that relates to the given person
Use BFS to find the shortest path
There are two points that we need to pay attention to. First, we cannot visit the node that we have visited, this is because maybe it could cause circle making this loop never end and it absolutely is not the shortest path when we revisit one node. Second, when we backtrack, the order of the result is the opposite, just reverse it.
def shortest_path(source, target):
"""
Returns the shortest list of (movie_id, person_id) pairs
that connect the source to the target.
If no possible path, returns None.
"""
Q = QueueFrontier()
root = Node(state=source, parent=None, action=None)
target_leave = Node(state=None, parent=None, action=None)
Q.add(root)
_find = False
path = list()
unordered_map = dict()
while (not Q.empty()) and (not _find):
top_Node = Q.remove()
if top_Node.state not in unordered_map:
unordered_map[top_Node.state] = 1
# find the top node's neighbors
neighbors = neighbors_for_person(top_Node.state)
for neighbor in neighbors:
movie_id, person_id = neighbor
if person_id not in unordered_map:
child = Node(state=person_id, parent=top_Node, action=movie_id)
if (child.state == target):
target_leave = child
_find = True
break
Q.add(child)
if target_leave.state == None:
return None
else:
while target_leave.parent != None:
# append ("movie_title", "person_name")
path.append((target_leave.action, target_leave.state))
target_leave = target_leave.parent
path.reverse()
return path