Home Artificial Intelligence Mapping out the connections of Oscar Winners 1. Wikipedia as a knowledge source 2. Collecting the list of winners

Mapping out the connections of Oscar Winners 1. Wikipedia as a knowledge source 2. Collecting the list of winners

0
Mapping out the connections of Oscar Winners
1. Wikipedia as a knowledge source
2. Collecting the list of winners

Towards Data Science

On this short piece, I exploit public Wikipedia data, Python programming, and network evaluation to extract and draw up a network of Oscar-winning actors and actresses.

All images were created by the writer.

Wikipedia, as the most important free, crowdsourced online encyclopedia, serves as a tremendously wealthy data source on various public domains. A lot of these domains, from film to politics, involve various layers of networks underneath, expressing different types of social phenomena similar to collaboration. As a consequence of the approaching Academy Awards Ceremony, here I show the instance of Oscar-winning actors and actresses on how we are able to use easy Pythonic methods to show Wiki sites into networks.

First, let’s take a take a look at how, for example, the Wiki list of all Oscar-winning actors is structured:

Wiki list of all Oscar-winning actors

This subpage nicely shows all of the individuals who have ever received an Oscar and have been granted a Wiki profile (most probably, no actors and actresses were missed by the fans). In this text, I concentrate on acting, which may be present in the next 4 subpages — including principal and supporting actors and actresses:

urls = { 'actor'         :'https://en.wikipedia.org/wiki/Category:Best_Actor_Academy_Award_winners',
'actress' : 'https://en.wikipedia.org/wiki/Category:Best_Actress_Academy_Award_winners',
'supporting_actor' : 'https://en.wikipedia.org/wiki/Category:Best_Supporting_Actor_Academy_Award_winners',
'supporting_actress' : 'https://en.wikipedia.org/wiki/Category:Best_Supporting_Actress_Academy_Award_winners'}

Now let’s write a straightforward block of code that checks each of those 4 listings, and using the packages urllib and beautifulsoup, extracts the name of all artists:

from urllib.request import urlopen
import bs4 as bs
import re

# Iterate across the 4 categories
people_data = []

for category, url in urls.items():

# Query the name listing page and…

LEAVE A REPLY

Please enter your comment!
Please enter your name here