In this short piece, I use public Wikipedia data, Python programming, and network analysis to extract and draw a network of Oscar-winning actors and actresses.
All images were created by the author.
Wikipedia, as the largest free, crowdsourced online encyclopedia, serves as a tremendously rich data source on various public domains. Many of these domains, from film to politics, involve several layers of networks underneath, expressing different kinds of social phenomena such as collaboration. With the Academy Awards ceremony approaching, I use the example of Oscar-winning actors and actresses to show how we can use simple Pythonic methods to turn Wiki sites into networks.
First, let’s take a look at how, for example, the Wiki list of all Oscar-winning actors is structured:
This subpage nicely shows all the people who have ever received an Oscar and have a Wiki profile (most probably, no actors or actresses were missed by the fans). In this article, I focus on acting, which is covered by the following four subpages, including leading and supporting actors and actresses:
urls = {
    'actor': 'https://en.wikipedia.org/wiki/Category:Best_Actor_Academy_Award_winners',
    'actress': 'https://en.wikipedia.org/wiki/Category:Best_Actress_Academy_Award_winners',
    'supporting_actor': 'https://en.wikipedia.org/wiki/Category:Best_Supporting_Actor_Academy_Award_winners',
    'supporting_actress': 'https://en.wikipedia.org/wiki/Category:Best_Supporting_Actress_Academy_Award_winners',
}
Now let’s write a simple block of code that checks each of these four listings and, using the packages urllib and beautifulsoup, extracts the names of all artists:
from urllib.request import urlopen
import bs4 as bs
import re

# Iterate across the 4 categories
people_data = []
for category, url in urls.items():
    # Query the name listing page and…
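The loop above is only partly shown here, so as a rough illustration of the same idea, here is a minimal, self-contained sketch of pulling names out of one category listing. It is not the code used in this article: it swaps BeautifulSoup for the standard library's html.parser so it runs without extra installs. It also assumes that the member links sit in list items inside a div with id "mw-pages", which is my reading of how Wikipedia category pages are laid out and may change. The names CategoryMemberParser, extract_members, fetch_members, and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class CategoryMemberParser(HTMLParser):
    """Collect (title, href) pairs from the member list of a category page.

    Assumption: members are <li><a href="/wiki/..." title="..."> entries
    nested inside <div id="mw-pages">.
    """

    def __init__(self):
        super().__init__()
        self.div_depth = 0     # > 0 while inside <div id="mw-pages">
        self.in_item = False   # True while inside an <li> of that div
        self.members = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Start counting at the member container, then track nesting.
            if self.div_depth > 0 or attrs.get("id") == "mw-pages":
                self.div_depth += 1
        elif self.div_depth > 0 and tag == "li":
            self.in_item = True
        elif self.in_item and tag == "a":
            href = attrs.get("href") or ""
            # Keep article links only; pagination links use /w/index.php.
            if href.startswith("/wiki/") and attrs.get("title"):
                self.members.append((attrs["title"], href))

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
            if self.div_depth == 0:
                self.in_item = False
        elif tag == "li":
            self.in_item = False


def extract_members(html, category):
    """Return one dict per listed person found in the given HTML string."""
    parser = CategoryMemberParser()
    parser.feed(html)
    return [
        {"name": name, "url": "https://en.wikipedia.org" + href, "category": category}
        for name, href in parser.members
    ]


def fetch_members(url, category):
    """Download a category page and extract its members (needs network access)."""
    html = urlopen(url).read().decode("utf-8")
    return extract_members(html, category)


# Hand-written sample that mimics the assumed page layout.
sample_html = """
<div id="mw-pages">
  <div class="mw-category">
    <ul>
      <li><a href="/wiki/Anthony_Hopkins" title="Anthony Hopkins">Anthony Hopkins</a></li>
      <li><a href="/wiki/Daniel_Day-Lewis" title="Daniel Day-Lewis">Daniel Day-Lewis</a></li>
    </ul>
  </div>
  <a href="/w/index.php?title=Category:Example&amp;pagefrom=X">next page</a>
</div>
<div id="footer"><ul><li><a href="/wiki/Help" title="Help">Help</a></li></ul></div>
"""

people = extract_members(sample_html, "actor")
print([p["name"] for p in people])
```

Keeping the parsing step separate from the download makes it easy to check against a saved page before querying the live site. In a full run, one would call fetch_members(url, category) for each entry in urls and concatenate the results.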