Import census data
One of the best ways to begin the journey with geospatial data analysis is to practice with census data, which provides a picture of all the people and households in a country at a granular level.
In this tutorial, we're going to use a dataset from the UK Data Service that gives the number of cars or vans in the UK. The link to the dataset is here.
I’ll start with a dataset that doesn’t contain geographic information:
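Here is a minimal sketch of this step. It assumes the census table is a CSV file read with pandas. The rows are invented. Only geo_code and Country come from the tutorial; the cars_vans column name is hypothetical.

```python
import io

import pandas as pd

# Invented rows standing in for the UK Data Service CSV. "geo_code" and
# "Country" follow the tutorial; "cars_vans" is a hypothetical column name.
CSV_TEXT = """geo_code,Country,cars_vans
OA0001,Northern Ireland,210
OA0002,Northern Ireland,185
OA0003,England,150
"""


def load_census(source) -> pd.DataFrame:
    """Read the census table from a file path or a file-like object."""
    # Keep geo_code as text so codes are never parsed as numbers.
    return pd.read_csv(source, dtype={"geo_code": str})


df = load_census(io.StringIO(CSV_TEXT))
print(df.shape)  # (3, 3)
print(df.head())
```

With the real download you would pass its file path to load_census instead of the in-memory text.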
Each row of the dataset corresponds to a specific output area, which is the lowest geographical level at which census data is provided in the UK. There are three features: the geocode, the country and the number of cars or vans owned by one or more members of a household.
If we wanted to visualise the map right away, we couldn't, because we don't have the necessary geographical information. We need an additional step before showing what GeoPandas can do.
Add geometry to census data
To visualise our census data, we need to add a column that stores the geographical information. The process of adding geographical information, for instance the latitude and longitude of each city, is called geocoding.
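For the simple case of one coordinate pair per row, here is a sketch of turning longitude/latitude columns into a geometry column with geopandas.points_from_xy. The city names and coordinates are invented for illustration.

```python
import geopandas as gpd
import pandas as pd

# Invented city table with longitude/latitude columns (illustrative values only).
cities = pd.DataFrame({
    "city": ["City A", "City B"],
    "lon": [-5.93, -7.31],
    "lat": [54.60, 54.99],
})

# Turn each (lon, lat) pair into a shapely Point stored in a geometry column.
cities_gdf = gpd.GeoDataFrame(
    cities,
    geometry=gpd.points_from_xy(cities["lon"], cities["lat"]),
    crs="EPSG:4326",  # WGS84 longitude/latitude
)
print(cities_gdf.geometry.geom_type.tolist())  # ['Point', 'Point']
```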
In this case, it's not just a single pair of coordinates. There are many pairs of coordinates that are connected and closed, forming the boundaries of the output areas. We need to download the Shapefile from this link, which provides the boundary of each output area.
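Reading a Shapefile is typically a single geopandas.read_file call. The sketch below wraps it in a hypothetical helper, load_boundaries. It also builds a toy polygon with invented coordinates to show what "connected and closed" coordinate pairs look like.

```python
import geopandas as gpd
from shapely.geometry import Polygon


def load_boundaries(path: str) -> gpd.GeoDataFrame:
    """Hypothetical helper: read the downloaded output-area Shapefile."""
    return gpd.read_file(path)


# A toy boundary with invented coordinates: the four corners of one area.
# Shapely closes the ring itself by repeating the first point at the end.
boundary = Polygon([(0, 0), (2, 0), (2, 1), (0, 1)])

print(list(boundary.exterior.coords))
# [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
print(boundary.exterior.is_ring)  # True
print(boundary.area)  # 2.0
```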
Once the Shapefile is imported, we can merge the two tables using their common field, geo_code:
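Here is a sketch of the merge as a left join in pandas. The census rows and square boundaries are invented stand-ins for the two real tables, and cars_vans is a hypothetical column name. The validate argument is optional. It makes pandas raise an error if a geo_code appears more than once in the boundaries.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Invented census rows; "cars_vans" is a hypothetical column name.
census = pd.DataFrame({
    "geo_code": ["OA0001", "OA0002", "OA0003"],
    "Country": ["Northern Ireland"] * 3,
    "cars_vans": [210, 185, 240],
})

# Invented boundaries standing in for the Shapefile: one square per output area.
boundaries = gpd.GeoDataFrame(
    {"geo_code": ["OA0001", "OA0002", "OA0003"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 1, 2)],
)

# Left join: keep every census row and attach its boundary, if one exists.
# validate="m:1" raises an error if a geo_code is duplicated in the boundaries.
df = census.merge(boundaries, on="geo_code", how="left", validate="m:1")

# A left join against unique keys must not change the number of rows.
assert len(df) == len(census)
print(df.columns.tolist())  # ['geo_code', 'Country', 'cars_vans', 'geometry']
print(df["geometry"].isnull().sum())  # 0
```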
After checking that the dimensions of the dataframe didn't change after the left join, we need to check whether there are null values in the new column:
df.geometry.isnull().sum()
# 0
Luckily, there are no null values, so we can convert our dataframe into a GeoDataFrame using the GeoDataFrame class, setting the geometry column as the geometry of our GeoDataFrame:
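Here is a minimal sketch of the conversion on an invented merged table. The column names other than geometry are assumptions.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Invented merged table, shaped like the result of the left join on geo_code.
df = pd.DataFrame({
    "geo_code": ["OA0001", "OA0002"],
    "Country": ["Northern Ireland", "Northern Ireland"],
    "cars_vans": [210, 185],  # hypothetical column name
    "geometry": [box(0, 0, 1, 1), box(1, 0, 2, 1)],
})

# Promote the plain DataFrame to a GeoDataFrame, using the "geometry"
# column as its active geometry.
gdf = gpd.GeoDataFrame(df, geometry="geometry")

print(type(gdf).__name__)  # GeoDataFrame
print(gdf.geometry.name)   # geometry
```

If the coordinate reference system was not carried over by the merge, you can pass it explicitly with the crs argument of GeoDataFrame, for example the value reported by the .crs attribute of the table read from the Shapefile.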
Now geographical and non-geographical information are combined into a single table. All the geographical information is contained in a single field, called geometry. As with a standard dataframe, we can print the information of this GeoDataFrame:
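Here is a small sketch with invented rows showing that info() works as in pandas and that the geometry column carries its own dtype.

```python
import geopandas as gpd
from shapely.geometry import box

# Invented rows; "cars_vans" is a hypothetical column name.
gdf = gpd.GeoDataFrame(
    {"geo_code": ["OA0001", "OA0002"], "cars_vans": [210, 185]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)

# Same call as for a pandas DataFrame: prints the class, columns and dtypes.
gdf.info()

# The geometry column uses the dedicated "geometry" extension dtype.
print(gdf.dtypes["geometry"])  # geometry
```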
From the output, we can see that our GeoDataFrame is an instance of the geopandas.GeoDataFrame object and that the geometry column is encoded with the geometry dtype. For a better understanding, we can also display the type of the geometry in the first row:
type(gdf.geometry[0])
# shapely.geometry.polygon.Polygon
It's important to know that there are three common classes of geometric objects: Points, Lines and Polygons. In our case, we're dealing with Polygons, which makes sense since they're the boundaries of the output areas. The dataset is now ready, and we can start building nice visualizations.
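To make the three classes concrete, here is one toy shapely object of each. The coordinates are invented. In shapely, lines are represented by the LineString class.

```python
from shapely.geometry import LineString, Point, Polygon

# One toy example of each common geometry class (invented coordinates).
shapes = [
    Point(0.5, 0.5),
    LineString([(0, 0), (1, 1), (2, 1)]),
    Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
]

for geom in shapes:
    # geom_type is the class name as a string; area is zero for points and lines.
    print(geom.geom_type, geom.area)
# Point 0.0
# LineString 0.0
# Polygon 1.0
```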
Create a Map with GeoPandas
Now we have all the ingredients to visualise the map with GeoPandas. One of the drawbacks of GeoPandas is that it struggles with large amounts of data, and we have more than 200 thousand rows, so we'll focus only on the census data of Northern Ireland:
gdf_ni = gdf.query('Country == "Northern Ireland"')
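Here is a toy check with invented rows that query() keeps the GeoDataFrame type. The .copy() call is an optional addition. It gives an independent table, so adding columns later (such as a centroid column) doesn't trigger pandas' SettingWithCopyWarning.

```python
import geopandas as gpd
from shapely.geometry import box

# Invented rows; "cars_vans" is a hypothetical column name.
gdf = gpd.GeoDataFrame(
    {"Country": ["Northern Ireland", "England", "Northern Ireland"],
     "cars_vans": [210, 150, 185]},
    geometry=[box(0, 0, 1, 1), box(5, 5, 6, 6), box(1, 0, 2, 1)],
)

# Filter to Northern Ireland; the result is still a GeoDataFrame.
gdf_ni = gdf.query('Country == "Northern Ireland"').copy()
print(type(gdf_ni).__name__, len(gdf_ni))  # GeoDataFrame 2
```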
To create a map, you simply have to call the plot() method on the GeoDataFrame:
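Here is a minimal sketch of the basic map using a few invented square areas in place of gdf_ni. The colour and figure size are arbitrary choices, not the tutorial's settings.

```python
import geopandas as gpd
import matplotlib
from shapely.geometry import box

# Render off-screen so the sketch also runs as a plain script (not needed in a notebook).
matplotlib.use("Agg")

# Invented output areas standing in for gdf_ni.
gdf_ni = gpd.GeoDataFrame(
    {"geo_code": ["OA0001", "OA0002", "OA0003"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 1, 2)],
)

# plot() draws every polygon and returns the matplotlib Axes it used.
ax = gdf_ni.plot(figsize=(6, 6), color="lightsteelblue", edgecolor="black")
ax.set_axis_off()  # hide the coordinate axes for a cleaner map
```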
We would also like to see how the number of cars/vans is distributed within Northern Ireland by coloring each output area based on its frequency:
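Here is a sketch of the coloured map, assuming a numeric column named cars_vans (hypothetical) filled with invented values. The colormap is an arbitrary choice.

```python
import geopandas as gpd
import matplotlib
from shapely.geometry import box

matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook

# Invented areas and counts; "cars_vans" is a hypothetical column name.
gdf_ni = gpd.GeoDataFrame(
    {"cars_vans": [210, 185, 240]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 1, 2)],
)

# column= maps each area's value to a colour of the colormap;
# legend=True adds a colourbar as an extra Axes next to the map.
ax = gdf_ni.plot(column="cars_vans", cmap="viridis", legend=True, figsize=(6, 6))
ax.set_axis_off()
```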
From this plot, we can observe that most of the areas have around 200 vehicles, except for a few small areas marked in green.
Extract centroid from geometry
Let's suppose that we want to change the geometry and use the coordinates of the centre of each output area instead of the polygons. This is possible with the gdf.geometry.centroid property, which computes the centroid of each output area:
gdf_ni['centroid'] = gdf_ni.geometry.centroid
gdf_ni.sample(3)
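Here is a toy version of the centroid step with invented squares, so the centroids can be checked by eye. One assumption is worth stating. If the real data are in a geographic CRS (longitude/latitude), GeoPandas warns that centroid results are likely incorrect. Reprojecting to a projected CRS first avoids this.

```python
import geopandas as gpd
from shapely.geometry import box

# Invented output areas: two 2x2 squares side by side (no CRS set).
gdf_ni = gpd.GeoDataFrame(
    {"geo_code": ["OA0001", "OA0002"]},
    geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2)],
)

# centroid returns a GeoSeries of Points, one per polygon, aligned on the index.
gdf_ni["centroid"] = gdf_ni.geometry.centroid

print(gdf_ni["centroid"].x.tolist())  # [1.0, 3.0]
print(gdf_ni["centroid"].y.tolist())  # [1.0, 1.0]
```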
If we display the information of the dataframe again, we can see that both geometry and centroid are encoded as geometry types.
The best way to understand what we actually obtained is to visualise both the geometry and centroid columns in a single map. To plot the centroids, we need to switch the active geometry with the set_geometry() method.
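Here is a sketch of layering both columns on one map, using invented squares and arbitrary colours. set_geometry() returns a new GeoDataFrame by default, so the original keeps its polygons as the active geometry.

```python
import geopandas as gpd
import matplotlib
from shapely.geometry import box

matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook

# Invented output areas with a centroid column, as in the previous step.
gdf_ni = gpd.GeoDataFrame(
    {"geo_code": ["OA0001", "OA0002"]},
    geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2)],
)
gdf_ni["centroid"] = gdf_ni.geometry.centroid

# First layer: the polygons, drawn from the active "geometry" column.
ax = gdf_ni.plot(color="lightgrey", edgecolor="black", figsize=(6, 6))

# set_geometry() returns a new GeoDataFrame whose active geometry is "centroid";
# gdf_ni itself keeps "geometry" as its active column.
centroids_gdf = gdf_ni.set_geometry("centroid")

# Second layer: the centroid points, drawn on the same Axes.
centroids_gdf.plot(ax=ax, color="red", markersize=20)
ax.set_axis_off()
```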
Create more complex maps
There are some advanced features for visualising more detail in the map without creating any additional column. Earlier, we showed the number of cars or vans in each output area, but the result was more confusing than informative. It would be better to create a categorical feature based on our numerical column. With GeoPandas, we can skip that step and plot it directly: by specifying the argument scheme='EqualInterval' (which requires the mapclassify library), we can create classes of cars/vans based on equal intervals.
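To see what an equal-interval classification does, here is a pandas-only sketch with invented counts. pd.cut with an integer number of bins splits the range from min to max into equal-width classes, which is the idea behind the equal-interval scheme. It is not the exact mapclassify implementation.

```python
import numpy as np
import pandas as pd

# Invented counts of cars/vans per output area, with two large outliers.
cars_vans = pd.Series([120, 180, 195, 205, 210, 230, 480, 900])

# Equal-interval classes: k bins of equal width between the min and max values.
k = 4
equal_interval, edges = pd.cut(cars_vans, bins=k, retbins=True)

# Number of areas in each class, in class order (empty classes count as 0).
counts = np.bincount(equal_interval.cat.codes, minlength=k)
print(counts.tolist())  # [6, 1, 0, 1]
```

The outliers stretch the range, so most areas fall into the first class. With GeoPandas and mapclassify installed, gdf_ni.plot(column="cars_vans", scheme="EqualInterval", k=5, legend=True) draws such classes directly. The column name here is hypothetical.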
The map didn't change much, but you can see that the legend is much clearer than in the previous version. A better way to visualize the map would be to color it based on levels built using quantiles:
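For comparison, here is a pandas-only sketch of quantile classes on the same kind of invented counts. pd.qcut places the bin edges so that each class holds roughly the same number of areas, which is the idea behind the quantiles scheme. It is not the exact mapclassify code.

```python
import numpy as np
import pandas as pd

# Invented counts of cars/vans per output area, with two large outliers.
cars_vans = pd.Series([120, 180, 195, 205, 210, 230, 480, 900])

# Quantile classes: edges at the 0, 25, 50, 75 and 100 percent quantiles.
k = 4
quantile_classes = pd.qcut(cars_vans, q=k)

# Number of areas in each class, in class order.
counts = np.bincount(quantile_classes.cat.codes, minlength=k)
print(counts.tolist())  # [2, 2, 2, 2]
```

Unlike equal intervals, a few extreme values can no longer squeeze almost every area into a single class.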
Now it's possible to see more variability across the map, since each level contains a more even number of areas. It's worth noticing that most areas belong to the last two levels, which correspond to the highest numbers of vehicles. In the first visualization, 200 vehicles seemed a low number, but in fact a large number of outliers with high frequencies distorted our interpretation.
At this point, we would also like to have a background map to better contextualize our results. The most popular way to do this is with the contextily library, which retrieves background map tiles. By default, contextily expects data in the Web Mercator coordinate reference system (EPSG:3857), so we need to convert our data to this CRS. The code to plot the map stays the same, apart from an additional line that adds the basemap from the contextily library:
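Here is a sketch of the two steps, split into two hypothetical helper functions. to_web_mercator reprojects the data. plot_with_basemap draws quantile classes and adds the basemap. It needs contextily, mapclassify and internet access for the tiles, so only the reprojection is exercised on an invented point below.

```python
import geopandas as gpd
from shapely.geometry import Point


def to_web_mercator(gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Reproject to Web Mercator (EPSG:3857), the CRS contextily tiles use by default."""
    return gdf.to_crs(epsg=3857)


def plot_with_basemap(gdf: gpd.GeoDataFrame, column: str):
    """Plot quantile classes of `column` over web tiles (needs mapclassify and internet)."""
    import contextily as ctx  # imported here so to_web_mercator works without it

    gdf_wm = to_web_mercator(gdf)
    ax = gdf_wm.plot(column=column, scheme="quantiles", legend=True,
                     alpha=0.6, figsize=(8, 8))
    ctx.add_basemap(ax)  # fetches tiles covering the current axes extent
    ax.set_axis_off()
    return ax


# Check the reprojection on an invented point given in lon/lat (EPSG:4326).
pts = gpd.GeoDataFrame(geometry=[Point(-5.93, 54.60)], crs="EPSG:4326")
print(to_web_mercator(pts).crs.to_epsg())  # 3857
```

to_crs raises an error if the GeoDataFrame has no CRS set. As an alternative to reprojecting the data, ctx.add_basemap also accepts a crs argument that warps the tiles to the data's CRS instead.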
That's cool! Now we have a more professional and detailed map!
Final thoughts:
This was an introductory tutorial to get started with geospatial data using Python. GeoPandas is a Python library specialized in working with vector data. It's very easy and intuitive to use because its properties and methods are similar to those of Pandas, but it becomes very slow as soon as the amount of data grows, particularly when plotting.
Besides this drawback, GeoPandas depends on the Fiona library for reading and writing vector data formats, so if Fiona doesn't support a format, GeoPandas can't support it either. One solution is to use GeoPandas to manipulate the data and QGIS to visualise the map, or to try other Python libraries for visualization, like Folium. Do you know other alternatives? Suggest them in the comments!
The code can be found here. I hope you found the article useful. Have a nice day!