Using NetworkX for Graph-Based Country Border Analysis

Python offers a wide selection of libraries that let us address problems in many research areas quickly and easily. Geospatial data analysis and graph theory are two areas where Python provides a robust set of useful libraries. In this article, we will conduct a simple analysis of world borders, specifically exploring which countries share borders with one another. We will start from a GeoJSON file containing polygons for all countries worldwide. The ultimate goal is to build a graph of these borders with NetworkX and use it to perform several analyses.
GeoJSON files can represent many kinds of geographical areas and are widely used in geographical analysis and visualization. The first stage of our analysis involves reading the countries.geojson file and converting it into a GeoDataFrame using GeoPandas. This file was sourced from a GitHub repository and contains polygons representing countries worldwide.
The GeoDataFrame contains the following columns:
ADMIN: the administrative name of the geographical area, such as the country or region name.
ISO_A3: the ISO 3166-1 alpha-3 country code, a three-letter code that uniquely identifies countries.
ISO_A2: the ISO 3166-1 alpha-2 country code, a two-letter code also used for country identification.
geometry: the geometric data that defines the shape of the geographical area, represented as MULTIPOLYGON data.
You can visualize all the multipolygons that make up the GeoDataFrame using its plot method.
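The following is a minimal sketch of that call. Two hypothetical squares stand in for the real countries GeoDataFrame. The Agg backend line exists only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for the countries GeoDataFrame
toy = gpd.GeoDataFrame(
    {"ADMIN": ["A", "B"]}, geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)]
)

fig, ax = plt.subplots(figsize=(8, 4))
# Fill each polygon in gray and outline it in black
toy.plot(ax=ax, color="lightgray", edgecolor="black")
ax.set_axis_off()
# plt.show()  # uncomment in an interactive session
```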
The multipolygons in the geometry column are instances of the class shapely.geometry.multipolygon.MultiPolygon. These objects have various attributes, one of which is centroid. The centroid attribute gives the geometric center of the MULTIPOLYGON as a POINT.
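The Shapely sketch below makes the centroid concrete. It uses a made-up two-part shape, which you can think of as a mainland plus an island. The centroid of a multipart geometry is area-weighted, so it can fall outside every part. Here it lands in the gap between the two squares. If you need a point guaranteed to lie inside the shape, Shapely's representative_point() is an alternative.

```python
from shapely.geometry import MultiPolygon, box

# Hypothetical "country" made of two unit squares separated by a gap
country = MultiPolygon([box(0, 0, 1, 1), box(2, 0, 3, 1)])

center = country.centroid  # a shapely Point
print(center.geom_type, center.x, center.y)  # Point 1.5 0.5
```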
We can then use this POINT to extract the latitude and longitude of each MULTIPOLYGON and store them in two columns of the GeoDataFrame. We perform this calculation because we will later use these latitude and longitude values to place the nodes of the graph at their real geographic positions.
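This sketch adds the coordinate columns using the vectorized centroid accessor in GeoPandas. The column names longitude and latitude are assumptions, and the toy squares are hypothetical. Real GeoJSON data is typically read in EPSG:4326. In that CRS, GeoPandas warns that centroids computed in a geographic CRS are likely incorrect. For placing nodes on a world map, that approximation is usually acceptable.

```python
import geopandas as gpd
from shapely.geometry import box


def add_centroid_coordinates(gdf):
    """Return a copy of gdf with 'longitude'/'latitude' columns from centroids."""
    gdf = gdf.copy()
    centroids = gdf.geometry.centroid  # GeoSeries of Points
    gdf["longitude"] = centroids.x  # x is longitude in lon/lat data
    gdf["latitude"] = centroids.y   # y is latitude
    return gdf


# Hypothetical toy data (no CRS set, so no geographic-CRS warning)
toy = gpd.GeoDataFrame(
    {"ADMIN": ["A", "B"]}, geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2)]
)
toy = add_centroid_coordinates(toy)
print(toy[["ADMIN", "longitude", "latitude"]])
```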
Now it’s time to build the graph that will represent the borders between countries worldwide. In this graph, nodes represent countries and edges indicate shared borders: if two countries share a border, the graph has an edge connecting their nodes; otherwise, there is no edge.
The function create_country_network processes the data in the GeoDataFrame and constructs a Graph representing country borders.
First, the function iterates through each row of the GeoDataFrame, where each row corresponds to a different country. It then creates a node for that country, adding latitude and longitude as node attributes.
If a geometry is not valid, the function repairs it with the buffer(0) method. This method fixes many invalid geometries by applying a buffer operation with a distance of zero, which resolves problems such as self-intersections or other geometric irregularities in the multipolygon representation.
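Below is a small Shapely illustration of the zero-width buffer repair. It uses a hypothetical self-intersecting "bow-tie" polygon. One caveat: for shapes like this, buffer(0) can silently drop one of the lobes. Shapely 1.8+ also provides shapely.validation.make_valid, which keeps all parts and is an alternative when that matters.

```python
from shapely.geometry import Polygon


def fix_geometry(geom):
    """Return geom unchanged if valid, otherwise repaired with a zero-width buffer."""
    return geom if geom.is_valid else geom.buffer(0)


# Ring that crosses itself at (1, 1): invalid as a polygon
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
fixed = fix_geometry(bowtie)
print(bowtie.is_valid, fixed.is_valid)  # False True
```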
After creating the nodes, the next step is to populate the network with the relevant edges. To do this, we iterate over pairs of countries: if the polygons representing two countries intersect, the countries share a common border, and an edge is created between their nodes.
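Here is a minimal sketch of what create_country_network might look like. It is reconstructed from the description above, not copied from the original code. It assumes country names are in the ADMIN column and that optional latitude/longitude columns were added earlier. The toy squares are hypothetical. The pairwise loop is O(n²), which is tolerable for roughly 250 polygons. For larger datasets, a spatial index such as gdf.sindex could prune the candidate pairs.

```python
from itertools import combinations

import geopandas as gpd
import networkx as nx
from shapely.geometry import box


def create_country_network(gdf, name_col="ADMIN"):
    """Build a graph with one node per country and an edge for each pair of
    geometries that intersect (shared border or overlap)."""
    G = nx.Graph()
    geometries = {}
    for _, row in gdf.iterrows():
        geom = row["geometry"]
        if not geom.is_valid:
            geom = geom.buffer(0)  # repair self-intersections and similar issues
        name = row[name_col]
        geometries[name] = geom
        # Coordinates are optional here; None if the columns were not added
        G.add_node(name, latitude=row.get("latitude"), longitude=row.get("longitude"))

    # Check every unordered pair of countries exactly once
    for a, b in combinations(geometries, 2):
        if geometries[a].intersects(geometries[b]):
            G.add_edge(a, b)
    return G


# Hypothetical toy "countries": A, B, C side by side; D isolated
toy = gpd.GeoDataFrame(
    {"ADMIN": ["A", "B", "C", "D"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1), box(5, 5, 6, 6)],
)
G = create_country_network(toy)
print(sorted(G.edges()))  # [('A', 'B'), ('B', 'C')]
```

Using intersects rather than touches is a deliberate choice here. Generalized country outlines often overlap slightly along a border, so touches could miss real neighbors.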
The next step is to visualize the resulting network, where nodes represent countries and edges signify the presence of borders between them.
The function plot_country_network_on_map is responsible for processing the nodes and edges of the graph G and displaying them on a map.
The positions of the nodes are determined by the latitude and longitude coordinates of the countries. In addition, a map has been placed in the background to give the network clearer geographic context. This map was generated using the boundary attribute of the GeoDataFrame, which provides the geometric boundaries of the countries and serves as the outline for the background map.
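This sketch shows one way plot_country_network_on_map could combine the two layers. It assumes each node carries longitude and latitude attributes. The toy graph and squares are hypothetical, and the styling values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import geopandas as gpd
import networkx as nx
from shapely.geometry import box


def plot_country_network_on_map(G, gdf, ax=None):
    """Draw country outlines, then the border graph with nodes at (lon, lat)."""
    if ax is None:
        _, ax = plt.subplots(figsize=(15, 10))
    # Background map: only the outlines of the country polygons
    gdf.boundary.plot(ax=ax, color="lightgray", linewidth=0.5)
    # NetworkX expects positions as {node: (x, y)}, i.e. (longitude, latitude)
    pos = {n: (d["longitude"], d["latitude"]) for n, d in G.nodes(data=True)}
    nx.draw(G, pos=pos, ax=ax, node_size=20, node_color="tab:red",
            edge_color="tab:blue", width=0.8)
    return ax, pos


# Hypothetical toy data: two neighbouring squares and their centroids
toy = gpd.GeoDataFrame(
    {"ADMIN": ["A", "B"]}, geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2)]
)
G = nx.Graph()
G.add_node("A", longitude=1.0, latitude=1.0)
G.add_node("B", longitude=3.0, latitude=1.0)
G.add_edge("A", "B")

ax, pos = plot_country_network_on_map(G, toy)
```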
One detail is worth noting: in the GeoJSON file used here, some islands are treated as independent countries, even though they administratively belong to another country. That is why you may see numerous points in maritime areas. Keep in mind that the graph reflects the data in the GeoJSON file from which it was generated; a different file would produce a different graph.
The country border network we’ve created can quickly help us answer many questions. Below, we outline three insights that can easily be derived from the network, though it can help answer many other questions as well.
Insight 1: Examining Borders of a Chosen Nation
In this section, we will visually assess the neighbors of a specific country.
The plot_country_borders function provides a quick visualization of the borders of a selected country. It builds a subgraph containing the input country and its neighbors, then plots these countries, making it easy to see which nations border the chosen one. In this example, the chosen country is Mexico, but the input can easily be changed to visualize any other country.
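Here is a minimal sketch of plot_country_borders. It assumes nodes are keyed by country name. For readability it uses a spring layout; if your nodes carry latitude/longitude attributes, you could pass those as positions instead. The small North American graph is hand-built for illustration (only a fragment of the real borders) and is not read from the GeoJSON file.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import networkx as nx


def plot_country_borders(G, country, ax=None):
    """Draw `country` and its direct neighbours; return the subgraph shown."""
    nodes = [country, *G.neighbors(country)]
    sub = G.subgraph(nodes)  # read-only view with the edges among these nodes
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 6))
    pos = nx.spring_layout(sub, seed=42)  # fixed seed for a reproducible layout
    # Highlight the selected country, colour its neighbours differently
    colors = ["tab:red" if n == country else "tab:blue" for n in sub.nodes]
    nx.draw(sub, pos=pos, ax=ax, with_labels=True, node_color=colors, node_size=600)
    return sub


# Hand-built fragment of the North American border graph (illustrative only)
north_america = nx.Graph()
north_america.add_edges_from([
    ("Mexico", "United States of America"), ("Mexico", "Belize"),
    ("Mexico", "Guatemala"), ("Belize", "Guatemala"),
    ("United States of America", "Canada"),
])
sub = plot_country_borders(north_america, "Mexico")
print(sorted(sub.nodes))  # ['Belize', 'Guatemala', 'Mexico', 'United States of America']
```

The subgraph also keeps edges between the neighbors themselves (Belize and Guatemala here), so the plot shows how the neighbors relate to each other as well.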
As the generated image shows, Mexico shares borders with three countries: the United States, Belize, and Guatemala.
Insight 2: Top 10 Countries with the Most Borders
In this section, we analyze which countries have the most neighboring countries and display the results. To do this, we implemented the calculate_top_border_countries function. It computes the number of neighbors of each node in the network (its degree) and displays only the ten countries with the most neighbors.
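In a simple undirected border graph, a node's degree equals its number of neighbors. Ranking countries therefore reduces to sorting G.degree, as in the sketch below. The function body is a reconstruction from the description, not the original code. The North American graph is a hand-built fragment for illustration.

```python
import networkx as nx


def calculate_top_border_countries(G, n=10):
    """Return the n countries with the most neighbours as (country, count) pairs."""
    # G.degree yields (node, degree) pairs; degree = number of bordering countries
    return sorted(G.degree, key=lambda item: item[1], reverse=True)[:n]


# Hand-built fragment of the North American border graph (illustrative only)
north_america = nx.Graph()
north_america.add_edges_from([
    ("Mexico", "United States of America"), ("Mexico", "Belize"),
    ("Mexico", "Guatemala"), ("Belize", "Guatemala"),
    ("United States of America", "Canada"),
])

top3 = calculate_top_border_countries(north_america, n=3)
for country, count in top3:
    print(f"{country}: {count}")
```

Python's sort is stable, so countries with equal counts keep the graph's insertion order. Adding the name as a secondary sort key would make ties fully deterministic.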
We must reiterate that the results depend on the initial GeoJSON file. In this case, the Siachen Glacier is coded as a separate country, which is why it appears to share a border with China.
Insight 3: Exploring the Shortest Country-to-Country Routes
We conclude our analysis with a route assessment: we evaluate the minimum number of borders one must cross when traveling from an origin country to a destination country.
The find_shortest_path_between_countries function calculates the shortest path between an origin country and a destination country. Note, however, that it returns only one of the possible shortest paths. This limitation comes from its use of the shortest_path function from NetworkX, which returns a single path even when several paths of the same minimal length exist.
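Here is a minimal sketch of find_shortest_path_between_countries built on nx.shortest_path. The number of border crossings is the number of edges on the path, which is one less than the number of countries. Returning (None, None) for unreachable pairs, such as island nations with no land borders, is an assumption of this sketch. The European graph is a hand-built fragment, not the full dataset.

```python
import networkx as nx


def find_shortest_path_between_countries(G, origin, destination):
    """Return (path, crossings) for one shortest path, or (None, None) if unreachable."""
    try:
        path = nx.shortest_path(G, source=origin, target=destination)
    except nx.NetworkXNoPath:
        return None, None  # e.g. a country with no land borders
    return path, len(path) - 1  # crossings = number of edges on the path


# Hand-built fragment of the European border graph (illustrative only)
europe = nx.Graph()
europe.add_edges_from([
    ("Spain", "Portugal"), ("Spain", "Andorra"), ("Spain", "France"),
    ("Andorra", "France"), ("France", "Belgium"), ("France", "Germany"),
    ("Belgium", "Germany"), ("Germany", "Poland"),
])
europe.add_node("Iceland")  # no land borders

path, crossings = find_shortest_path_between_countries(europe, "Spain", "Poland")
print(path, crossings)  # ['Spain', 'France', 'Germany', 'Poland'] 3
```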
If you need more than one path between two countries, NetworkX offers alternatives that could be used inside find_shortest_path_between_countries. all_shortest_paths yields every shortest path between the two nodes. all_simple_paths yields every path that does not revisit a node, including longer ones; its cutoff parameter limits the path length. Which one is appropriate depends on the specific requirements of the analysis.
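The difference between the two alternatives shows up on a small hand-built fragment of the Central European border graph. Austria and Poland do not border each other, but three different countries connect them, so there are three shortest routes. nx.shortest_path would return just one of them. The country names are illustrative and may not match the ADMIN values in the GeoJSON file (for example, "Czechia" versus an older name).

```python
import networkx as nx

# Hand-built fragment of the Central European border graph (illustrative only)
central = nx.Graph()
central.add_edges_from([
    ("Austria", "Germany"), ("Austria", "Czechia"), ("Austria", "Slovakia"),
    ("Germany", "Czechia"), ("Czechia", "Slovakia"),
    ("Germany", "Poland"), ("Czechia", "Poland"), ("Slovakia", "Poland"),
])

# Every path of minimum length (here: 2 border crossings)
shortest = sorted(nx.all_shortest_paths(central, "Austria", "Poland"))
print(shortest)
# [['Austria', 'Czechia', 'Poland'], ['Austria', 'Germany', 'Poland'], ['Austria', 'Slovakia', 'Poland']]

# Simple paths include longer detours; cutoff caps the number of edges
bounded = sorted(nx.all_simple_paths(central, "Austria", "Poland", cutoff=2))
everything = list(nx.all_simple_paths(central, "Austria", "Poland"))
print(len(bounded), len(everything))
```

On a full world graph, enumerating all simple paths without a cutoff can grow very quickly. all_shortest_paths, or all_simple_paths with a small cutoff, is the safer choice there.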
We applied the function to Spain and Poland, and the analysis revealed that the minimum number of border crossings required to travel from Spain to Poland is 3.
Python offers a wealth of libraries spanning many domains, which can be seamlessly integrated into any data science project. In this example, we used libraries for both geometric data analysis and graph analysis to create a graph representing the world’s borders. We then demonstrated how this graph can quickly answer questions, making this kind of geographical analysis straightforward.
Thanks for reading.
Amanda Iglesias