Graphs are sets of vertices and their edges:

Where the perimeters represent connections between the nodes. If edges don’t have directions, we call a graph undirected. An actual-life example of an undirected graph could be a chemical molecule, where the vertices are atoms, and bonds are represented as edges.

Nonetheless, sometimes we’d like details about whether the sting goes from *u *to *v*, from *v *to *u*, or each ways. For instance, if Mark likes Alice, it doesn’t necessarily mean it’s mutual ( ☹ ). In those situations, we will define the sting as an ordered tuple as an alternative of unordered one.

Using the graph structure, we will define a centrality measure. It’s a metric used for answering the query:

How vital is that this vertex/edge in a graph?”

And there are various ways to reply it.

Depending on the duty, we will start from a distinct point evaluating centrality. Probably the most common metrics are: Degree, Closeness and Betweenness. We’ll discuss them using Zachary’s Karate Club graph [more info]. It presents ties between different karate club members. Yow will discover code used to generate pictures below here.

## Degree centrality

Essentially the most basic of centralities. It’s defined just for vertices and it’s equal to the degree of the vertex (which is the variety of the neighboring vertices). For instance, we will think back to the graph of human relationships, and in case of the friendships amongst people this metric would answer the query

“How popular is that this person?”

## Paths in graph

For the following two centralities, we’d like to introduce a couple of concepts to our knowledge of the graph theory. All of them are very intuitive, ranging from the sting’s weights. We are able to add weights to our edges, to mark the difference between them. For instance, this might be road length in case of traffic graph.

In graphs we will define paths, that are lists of vertices we’d like to traverse to get from A to B. Consecutive vertices in the trail are neighbors, first vertex is the A, and the last is B. Path distance is the sum of the perimeters weights along of it. The shortest path between A and B is the trail with the smallest distance.

## Closeness centrality

Having all this latest knowledge, we will return to our metrics. Next one is closeness centrality, which tells us how close a node to the remainder of the graph is. It’s defined for a particular vertex as an inverse of a mean of shortest paths to all other vertices within the graph. This fashion, shorter average path translates to higher closeness centrality.

## Betweenness centrality

Betweenness centrality gives us information, which nodes of a graph are crucial for the traffic going through it. Imagine a city with an intensive road network, where every junction is a node. A few of those function a key connectors in each day commutes, while others could also be a cul-de-sacs with near none impact on traffic flow. The previous one possess high Betweenness centrality scores, calculated as proportion of the shortest paths traversing through the intersection.

Now, as we’ve tools for describing and analyzing graph, we will start extracting city’s plan to a graph form. To try this we will **Open Street Maps** (OSM), to import it in Python as NX graph using osmnx library. We’ll start with a smaller example to debate what additional process we’d like to use, to be able to improve time and efficiency of our work.

Grzegórzki is one among the eighteen districts of Krakow’s city, with two complex roundabouts — Mogilskie and Grzegórzeckie, and plenty of junctions. Thus, we’ll have the option to see most of potential pitfalls with data engineering.

Let’s start with importing data from the OSM repository to a Python graph, and plot the outcomes:

There’s something mistaken with this graph — can you see what it’s?

We get multiple edges for single sections of roads, resulting the graph with almost 3 000 “junctions”. This doesn’t provide proper representation (we will’t make a U-turn in the course of a road, and each node cause calculation to be slower). To repair this example, we’ll perform **graph topology simplification** by removing all nodes on the road between two junctions. In OSMnx, we’ve a function for that called *ox.simplify_graph()*.

There’s another catch — as it’s possible you’ll see, we’ve two edges for probably the most of roads, one for every way. Because of this, we’ve multiple nodes for each intersection, which is an unwanted behavior. Imagine that we’re on a junction, we’re turning left, and there’s no separate lane for a left turn (or it’s already full). So long as we won’t have the option to do the turn, the opposite cars are blocked. In our current graph, this isn’t the reality. The left turn is fabricated from 2 separate nodes, one for turning left, and the opposite for crossing opposite lane. This might indicate that those are two independent operations, while they aren’t.

That’s why we’re going to consolidate intersections, meaning that we are going to mix multiple nodes close to one another into one. We’ll select the consolidation radius sufficiently big to consolidate multiple parts of the intersections into one, but alternatively keep roundabouts as multiple node structures, as they might be only partially blocked. To do that we’ll use osmnx function *ox.consolidate_intersections()*.

After those operations, we’re almost ready for the evaluation. The last caveat is Krakow’s municipality borders — as many individuals travel from the neighboring towns, and graph evaluation includes only data throughout the graph, we’d like to incorporate those areas. I’ll present in the following chapter implications of not doing that. And here’s our graph:

Yow will discover the source code used to generate this map, in addition to all graphic utilized in the following chapter on this jupyter notebook.

For this case study we’ll focus only on Betweenness centrality measurement for estimating road traffic. In future, this is likely to be prolonged to other techniques from graph theory, including **GNN** usage (Graph Neural Networks).

We’ll start with calculating Betweenness centrality measurement for all nodes and edges in a road layout representation. For that we are going to use **NetworkX** library.

Because of a high variety of roads on a graph, it’s hard to see which components have highest probability of being critical for traffic. Let’s take a have a look at a centrality measurement distribution for the graph.

We are able to use those distributions to filter out less vital junctions and streets. We’ll select top 2% of every where the edge values are:

- 0.047 for nodes,
- 0.021 for edges.

We are able to see that crucial road segments by betweenness are:

- The A4 highway and the S7 being the beltway of Krakow (note that Krakow doesn’t have northern a part of the beltway),
- The western a part of 2nd ring road and it’s connection to A4,
- The northern a part of third ring road (substituting missing northern beltway),
- The Nowohucka street connecting 2nd ring road with north-eastern a part of town,
- The Wielicka road leading from city center to the south-eastern highway part.

Let’s compare this information to an actual life traffic map of Krakow from Google Maps:

We are able to see that our insights correlate with the outcomes from traffic radar. The mechanism behind that is sort of easy — components with high betweenness centrality are those used to commute most of shortest paths within the graph. If automotive drivers select the most effective paths for his or her routes, then the streets and junctions with the very best traffic volumes will likely be those with the very best betweenness centrality.

Let’s head back to the last a part of the graph engineering — extending graph borders. We are able to check what would occur if we only took town’s borders to our evaluation:

The A4 highway, which is some of the vital component attributable to the beltway nature, has one among the bottom centrality measures in the entire graph! This happens because because the A4 is on the outskirts of town, and most of its traffic comes from the surface, we cannot include this consider the betweenness centrality.

Let’s take a have a look at a distinct scenario for graph evaluation. Suppose that we wish to predict how a road closure (for instance attributable to the accident) affects the traffic. We are able to use the centrality measurements to check differences between two graphs, and thus examine changes within the centrality.

On this study, we’ll simulate automotive accident on A4–7 highway segment, which is a typical occurrence. The accident will cause a whole closure of the segment.

We’ll start by making a latest road network by removing A4–7 segment from graph, and recalculating centrality measurement.

Let’s take a have a look at a centrality distribution:

We are able to see that it’s still very much like the unique one. To examine changes within the centrality measurements we’ll calculate **residual graph**, where centrality measurements are the difference between original road layout and after the accident. Positive values will indicate higher centrality after the accident. Nodes and junctions missing in a single the graphs (similar to A4–7) won’t be included within the residual graph. Below is the measurement distribution of the residuals:

Again, we’ll filter out top 2% of streets and nodes affected. The brink values this time are:

- 0.018 for nodes,
- 0.017 for edges.

We are able to see increases in roads connecting split parts of beltway to town center, where the 2nd ring road is situated. The best change might be seen within the 2nd ring road which comprises one among two left bridges over Vistula river on the western side of town.

There are a couple of things that we cannot take account in during graph evaluation. The 2 most vital ones, that we could see on this evaluation, are:

- Graph centrality evaluation assumes uniform distribution of traffic among the many nodes.

Which is fake generally, as villages and cities have different population densities. Nonetheless, there are other effects that may reduce this, for instance the next amount of individuals living in neighboring villages will select a automotive as a commute option as compared to the people living in a city center.

- Graph evaluation takes into the account only things which might be present throughout the graph.

That is harder to see within the provided examples, especially for somebody outside the Krakow. Let’s take a have a look at **Zakopianka**. It’s a serious traffic artery between town centre and a lot of the municipalities south of Krakow, and it’s also a part of DK7 (national road no. 7) which spans across whole country.

If we compare typical traffic on DK7 in Krakóow to our centrality measures, they’re completely different. Average betweenness centrality is around 0.01, which is a two times smaller value than the highest 2% threshold. While in point of fact, it’s some of the blocked sections.

Graph theory and its evaluation have applications in multiple scenarios, similar to traffic evaluation presented on this study. Using basic operations and metrics on graphs, we will get priceless insights in much shorter time as compared to constructing a complete simulation model.

This whole evaluation might be performed using several dozen lines of Python code, and it’s not limited to at least one road layout. We may also very easily transition to other evaluation tools from Graph Theory.

As all things, this method has also its drawbacks. The most important ones being assumptions about uniform traffic distribution and scope limited to graph structure.

Github repository containing code utilized in this study might be found here.