How fast will you drive?
When planning future vehicle routes, you trust digital map providers for accurate speed predictions. You do this when picking up your phone to organize a road trip or, in a professional setting, when planning routes for your vehicle fleet. The forecasted speeds are a vital part of a trip’s cost, especially as they are one of the fundamental drivers of energy consumption in both electric and combustion vehicles.
Digital mapping service providers collect information from live traffic and historical records to estimate how fast you will be able to drive along a particular road at any given time. With this data and intelligent algorithms, they estimate how quickly an average vehicle will travel along the planned route. Some services also accept each vehicle’s features to tune the route path and the expected trip times.
But what about specific vehicles and drivers like yours? Do these predictions apply? Your cars and drivers might have particular requirements or habits that don’t fit the standardized forecasts. Can we do better than the digital map service providers? We have a chance if you keep a good record of your historical telematics data.
In this article, we will improve the quality of speed predictions from digital map providers by leveraging historical speed profiles from a telematics database. This database contains records of past trips that we use to modulate the standard speed inferences from a digital map provider.
Central to this is map-matching, the process by which we “snap” observed GPS locations to the underlying digital map. This correction step brings the GPS measurements in line with the map’s road network representation, thus making all location sources comparable.
The Road Network
A road network is a mathematical concept that supports most digital mapping applications. Often implemented as a directed multigraph, each node represents a known geospatial location, usually some noteworthy landmark such as an intersection or a defining point on a road bend, and the connecting directed edges represent straight-line paths along the road. Figure 1 below illustrates the concept.
When we request a route from a digital map provider, we get a sequence of road network nodes and their connecting edges. The service also provides the estimated travel times and corresponding speeds between all pairs of nodes (in some cases, the speed estimates cover a range of nodes). We get the total trip duration by adding all the partial times together.
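As a minimal sketch, assuming a hypothetical per-edge response of lengths and speeds (real providers return these per edge or per node pair), the total trip duration is just the sum of the partial times:

```python
# Hypothetical route legs: (length in meters, estimated speed in m/s).
legs = [(350.0, 12.5), (1200.0, 16.7), (80.0, 8.3)]

# Each partial time is distance over speed; the trip duration is their sum.
total_seconds = sum(length / speed for length, speed in legs)
print(round(total_seconds, 1))
```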
If we improve the estimates for these times, we will also have better speed estimates and a better route prediction overall. The source for these better estimates is your historical telematics data. But knowing the historical speeds is just one part of the process. Before we can use these speeds, we must ensure that we can project them onto the digital map, and for this, we use map-matching.
Map-Matching
Map-matching projects sequences of GPS coordinates sampled from a moving object’s path onto an existing road graph. The matching process uses a Hidden Markov Model to map the sampled locations to the most likely sequence of graph edges. As a result, this process produces both the edge projections and the implicit node sequence. You can read a more detailed explanation in my previous article on map-matching:
After reading the above article, you will understand that the Valhalla map-matching algorithm projects the sampled GPS locations onto road network edges, not onto the nodes. The service may also return the matched polyline defined in terms of the road network nodes. So, we can get both the edge projections and the implicit node sequence.
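As a sketch of what such a request looks like: Valhalla’s map-matching endpoints (`trace_route` and `trace_attributes`) accept a JSON body carrying the sampled shape. The coordinates below are placeholders, and actually running the match requires a live Valhalla server (the URL in the comment is an assumption for a local setup):

```python
import json

# Sampled GPS fixes: latitude, longitude, and a Unix timestamp per point.
fixes = [
    (42.2808, -83.7430, 1609459200),
    (42.2811, -83.7425, 1609459210),
    (42.2815, -83.7419, 1609459220),
]

# Valhalla's map-matching endpoints accept a JSON body like this;
# "map_snap" requests HMM-based matching against the road network.
payload = {
    "shape": [{"lat": lat, "lon": lon, "time": ts} for lat, lon, ts in fixes],
    "costing": "auto",
    "shape_match": "map_snap",
}
body = json.dumps(payload)

# To run the match, you would POST `body` to a Valhalla server, e.g.:
# urllib.request.urlopen("http://localhost:8002/trace_attributes", body.encode())
print(len(payload["shape"]))
```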
However, when retrieving a route plan from the same provider, we also get a sequence of road network nodes. By matching these nodes to the previously map-matched ones, we can overlay the known telematics information onto the newly generated route and thus improve the time and speed estimates with actual data.
Before using the map-matched locations to infer actual speeds, we must project them onto the nodes and adjust the known travel times, as illustrated in Figure 2 below.
As a prerequisite, we must correctly sequence both sets of locations: the nodes and the map matches. This process is depicted in Figure 2 above, where the map matches, represented by orange diamonds, are sequenced together with the road network nodes, represented as red dots. The travel sequence is clearly from left to right.
We assume the time differences between the GPS locations are the same as between the map-matched ones. This assumption, illustrated by Figure 3 below, is necessary because there is no way to infer what effect the map matching has on time. It simplifies the calculation while remaining a good approximation.
Now that we know the time differences between consecutive orange diamonds, our challenge is to use this information to infer the time differences between consecutive red dots (nodes). Figure 4 below shows the relationship between the two sequences of time differences.
We can safely assume that the average speed between consecutive orange diamonds is constant. This assumption is essential for what comes next. But before we proceed, let’s define some terminology. We will use some simplifications due to Medium’s typesetting limitations.
We need to handle two fundamental quantities: distances and timespans. Using Figure 4 above as a reference, we define the distance between orange diamond one and red dot one as d(n1, m1). Here, the letter “m” stands for “map-matched,” and the letter “n” stands for “node.” Similarly, the corresponding timespan is t(n1, m1), and the average speed is v(n1, m1).
Let us focus on the first two nodes and see how we can derive the average speed (and the corresponding timespan) using the known timespans from orange diamonds one to four. The average speed of travel between the first two map-matched locations is thus:
Because the average speed is constant, we can now compute the first timespan.
The second timespan is simply t(m2, m3). For the final period, we can repeat the process above. The total time is thus:
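Since Medium renders the formulas as images, here is a plain-text reconstruction of the derivation, assuming (as in Figure 4) that n1 lies between map matches m1 and m2, and n2 lies between m3 and m4:

```latex
% Average speed between the first two map matches (assumed constant):
v(m_1, m_2) = \frac{d(m_1, m_2)}{t(m_1, m_2)}

% First timespan, from node n_1 to map match m_2 at that constant speed:
t(n_1, m_2) = \frac{d(n_1, m_2)}{v(m_1, m_2)}

% Final timespan, repeating the process between m_3 and m_4:
t(m_3, n_2) = \frac{d(m_3, n_2)}{v(m_3, m_4)}

% Total projected time between the two nodes:
t(n_1, n_2) = t(n_1, m_2) + t(m_2, m_3) + t(m_3, n_2)
```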
We must repeat this process, adapting it to the sequence of nodes and map matches, to calculate the projected trip times between all nodes.
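A minimal sketch of this projection (a hypothetical helper, not the article’s actual code): under the constant-speed assumption, time is a piecewise-linear function of distance along the path, anchored at the map matches, so each node’s time can be read off by interpolation:

```python
# Each sample is (kind, distance_along_path_m, timestamp_s); map matches
# ("m") carry timestamps, road network nodes ("n") do not and get theirs
# by constant-speed (linear) interpolation between bracketing matches.
samples = [
    ("m", 0.0, 100.0),    # m1
    ("n", 40.0, None),    # n1
    ("m", 100.0, 110.0),  # m2
    ("m", 220.0, 118.0),  # m3
    ("n", 280.0, None),   # n2
    ("m", 300.0, 124.0),  # m4
]

def interpolate_node_times(samples):
    matched = [(d, t) for kind, d, t in samples if kind == "m"]
    node_times = []
    for kind, d, _ in samples:
        if kind != "n":
            continue
        # Find the bracketing pair of map matches and assume constant speed.
        for (d0, t0), (d1, t1) in zip(matched, matched[1:]):
            if d0 <= d <= d1:
                node_times.append(t0 + (t1 - t0) * (d - d0) / (d1 - d0))
                break
    return node_times

t_n1, t_n2 = interpolate_node_times(samples)
print(t_n2 - t_n1)  # projected timespan between the two nodes
```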
Now that we have seen how to project measured speeds onto a digital map, let’s see where to get the data.
Telematics Database
This article uses a telematics database to infer unknown road segment average speeds. All the geospatial data in the database is already map-matched to the underlying digital map. This characteristic helps us match future service-provided routes to the known or projected road segment speeds using the process described above.
Here, we will use a tried-and-true open-source telematics database I have been exploring lately and presented in a previously published article, the Extended Vehicle Energy Dataset (EVED), licensed under Apache 2.0.
We develop the solution in two steps: data preparation and prediction. In the data preparation step, we traverse all known trips in the telematics database and project the measured trip times onto the corresponding road network edges. These computed edge traversal times are then stored in another database using maximum-resolution H3 indices for faster searches during exploration. At the end of the process, we have collected traversal time distributions for the known edges, information that will allow us to estimate travel speeds in the prediction phase.
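A minimal sketch of the accumulation step, with plain tuples standing in for the H3-indexed edge keys used in the actual database:

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical traversal-time samples (seconds) keyed by a road network
# edge identifier. In the article, edges are keyed by maximum-resolution
# H3 indices of their end nodes; tuples stand in for those indices here.
edge_samples = defaultdict(list)

observations = [
    (("a", "b"), 12.1), (("a", "b"), 11.4), (("a", "b"), 13.0),
    (("b", "c"), 30.5), (("b", "c"), 29.8),
]
for edge, seconds in observations:
    edge_samples[edge].append(seconds)

# Summarize each edge's distribution for use in the prediction phase.
edge_stats = {
    edge: {"n": len(ts), "mean": mean(ts), "median": median(ts)}
    for edge, ts in edge_samples.items()
}
print(edge_stats[("a", "b")]["median"])
```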
The prediction phase requires a source route expressed as a sequence of road network nodes, such as what we get from the Valhalla route planner. For each pair of consecutive nodes, we query the corresponding traversal time distribution (if any) from the speed database and use its mean (or median) to estimate the local average speed. By adding all the edge estimates, we get the intended result: the predicted total trip time.
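A minimal sketch of the prediction step, with hypothetical edge statistics and lengths; when an edge has no samples, this sketch reuses the previous edge’s average speed, mirroring the fallback heuristic discussed in the Data Quality section:

```python
from statistics import mean

# Hypothetical per-edge traversal-time samples (seconds) and edge lengths
# (meters); edge ("b", "c") deliberately has no samples.
edge_stats = {("a", "b"): [11.4, 12.1, 13.0], ("c", "d"): [29.8, 30.5]}
edge_lengths = {("a", "b"): 150.0, ("b", "c"): 90.0, ("c", "d"): 400.0}

def predict_trip_seconds(nodes):
    total, last_speed = 0.0, None
    for edge in zip(nodes, nodes[1:]):
        length = edge_lengths[edge]
        if edge in edge_stats:
            seconds = mean(edge_stats[edge])
            last_speed = length / seconds
        elif last_speed is not None:
            seconds = length / last_speed  # fallback: reuse the last speed
        else:
            raise ValueError("no data for the first edge")
        total += seconds
    return total

print(round(predict_trip_seconds(["a", "b", "c", "d"]), 1))
```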
Data Preparation
To prepare the data and generate the reference time distribution database, we must iterate through all the trips in the source data. Fortunately, the source database makes this easy by readily identifying all the trips (see the article above).
Let us look at the code that prepares the edge traversal times.
The code in Figure 5 above shows the main loop of the data preparation code. We use the previously created EVED database and save the output data to a new speed database. Each record is a traversal time sample for a single road network edge. For the same edge, a set of these samples makes up a statistical time distribution, for which we calculate the average, median, and other statistics.
The call on line 5 retrieves a list of all the known trips in the source database as triplets containing the trajectory identifier (the table’s sequential identifier), the vehicle identifier, and the trip identifier. We need these last two items to retrieve the trip’s signals, as shown in line 10.
Lines 10 to 16 contain the code that retrieves the trip’s trajectory as a sequence of latitudes, longitudes, and timestamps. These locations don’t necessarily correspond to road network nodes; they will mostly be projections onto the edges (the orange diamonds in Figure 2).
Now, we can ask the Valhalla map-matching engine to take these points and return a polyline with the corresponding road network node sequence, as shown in lines 18 to 25. These are the nodes that we store in the database, along with the projected traversal times, which we derive in the final lines of the code.
The traversal time projection from the map-matched locations to the node locations occurs in two steps. First, line 27 creates a “compound trajectory” object that merges the map-matched locations and the corresponding nodes in trip sequence. The object stores each map-matched segment individually for later joining. Figure 6 below shows the object’s constructor (source file).
The compound trajectory constructor starts by merging the sequence of map-matched points with the corresponding road network nodes. Referring to the symbols in Figure 2 above, the code combines the orange diamond sequence with the red dot sequence so that they keep the trip order. In the first step, listed in Figure 7 below, we create a list of sequences of orange diamond pairs with any red dots in between.
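A minimal sketch of that grouping (hypothetical labels, not the actual constructor code): split the ordered trip samples into segments that start and end at map matches, keeping any nodes in between:

```python
# Ordered trip samples: "m*" are map matches (orange diamonds),
# "n*" are road network nodes (red dots).
samples = ["m1", "n1", "m2", "m3", "n2", "n3", "m4"]

def segment(samples):
    segments, current = [], [samples[0]]
    for item in samples[1:]:
        current.append(item)
        if item.startswith("m"):  # a map match closes the segment...
            segments.append(current)
            current = [item]      # ...and opens the next one
    return segments

print(segment(samples))
```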
Once merged, we convert the trajectory segments to node-based trajectories, removing the map-matched endpoints and recomputing the traversal times. Figure 8 below shows the function that computes the equivalent between-node traversal times.
Using the symbology of Figure 2, the code above takes the traversal times between two orange diamonds and calculates the times for all sub-segment traversals, namely the node-delimited ones. This way, we can later reconstruct all between-node traversal times through simple addition.
The final conversion step occurs on line 28 of Figure 5, where we convert the compound trajectory to a simple trajectory using the functions listed in Figure 9 below.
The final step of the code in Figure 5 (lines 30–32) is to save the computed edge traversal times to the database for later use.
Data Quality
How good is the data that we just prepared? Does the EVED allow for good speed predictions? Unfortunately, this database was not designed for this purpose, so we will see some issues.
The first issue is the number of single-edge records in the final database, in this case somewhat over two million. The total number of rows is over 5.6 million, so the unusable single-edge records represent a large proportion of the database. Almost half the rows come from edges with ten or fewer records.
The second issue is trips with very low frequencies. When querying an ad-hoc trip, we may fall into areas of very low density, where edge time records are scarce or nonexistent. In such conditions, the prediction code tries to compensate for the missing data using a simple heuristic: assume the same average speed as on the last edge. For larger road sections, as we will see below, we may even copy the data from the Valhalla route predictor.
The bottom line is that some of these predictions will be bad. A better use case for this algorithm would be a telematics database from fleets that frequently travel the same routes. It would be even better to have more data for the same routes.
Prediction
To explore this time-prediction enhancement algorithm, we will use two different scripts: an interactive Streamlit application that allows you to freely use the map, and an analytics script that tries to assess the quality of the predicted times by comparing them to known trip times in a LOOCV (leave-one-out cross-validation) style of approach.
Interactive Map
You run the interactive application by executing the following command line at the project root:
streamlit run speed-predict.py
The interactive application allows you to specify the endpoints of a route for Valhalla to predict. Figure 10 below shows what the user interface looks like.