Where are you going? Should you be going that way?

This article presents a way to predict vehicle trajectories on a digital road network using a database of past trips sampled from noisy GPS sensors. Besides predicting future directions, this method also assigns a probability to an arbitrary sequence of locations.

Central to this concept is the use of a digital map onto which we project all sampled locations by aggregating them into individual trajectories and matching them to the map. This matching process discretizes the continuous GPS space into predetermined locations and sequences. After encoding these locations into unique geospatial tokens, we can more easily predict sequences, evaluate the probability of current observations, and estimate future directions. That is the gist of this article.

What problems am I trying to solve here? If you want to analyze vehicle path data, you may have to answer questions like those in the article's subtitle.
Where are you going? Should you be going that way?
How do you assess the probability that the trip under observation follows frequently traveled routes? That is an important question because, by answering it, you could program an automated system to classify trips based on their observed frequency. A new trajectory with a low score would cause concern and prompt immediate flagging.

How do you predict which maneuvers the vehicle will perform next? Will it keep going straight ahead, or will it turn right at the next intersection? Where do you expect to see the vehicle in the next ten minutes or ten miles? Quick answers to these questions can help an online tracking solution provide answers and insights to delivery planners, route optimizers, and even opportunity charging systems.

The solution I present here uses a database of historical trajectories, each consisting of a timed sequence of positions generated by a particular vehicle's motion. Each positional record must contain the time, the position information, a reference to the vehicle identifier, and the trajectory identifier. A vehicle has many trajectories, and every trajectory has many positional records. A sample of our input data is depicted in Figure 1 below.

I drew the data above from the Extended Vehicle Energy Dataset (EVED) [1] article. You can build the corresponding database by following the code in one of my previous articles.

Our first job is to match these trajectories to a supporting digital map. The aim of this step is not only to eliminate the GPS data sampling errors but, most importantly, to coerce the acquired trip data onto an existing road network where each node and edge is known and fixed. Each recorded trajectory is thus converted from a sequence of geospatial locations into another sequence of numeric tokens coinciding with the existing digital map nodes. Here, we will use open-source data and software, with map data sourced from OpenStreetMap (compiled by Geofabrik), the Valhalla map-matching package, and H3 as the geospatial tokenizer.
Edge Versus Node Matching
Map-matching is more nuanced than it might look at first sight. To illustrate what the concept entails, let us look at Figure 2 below.

Figure 2 above shows that we can derive two trajectories from an original GPS sequence. We obtain the first trajectory by projecting the original GPS locations onto the nearest (and most likely) road network segments. As you can see, the resulting polyline will only sometimes follow the road, since the map uses graph nodes to define its basic shapes. By projecting the original locations onto the map edges, we get new points that belong to the map but may stray from the map's geometry when connected to the next ones by a straight line.

By projecting the GPS trajectory onto the map nodes, we get a path that perfectly overlays the map, as shown by the green line in Figure 2. Although this path better represents the originally driven trajectory, it doesn't necessarily have a one-to-one location correspondence with the original. Fortunately, this is fine for our purposes, as we will always map-match any trajectory to the map nodes, so we will always get coherent data, with one exception. The Valhalla map-matching code always edge-projects the initial and final trajectory points, so we will systematically discard them, as they don't correspond to map nodes.
H3 Tokenization
Unfortunately, Valhalla doesn't report the original road network node identifiers, so we must convert the node coordinates to unique integer tokens for later sequence frequency calculation. That is where H3 enters the picture, allowing us to encode the node coordinates into a unique sixty-four-bit integer. We take the Valhalla-generated polyline, strip the initial and final points (these points are not nodes but edge projections), and map all remaining coordinates to level-15 H3 indices.
The Dual Graph
Using the process above, we convert each historical trajectory into a sequence of H3 tokens. The next step is to convert each trajectory into a sequence of token triplets. Three values in a sequence represent two consecutive edges of the prediction graph, and we want to know the frequencies of these, as they will be the core data for both the prediction and the probability assessment. Figure 3 below depicts this process visually.
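The triplet expansion itself is just a sliding window of width three over the token sequence; a minimal sketch (function name is mine):

```python
from typing import List, Tuple


def expand_triplets(tokens: List[int]) -> List[Tuple[int, int, int]]:
    """Slide a window of three over the token sequence.

    Each triplet represents two consecutive edges of the prediction graph.
    """
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


# A five-token sequence yields three overlapping triplets
expand_triplets([1, 2, 3, 4, 5])  # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
```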
The transformation above computes the dual of the road graph, reversing the roles of the original nodes and edges.

We can now begin to answer the proposed questions.
Should you be going that way?
To answer this question, we need to know the vehicle's trajectory up to the given moment. We map-match and tokenize the trajectory using the same process as above and then compute each trajectory triplet's frequency using the known historical frequencies. The result is the product of all the individual frequencies. If the input trajectory contains an unknown triplet, its frequency will be zero, as will the final path probability.
A triplet probability is the ratio of counts of a particular sequence (A, B, C) to the count of all (A, B, *) triplets, as depicted in Figure 4 below.
The trip probability is simply the product of the individual trip triplet probabilities, as depicted in Figure 5 below.
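Both formulas can be sketched in a few lines, using a hypothetical in-memory triplet counter in place of the article's database:

```python
import math
from collections import Counter

# Hypothetical historical triplet counts, for illustration only
counts = Counter({(1, 2, 3): 8, (1, 2, 4): 2, (2, 3, 5): 4})


def triplet_probability(triplet, counts) -> float:
    """P(C | A, B) = count(A, B, C) / sum over all X of count(A, B, X)."""
    a, b, _ = triplet
    total = sum(n for (x, y, _), n in counts.items() if (x, y) == (a, b))
    return counts[triplet] / total if total else 0.0


def trip_probability(triplets, counts) -> float:
    """Product of the individual triplet probabilities; zero if any is unknown."""
    return math.prod(triplet_probability(t, counts) for t in triplets)


trip_probability([(1, 2, 3), (2, 3, 5)], counts)  # 0.8 * 1.0 = 0.8
```

Note how a single unseen triplet zeroes out the whole product, matching the behavior described above.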
Where are you going?
We use the same principles to answer this question, but start with the last known triplet only. We can predict the k most likely successors using this triplet as input by enumerating all triplets that have as their first two tokens the last two of the input. Figure 6 below illustrates the process of triplet sequence generation and evaluation.
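With the same hypothetical counter idea, ranking the k most likely successors reduces to filtering the triplets on their first two tokens (the helper name is mine):

```python
from collections import Counter

# Hypothetical triplet counts keyed by (A, B, C), for illustration only
counts = Counter({(1, 2, 3): 8, (1, 2, 4): 2, (1, 2, 7): 5})


def top_k_successors(pair, counts, k=2):
    """Rank candidate next tokens C for a seed pair (A, B) by triplet frequency."""
    successors = Counter({c: n for (a, b, c), n in counts.items() if (a, b) == pair})
    return successors.most_common(k)


top_k_successors((1, 2), counts)  # [(3, 8), (7, 5)]
```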
We can extract the top k successor triplets and repeat the process to predict the most likely trip.

We are now ready to discuss the implementation details, starting with map-matching and some associated concepts. Next, we will see how to use the Valhalla toolset from Python, extract the matched paths, and generate the token sequences. The data preprocessing step will be over once we store the results in the database.

Finally, I illustrate a simple user interface using Streamlit that calculates the probability of any hand-drawn trajectory and then projects it into the future.
Map-Matching
Map-matching converts GPS coordinates sampled from a moving object's path onto an existing road graph. A road graph is a discrete model of the underlying physical road network, consisting of nodes and connecting edges. Each node corresponds to a known geospatial location along the road, encoded as a latitude, longitude, and altitude tuple. Each directed edge connects adjacent nodes following the underlying road and carries many properties, such as the heading, maximum speed, road type, and more. Figure 7 below illustrates the concept with a simple example.
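As a data-model illustration only (the field names are mine, not Valhalla's), a road graph node and directed edge might be modeled like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int
    lat: float
    lon: float
    alt: float           # altitude in meters


@dataclass(frozen=True)
class Edge:
    source: int          # node_id of the edge's origin
    target: int          # node_id of the edge's destination
    heading: float       # degrees from north
    max_speed: float     # km/h
    road_type: str


a = Node(1, 42.2808, -83.7430, 256.0)
b = Node(2, 42.2812, -83.7425, 257.0)
ab = Edge(a.node_id, b.node_id, heading=40.0, max_speed=50.0, road_type="residential")
```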
When successful, the map-matching process produces relevant and helpful information about the sampled trajectory. On the one hand, the process projects the sampled GPS points to locations along the most likely road graph edges. The map-matching process "corrects" the observed points by placing them squarely over the inferred road graph edges. On the other hand, the method also reconstructs the sequence of graph nodes by providing the most likely path through the road graph corresponding to the sampled GPS locations. Note that, as previously explained, these outputs are different. The first output contains coordinates along the edges of the most likely path, while the second consists of the reconstructed sequence of graph nodes. Figure 8 below illustrates the process.

A byproduct of the map-matching process is the standardization of the input locations using a shared road network representation, especially when considering the second output type: the most likely sequence of nodes. When converting sampled GPS trajectories to series of nodes, we make them comparable by reducing the inferred path to a series of node identifiers. We can think of these node sequences as phrases of a known language, where each inferred node identifier is a word and their arrangement conveys behavioral information.

This is the fifth article in which I explore the Extended Vehicle Energy Dataset (EVED) [1]. This dataset is an enhancement and revision of prior work and provides the map-matched versions of the original GPS-sampled locations (the orange diamonds in Figure 8 above).

Unfortunately, the EVED only contains the projected GPS locations and misses the reconstructed road network node sequences. In my previous two articles, I addressed the problem of rebuilding the road segment sequences from the transformed GPS locations without map-matching. I found the result somewhat disappointing, as I expected fewer than the observed 16% of defective reconstructions. You can follow this discussion in the articles below.

Now I turn to the source map-matching tool to see how far it can go in correcting the defective reconstructions. So let's put Valhalla through its paces. Below are the steps, references, and code I used to run Valhalla in a Docker container.
Valhalla Setup
Here I closely follow the instructions provided by Sandeep Pandey [2] on his blog.
First, make sure you have Docker installed on your machine. To install the Docker engine, please follow the online instructions. If you work on a Mac, a great alternative is Colima.

Once installed, you can pull a Valhalla image from GitHub by issuing the following commands at your command line, as the shell code in Figure 9 below depicts.

While executing the above commands, you may have to enter your GitHub credentials. Also, make sure you have cloned this article's GitHub repository, as some files and folder structures refer to it.

Once done, you should open a new terminal window and issue the following command to start the Valhalla API server (macOS, Linux, WSL):
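The exact commands depend on the Valhalla Docker image you choose. The sketch below assumes the GIS-OPS image (`ghcr.io/gis-ops/docker-valhalla/valhalla`) described in the referenced blog post and an illustrative `custom_files` target directory; adjust names, ports, and paths to your setup.

```shell
# Pull the Valhalla image from the GitHub container registry
# (may prompt for GitHub credentials)
docker pull ghcr.io/gis-ops/docker-valhalla/valhalla:latest

# Start the API server; on the first run it downloads and processes
# the latest Michigan extract from Geofabrik into /custom_files
docker run -dt --name valhalla \
  -p 8002:8002 \
  -v "$PWD/valhalla/custom_files:/custom_files" \
  -e tile_urls=https://download.geofabrik.de/north-america/us/michigan-latest.osm.pbf \
  ghcr.io/gis-ops/docker-valhalla/valhalla:latest
```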
The command line above explicitly states which OSM file to download from the Geofabrik service, the latest Michigan file. This means that, when executed for the first time, the server downloads and processes the file and generates an optimized database. In subsequent calls, the server omits these steps. When needed, delete everything under the target directory to refresh the downloaded data and spin up Docker again.

We can now call the Valhalla API with a specialized client.
Enter PyValhalla
This spin-off project simply offers packaged Python bindings to the fantastic Valhalla project.

Using the PyValhalla Python package is quite simple. We start with a straightforward install procedure using the following command line.
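The install is a one-liner, assuming the package is published on PyPI under the name `pyvalhalla`:

```shell
pip install pyvalhalla
```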
In your Python code, you must import the required references, instantiate a configuration from the processed Geofabrik files, and finally create an Actor object, your gateway to the Valhalla API.

Before we call the Meili map-matching service, we must get the trajectory GPS locations using the function listed below in Figure 13.
We can now set up the parameter dictionary to pass to the PyValhalla trace-route call. Please refer to the Valhalla documentation for more details on these parameters. The function below calls the map-matching feature in Valhalla (Meili) and is included in the data preparation script. It illustrates how to determine the inferred route from a Pandas data frame containing the observed GPS locations encoded as latitude, longitude, and time tuples.
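As a hedged sketch of what such a parameter dictionary might look like (the `make_params` helper is my own name, and you should check the Valhalla documentation for the authoritative parameter list):

```python
def make_params(df):
    """Build an illustrative Meili (trace-route) request from a Pandas-like
    frame holding 'latitude', 'longitude', and 'time' columns."""
    shape = [{"lat": lat, "lon": lon, "time": t}
             for lat, lon, t in zip(df["latitude"], df["longitude"], df["time"])]
    return {
        "shape": shape,
        "costing": "auto",          # match against drivable roads
        "shape_match": "map_snap",  # let Valhalla snap noisy samples to the map
    }


params = make_params({"latitude": [42.28], "longitude": [-83.74], "time": [0]})
```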
The above function returns the matched path as a string-encoded polyline. As illustrated in the data preparation code below, we can easily decode the returned string using a PyValhalla library call. Note that this function returns a polyline whose first and last locations are projected to edges, not graph nodes. You will see these extremities removed by code later in the article.
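The article relies on a PyValhalla call for the decoding; to show what that call does, here is a sketch of the standard polyline algorithm at Valhalla's 1e-6 precision (the function names are mine):

```python
def decode_polyline6(encoded: str):
    """Decode a polyline string at 1e-6 precision into (lat, lon) tuples."""
    coords, index, lat, lon = [], 0, 0, 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):                     # latitude delta, then longitude delta
            result, shift = 0, 0
            while True:
                b = ord(encoded[index]) - 63   # undo the +63 printable offset
                index += 1
                result |= (b & 0x1F) << shift  # accumulate 5-bit chunks
                shift += 5
                if b < 0x20:                   # continuation bit not set
                    break
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coords.append((lat / 1e6, lon / 1e6))
    return coords


def encode_polyline6(coords):
    """Inverse of decode_polyline6, handy for round-trip testing."""
    out, prev = [], (0, 0)
    for lat, lon in coords:
        point = (round(lat * 1e6), round(lon * 1e6))
        for delta in (point[0] - prev[0], point[1] - prev[1]):
            v = ~(delta << 1) if delta < 0 else delta << 1
            while v >= 0x20:
                out.append(chr((0x20 | (v & 0x1F)) + 63))
                v >>= 5
            out.append(chr(v + 63))
        prev = point
    return "".join(out)
```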
Let us now look at the data preparation phase, where we convert all the trajectories in the EVED database into a set of map edge sequences, from which we can derive pattern frequencies.

Data preparation aims at converting the noisy GPS-acquired trajectories into sequences of geospatial tokens corresponding to known map locations. The main code iterates through the existing trips, processing one at a time.

In this article, I use an SQLite database to store all the data processing results. We start by filling in the matched trajectory path. You can follow the description using the code in Figure 15 below.

For each trajectory, we instantiate an object of the Actor type (line 9). This is an unstated requirement, as each call to the map-matching service requires a new instance. Next, we load the trajectory points (line 13) acquired by the vehicles' GPS receivers, with the added noise, as stated in the original VED article. In line 14, we make the map-matching call to Valhalla, retrieve the string-encoded matched path, and save it to the database. Next, we decode the string into a list of geospatial coordinates, remove the extremities (line 17), and then convert them to a list of H3 indices computed at level 15 (line 19). On line 23, we save the converted H3 indices and the original coordinates to the database for later reverse mapping. Finally, on lines 25 to 27, we generate a sequence of 3-tuples based on the H3 index list and save them for later inference calculations.
Let's go through each of these steps and explain them in detail.
Trajectory Loading
We have seen how to load each trajectory from the database (see Figure 13). A trajectory is a time-ordered sequence of sampled GPS locations encoded as latitude and longitude pairs. Note that we are not using the matched versions of these locations as provided by the EVED data. Here, we use the noisy and original coordinates as they existed in the initial VED database.
Map Matching
The code that calls the map-matching service was already presented in Figure 14 above. Its central issue is the configuration settings; apart from that, it is a fairly straightforward call. Saving the resulting encoded string to the database is also easy.

On line 17 of the main loop (Figure 15), we decode the geometry string into a list of latitude and longitude tuples. Note that this is where we strip out the initial and final locations, as they are not projected to nodes. Next, we convert this list to its corresponding H3 token list on line 19. We use the maximum detail level to try to avoid overlaps and ensure a one-to-one relationship between H3 tokens and map graph nodes. We insert the tokens into the database in the following two lines. First, we save the whole token list, associating it with the trajectory.

Next, we insert the mapping of node coordinates to H3 tokens to enable drawing polylines from a given list of tokens. This feature will be helpful later on when inferring future trip directions.
We can now generate and save the corresponding token triplets. The function below takes the newly generated list of H3 tokens and expands it into another list of triplets, as detailed in Figure 3 above. The expansion code is depicted in Figure 19 below.

After triplet expansion, we can finally save the final product to the database, as shown by the code in Figure 20 below. Through clever querying of this table, we will infer current trip probabilities and future most-likely trajectories.
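An illustrative version of such a triplet table and frequency query, using Python's built-in `sqlite3` (the schema and column names are my assumption, not necessarily the repository's exact layout):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE triplet (
        traj_id INTEGER,                 -- which trip the triplet came from
        t0 INTEGER, t1 INTEGER, t2 INTEGER  -- the three H3 tokens
    )""")

# Two trips sharing one (10, 20, 30) triplet
triplets = [(1, 10, 20, 30), (1, 20, 30, 40), (2, 10, 20, 30)]
conn.executemany("INSERT INTO triplet VALUES (?, ?, ?, ?)", triplets)

# Frequency of a specific (A, B, C) sequence across all trips
(count,) = conn.execute(
    "SELECT COUNT(*) FROM triplet WHERE t0 = ? AND t1 = ? AND t2 = ?",
    (10, 20, 30)).fetchone()
```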
We are now done with one cycle of the data preparation loop. Once the outer loop is completed, we have a new database with all the trajectories converted to token sequences that we can explore at will.

You can find the complete data preparation code in the GitHub repository.

We now turn to the problem of estimating existing trip probabilities and predicting future directions. Let's start by defining what I mean by "existing trip probabilities."
Trip Probabilities
We start with an arbitrary path projected onto the road network nodes through map-matching. Thus, we have a sequence of map nodes and want to assess how probable that sequence is, using the known trip database as a frequency reference. We use the formula in Figure 5 above. In a nutshell, we compute the product of the probabilities of all individual token triplets.

To illustrate this feature, I implemented a simple Streamlit application that lets the user draw an arbitrary trip over the covered Ann Arbor area and immediately computes its probability.

Once the user draws points on the map representing the trip, or the hypothetical GPS samples, the code map-matches them to retrieve the underlying H3 tokens. From then on, it is a simple matter of computing the individual triplet frequencies and multiplying them to obtain the total probability. The function in Figure 21 below computes the probability of an arbitrary trip.

The code gets support from another function that retrieves the successors of any existing pair of H3 tokens. The function listed below in Figure 22 queries the frequency database and returns a Python Counter object with the counts of all successors of the input token pair. When the query finds no successors, the function returns the None constant. Note how the function uses a cache to improve database access performance (code not listed here).
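A sketch of what such a successor query might look like against an illustrative triplet table (the schema and names are mine, and the `lru_cache` decorator is my stand-in for the article's unlisted cache):

```python
import sqlite3
from collections import Counter
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triplet (t0 INTEGER, t1 INTEGER, t2 INTEGER)")
conn.executemany("INSERT INTO triplet VALUES (?, ?, ?)",
                 [(1, 2, 3), (1, 2, 3), (1, 2, 4)])


@lru_cache(maxsize=4096)
def get_successors(t0, t1):
    """Counter of successor tokens for the (t0, t1) pair, or None if unseen."""
    rows = conn.execute(
        "SELECT t2, COUNT(*) FROM triplet WHERE t0 = ? AND t1 = ? GROUP BY t2",
        (t0, t1)).fetchall()
    return Counter(dict(rows)) if rows else None


get_successors(1, 2)   # Counter({3: 2, 4: 1})
get_successors(9, 9)   # None
```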
I designed both functions such that the computed probability is zero when no known successors exist for any given node.

Let us now look at how we can predict a trajectory's most probable future path.
Predicting Directions
We only need the last two tokens of a given running trip to predict its most likely future directions. The idea involves expanding all the successors of that token pair and keeping only the most frequent ones. The code below shows the function that serves as the entry point to the direction prediction service.

The above function starts by retrieving the user-drawn trajectory as a list of map-matched H3 tokens and extracting the last pair. We call this token pair the seed and will expand it further in the code. At line 9, we call the seed expansion function, which returns a list of polylines corresponding to the input expansion criteria: the maximum branching per iteration and the total number of iterations.

Let us see how the seed expansion function works by following the code listed below in Figure 24.

The seed expansion function iteratively expands paths, starting with the initial one, by calling a path expansion function that generates the best successor paths. Path expansion operates by picking a path and generating the most probable expansions, as shown below in Figure 25.
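The article's actual implementation works against the database and a dedicated path class; the self-contained sketch below illustrates the same beam-like expansion with a hypothetical in-memory successor table:

```python
from collections import Counter

# Hypothetical successor counts keyed by an (A, B) token pair
SUCCESSORS = {
    (1, 2): Counter({3: 8, 4: 2}),
    (2, 3): Counter({5: 6}),
    (2, 4): Counter({6: 1}),
}


def expand_seed(seed, max_branch=2, iterations=2):
    """Iteratively grow paths from a seed pair, keeping the most frequent successors."""
    paths = [list(seed)]
    for _ in range(iterations):
        next_paths = []
        for path in paths:
            successors = SUCCESSORS.get((path[-2], path[-1]), Counter())
            if not successors:
                next_paths.append(path)        # dead end: keep the path as-is
                continue
            for token, _ in successors.most_common(max_branch):
                next_paths.append(path + [token])
        paths = next_paths
    return paths


expand_seed((1, 2))  # [[1, 2, 3, 5], [1, 2, 4, 6]]
```

Each iteration replaces every live path by its `max_branch` most frequent one-token extensions, so the result approximates the most likely future routes.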
The code generates new paths by appending the successor nodes to the source path, as shown in Figure 26 below.

The code implements predicted paths using a specialized class, as shown in Figure 27.

We can now see the resulting Streamlit application in Figure 28 below.