Efficient geospatial manipulations for OSM map data

If you’ve worked with OSM data before, you know it’s not the easiest to extract. OSM datasets can be huge, and finding performant solutions for what you need to analyze is often a challenge. PyrOSM is a package that makes reading in and working with OSM data much more efficient. How? PyrOSM is built on Cython (Python compiled to C), it uses faster libraries for deserializing OSM data, and it adds smaller optimizations such as NumPy arrays, all of which help it process data quickly. If you’ve used OSMnx before (for very similar use cases), you know that large datasets take a very long time to load into memory, which is where PyrOSM can help. Let’s get into what this library can do!
🌎 PBF Data
Let’s talk a bit about the specific file format that OSM data comes in. PBF stands for “Protocolbuffer Binary Format”, and it is a very efficient way of storing OSM data. The data is organized in “fileblocks”, which are groups of data that can be independently encoded or decoded. Fileblocks contain PrimitiveGroups, which in turn contain many OSM entities such as nodes, ways and relations.
The data can be stored at whatever level of granularity the user wants. For example, the current OSM database’s resolution is about 1 cm. In fact, if you wanted, you could download the entirety of OpenStreetMap’s data in one file, known as Planet (around 1000 GB of data)!
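To make the granularity point concrete, here is a small sketch of how a PBF file's stored integer coordinates map to degrees, and what that implies for resolution. It assumes the format's default granularity of 100 nanodegrees and an approximate length for the equator; the function names are made up for illustration.

```python
# Approximate length of the equator in metres (an assumption for this estimate).
EQUATOR_M = 40_075_017


def to_degrees(stored: int, granularity: int = 100, offset: int = 0) -> float:
    """Decode a stored integer coordinate.

    PBF stores coordinates as integers, decoded as
    degrees = 1e-9 * (offset + granularity * stored).
    """
    return 1e-9 * (offset + granularity * stored)


def resolution_m(granularity: int = 100) -> float:
    """Smallest representable coordinate step at the equator, in metres."""
    step_degrees = granularity * 1e-9
    return step_degrees * EQUATOR_M / 360


# A stored integer that decodes to a latitude near Berkeley.
print(to_degrees(378_715_000))
# Step size in centimetres at the default granularity.
print(round(resolution_m() * 100, 2), "cm")
```

A larger granularity value gives coarser coordinates and smaller files, which is the trade-off the format leaves to whoever writes the data.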
👩‍💻 PyrOSM Basics: reading in datasets
PyrOSM reads OpenStreetMap’s PBF data and can download it from two main data providers: Geofabrik (world and country-level data) and BBBike (city-level data). The package gives the user access to many kinds of features:
- Buildings, POIs (points of interest), Land Use
- Street Networks
- Custom Filters
- Exporting as networks
- and more!
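As a rough sketch of what pulling these layers looks like, the helper below collects a few of them from an already-initialized OSM reader object. The helper name, the dictionary keys and the cafe filter are my own choices for illustration; the reader methods it calls are the ones pyrosm provides for buildings, POIs, land use and custom filters.

```python
def extract_layers(osm):
    """Collect a few common feature layers from a pyrosm OSM reader.

    `osm` is assumed to be a pyrosm.OSM instance; each call is expected
    to return a GeoDataFrame of the matching features.
    """
    return {
        "buildings": osm.get_buildings(),
        "pois": osm.get_pois(),
        "landuse": osm.get_landuse(),
        # Custom filter: keep only features tagged amenity=cafe (example tag).
        "cafes": osm.get_data_by_custom_criteria(
            custom_filter={"amenity": ["cafe"]}
        ),
    }
```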
BBBike currently supports 235 cities worldwide, and you can easily get the full list from “sources.cities.available”. Getting started is simple enough: you initialize an OSM reader object and load in the data you want.
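A minimal sketch of that first step could look like this. The function name is hypothetical, and I'm assuming "Berkeley" is one of the available city names; the download only happens when the function is called.

```python
def load_city(city: str = "Berkeley"):
    """Download a city's PBF extract (if needed) and open it with pyrosm."""
    # Imported here so nothing is loaded or downloaded until the call.
    from pyrosm import OSM, get_data

    filepath = get_data(city)  # local path to the downloaded .pbf file
    return OSM(filepath)


# Usage (downloads data on the first call):
#   from pyrosm.data import sources
#   print(sources.cities.available)   # city extracts offered via BBBike
#   osm = load_city("Berkeley")
```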
From this point on, you’ll use the OSM object to interact with the Berkeley data. Now let’s get the Berkeley street network for driving.
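Assuming `osm` is the reader object for the Berkeley extract, fetching the driving network is a single call. The wrapper name here is just for illustration.

```python
def get_driving_network(osm):
    """Return the drivable street network as a GeoDataFrame of edges."""
    return osm.get_network(network_type="driving")


# Usage, assuming `osm` was created from the Berkeley extract:
#   street_network = get_driving_network(osm)
#   print(street_network.head())
```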
Printing out the street_network object shows that it’s stored as a GeoPandas GeoDataFrame with all of the OSM attributes like length, highway, maxspeed, etc., which is very handy for further analysis.
Side note: BBBike (the source provider of this data) offers many more data formats of various sizes, including Organic Maps OSM, Garmin OSM and SVG Mapnik, depending on your use case.
🔍 Better Filtering
The results of the data loading above include all of Berkeley’s data and in fact even data from neighboring cities, which is not ideal. What if you want a much smaller or more specific area? That’s where a bounding box comes in. To make a bounding box you can either:
- Manually specify a list of 4 coordinates in the format [minx, miny, maxx, maxy]
- Pass in Shapely geometries (e.g. a LineString or MultiPolygon)
To find bounding box coordinates, I typically use this bbox finder website, which lets you draw rectangles and then copy their coordinates. With those coordinates, you can bound the area around UC Berkeley’s campus and get its walking network.
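Here is a sketch of the list-based option. The campus coordinates are approximate values I picked for illustration, and `check_bbox` and `walking_network` are hypothetical helper names; the reader's `bounding_box` argument and the "walking" network type come from pyrosm.

```python
# Approximate bounds of the UC Berkeley campus (illustrative values),
# in the order [minx, miny, maxx, maxy], i.e. longitude first.
CAMPUS_BBOX = [-122.2665, 37.8680, -122.2510, 37.8760]


def check_bbox(bbox):
    """Validate a [minx, miny, maxx, maxy] list of lon/lat values."""
    if len(bbox) != 4:
        raise ValueError("bbox must have exactly 4 values")
    minx, miny, maxx, maxy = bbox
    if not (minx < maxx and miny < maxy):
        raise ValueError("expected minx < maxx and miny < maxy")
    if not (-180 <= minx and maxx <= 180 and -90 <= miny and maxy <= 90):
        raise ValueError("coordinates must be valid lon/lat degrees")
    return [float(v) for v in bbox]


def walking_network(filepath, bbox=CAMPUS_BBOX):
    """Read only the data inside `bbox` and return its walking network."""
    from pyrosm import OSM  # imported lazily; needs pyrosm installed

    osm = OSM(filepath, bounding_box=check_bbox(bbox))
    return osm.get_network(network_type="walking")
```

Swapping latitude and longitude is the most common mistake with bounding boxes, which is why the sketch validates the order before handing the list to the reader.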
🎯 Exporting and Working with Graphs
Another benefit of PyrOSM is how it supports network processing and connects to other network analysis libraries. In addition to saving street networks as GeoDataFrames, PyrOSM lets you extract nodes and edges into two separate dataframes.
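A minimal sketch of extracting both dataframes, assuming `osm` is an initialized reader object (the wrapper name is made up):

```python
def get_nodes_and_edges(osm, network_type: str = "driving"):
    """Return the network as two GeoDataFrames: (nodes, edges)."""
    # With nodes=True, pyrosm returns the nodes alongside the edges.
    nodes, edges = osm.get_network(nodes=True, network_type=network_type)
    return nodes, edges


# Usage:
#   nodes, edges = get_nodes_and_edges(osm)
#   print(nodes.head())
```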
Once you have these graph representations, it’s very easy to export them to other libraries, such as OSMnx (via NetworkX), igraph and Pandana, and work with them there.
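The export step can be sketched as below, assuming `nodes` and `edges` came from the same reader object. The helper name and the up-front validation are my additions; the three graph types are the ones I understand pyrosm's `to_graph` to accept, and each needs the corresponding library installed.

```python
SUPPORTED_GRAPH_TYPES = ("networkx", "igraph", "pandana")


def export_graph(osm, nodes, edges, graph_type: str = "networkx"):
    """Convert node and edge dataframes into a graph for another library."""
    if graph_type not in SUPPORTED_GRAPH_TYPES:
        raise ValueError(f"graph_type must be one of {SUPPORTED_GRAPH_TYPES}")
    return osm.to_graph(nodes, edges, graph_type=graph_type)


# Usage:
#   G = export_graph(osm, nodes, edges, graph_type="networkx")
#   # The NetworkX graph can then be used with OSMnx tools.
```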
💭 Parting Thoughts
This was a brief summary of what PyrOSM can do for you in your geospatial work! I touched on some methods that can be very useful, like downloading datasets for a specific area or bounding the area of interest, and on how PyrOSM connects to other libraries. I think the best thing about PyrOSM is exactly this: it bridges the gap between huge OSM datasets and the engineering or analytics questions you can answer with them.
Thanks for reading!