PatentsView Project Location Standardization
The PatentsView location standardization process consists of four major steps. The process relies heavily on OSMNames, a dataset derived from OpenStreetMap (OSM). OpenStreetMap is a collaborative open data project that crowdsources the creation of a global geographic dataset. The standardized PatentsView locations are a derivative of OSM data, which is distributed under the Open Database License (ODbL). The steps below describe how the locations in PatentsView are standardized.
1. Creating a curated locations database of roughly two million global locations using OSMNames. To create this dataset, PatentsView filtered the OSMNames data by setting the class variable to "place" and the type variable to "city." This lookup table of canonical names ensures that the final set of locations and their unique identifiers are standard and stay consistent from data release to data release.
2. Extracting all raw location mentions (or records) related to all assignees, inventors, and applicants on granted patents and pre-grant publications. Each location is assigned a latitude and longitude by looking up its city, state, and country elements in the full OSMNames dataset.
3. Assigning each raw location mention from step 2 a canonical location from step 1, using a two-part process. First, every raw location record that exactly matches an entry in the curated locations database (step 1) is assigned the location identifier of that entry. Second, for records that do not have an exact match, the team uses a haversine distance function to find the nearest neighbor in the curated locations database from step 1.
4. Creating the locations dataset for data release. The g_location and pg_location tables available for download on the PatentsView bulk downloads webpages are a subset of the curated locations (step 1).
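The two-part matching in step 3 can be illustrated with a minimal Python sketch. This is not the PatentsView implementation: the field names, the location identifiers, the sample coordinates, and the case-insensitive match on city, state, and country are all assumptions made for illustration. It shows an exact lookup first, then a fallback to the curated location with the smallest haversine (great-circle) distance.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical stand-in for the curated locations database (step 1).
# Identifiers and field names are invented for this sketch.
CURATED = [
    {"location_id": "loc-1", "city": "Boston", "state": "MA",
     "country": "US", "lat": 42.3601, "lon": -71.0589},
    {"location_id": "loc-2", "city": "Chicago", "state": "IL",
     "country": "US", "lat": 41.8781, "lon": -87.6298},
    {"location_id": "loc-3", "city": "Paris", "state": "",
     "country": "FR", "lat": 48.8566, "lon": 2.3522},
]


def match_key(record):
    """Normalized (city, state, country) key; the normalization is an assumption."""
    return tuple(record[f].strip().lower() for f in ("city", "state", "country"))


EXACT_INDEX = {match_key(c): c["location_id"] for c in CURATED}


def assign_location(raw):
    """Return (location_id, method) for one geocoded raw location record."""
    # Part 1: exact match against the curated table.
    exact = EXACT_INDEX.get(match_key(raw))
    if exact is not None:
        return exact, "exact"
    # Part 2: nearest curated neighbor by haversine distance.
    nearest = min(
        CURATED,
        key=lambda c: haversine_km(raw["lat"], raw["lon"], c["lat"], c["lon"]),
    )
    return nearest["location_id"], "nearest"


if __name__ == "__main__":
    raw = {"city": "Cambridge", "state": "MA", "country": "US",
           "lat": 42.3736, "lon": -71.1097}
    print(assign_location(raw))
```

The linear scan over the curated table keeps the sketch short. With roughly two million curated locations, a spatial index would be the more practical choice for the nearest-neighbor search.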