lightsword1496
Last seen: 09/05/2022 - 15:38
Joined: 06/17/2022 - 09:59
Why do the rawlocation and location datasets both have unique values?

Dear all,

I'm exploring rawlocation.csv and location.csv and using Stata to merge the two. I found that location_id in rawlocation and id in location have a many-to-one relationship, so I used merge m:1. However, neither _merge==1 nor _merge==2 is empty, which means rawlocation has observations that do not appear in location, and location also has observations that do not appear in rawlocation. Why is this the case? Shouldn't the count for _merge==1 or _merge==2 be 0? Could this be due to the disambiguation?

Thanks!
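
For readers who do not use Stata, the check described in this post can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `merge_report` and the toy IDs are hypothetical and are not taken from the PatentsView files. It counts master rows with no match (Stata's _merge==1), using rows with no match (_merge==2), and matched master rows (_merge==3) for a many-to-one merge.

```python
from collections import Counter


def merge_report(master_keys, using_keys):
    """Mimic the _merge tabulation of a many-to-one (m:1) merge.

    master_keys: one key per master row (repeats allowed).
    using_keys:  one key per using row (must be unique).
    """
    master = list(master_keys)
    using_list = list(using_keys)
    using = set(using_list)
    if len(using) != len(using_list):
        raise ValueError("using keys must be unique for an m:1 merge")

    counts = Counter()
    # Every master row is either matched or master-only.
    for key in master:
        counts["matched" if key in using else "master_only"] += 1
    # Using rows whose key never appears in the master file.
    counts["using_only"] = len(using - set(master))
    return {name: counts[name] for name in ("master_only", "using_only", "matched")}


# Toy data: hypothetical location IDs, not real PatentsView values.
raw_location_ids = ["a1", "a1", "b2", "zz"]   # many rows per ID (master)
location_ids = ["a1", "b2", "c3"]             # one row per ID (using)
report = merge_report(raw_location_ids, location_ids)
print(report)
```

In this toy case both unmatched categories are non-empty, which is the situation the post describes.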

PVTeam
Role: moderator
Last seen: 04/24/2024 - 12:31
Joined: 10/17/2017 - 10:47
unique IDs in location files

Hello!

We have confirmed that all of the location IDs present in the disambiguated location file do exist in the raw location files. Please keep in mind that while there is a single disambiguated locations file covering locations from both granted patents and pregrant publications, raw locations are split into two files: one for granted patents and one for pregrant publications. To get the full set of raw locations, you will need to use both rawlocation files, from the granted and pregrant sections of our downloads page.

Regarding location IDs present in the raw locations but not in the disambiguated locations file, we have identified a subset of raw location records from pregrant publications published between 1 April 2021 and 1 July 2021 that are assigned location IDs which do not appear in the full disambiguated locations file. It's possible that a previous disambiguation cycle reassigned the location ID for those locations and did not update the raw records due to a bug or interruption. Our next data update, which we expect to release in September, includes a major update to our location disambiguation process and should reassign these records to appropriate location IDs.

Some raw records may also fail to match records in the disambiguated locations file because they have not been assigned a disambiguated location ID at all. About 0.5% of our pregrant raw locations and about 0.1% of our granted raw locations do not have an associated location ID. This usually means that the location name was spelled or formatted irregularly, so that no match could be found in our disambiguated locations.

Best,
PVTeam
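
The reply above implies an order of operations: stack the granted and pregrant raw location records first, then compare IDs against the disambiguated file, treating records with no location ID as a separate category. Below is a minimal sketch in plain Python, assuming each raw record is a dict with a `location_id` key that is empty when no ID was assigned. The field layout, the function name `compare_location_ids`, and the toy values are hypothetical and chosen only for illustration; they are not the actual PatentsView schema.

```python
def compare_location_ids(granted_raw, pregrant_raw, disambiguated_ids):
    """Compare raw location IDs (granted + pregrant) with disambiguated IDs."""
    # Append the two raw files so the comparison sees the full raw set.
    all_raw = list(granted_raw) + list(pregrant_raw)

    # Records with no assigned location ID can never match; count them apart.
    unassigned = [r for r in all_raw if not r.get("location_id")]
    raw_ids = {r["location_id"] for r in all_raw if r.get("location_id")}
    known = set(disambiguated_ids)

    return {
        "raw_records": len(all_raw),
        "unassigned_records": len(unassigned),
        "ids_only_in_raw": sorted(raw_ids - known),
        "ids_only_in_disambiguated": sorted(known - raw_ids),
    }


# Toy data: hypothetical IDs, not real PatentsView values.
granted = [{"location_id": "a1"}, {"location_id": ""}]
pregrant = [{"location_id": "b2"}, {"location_id": "old9"}]
result = compare_location_ids(granted, pregrant, ["a1", "b2"])
print(result)
```

If only the granted file were used here, `b2` would look like an ID unique to the disambiguated file; appending the pregrant file removes that false mismatch, while `old9` stands in for a raw record whose ID is absent from the disambiguated file.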

lightsword1496
Last seen: 09/05/2022 - 15:38
Joined: 06/17/2022 - 09:59
weird pattern in inventor datasets

Hi, 

Thanks for the reply. I tried what you suggested and got the expected result. However, could you also explain the relationships among the inventor datasets and among the assignee datasets? When I tried the inventor datasets, I still got a weird pattern. Here is what I did:

1. I appended rawinventor for pregrant publications to rawinventor for granted patents, and then merged the result with the disambiguated inventor file. All of the unmatched observations come from the disambiguated inventor file. Shouldn't they come from the raw files?

2. I also tried merging granted_patent_crosswalk with rawinventor for pregrant publications to get patent_id, then appended rawinventor for granted patents, and then merged the result with the disambiguated inventor dataset. I got the same weird result: all unmatched observations come from the disambiguated dataset.

Can you explain why this is the case? And maybe also the relationships among the assignee datasets?

Thanks a lot
