2 posts
rory_mullen
Last seen: 02/08/2024 - 06:15
Joined: 01/01/2024 - 15:16
g_assignee_disambiguated not merging well with g_detail_desc_text

Hi there, 

Thanks for creating this great platform!

I'm trying to merge the g_assignee_disambiguated file with the g_detail_desc_text_yyyy files on patent_id, and I'm finding that only very few patent_id values match across these files. Specifically, for

2015: ~300,000 patent IDs in g_detail_desc_text_2015, but only ~10,000 matches with g_assignee_disambiguated
2016: ~305,000 patent IDs in g_detail_desc_text_2016, but only ~10,000 matches with g_assignee_disambiguated

I have not checked other years, but these match rates seem too low. For comparison, the corresponding pg_assignee_disambiguated and pg_detail_desc_text_yyyy files have match rates roughly ten times higher, yielding over 100,000 matches for the same years.
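For anyone wanting to reproduce this kind of check, here is a minimal sketch of counting shared patent_id values between two tables. The helper name count_matching_ids and the toy rows are hypothetical; the only assumption is that each row is a dict with a patent_id column, as csv.DictReader would produce.

```python
from typing import Iterable, Mapping


def count_matching_ids(left: Iterable[Mapping[str, str]],
                       right: Iterable[Mapping[str, str]],
                       key: str = "patent_id") -> tuple[int, int]:
    """Return (distinct ids in left, distinct left ids also present in right)."""
    left_ids = {row[key] for row in left}
    right_ids = {row[key] for row in right}
    return len(left_ids), len(left_ids & right_ids)


# Hypothetical toy rows standing in for g_detail_desc_text_yyyy
# and g_assignee_disambiguated.
desc = [{"patent_id": "9000001"}, {"patent_id": "9000002"}, {"patent_id": "9000003"}]
assignees = [
    {"patent_id": "9000001", "assignee_sequence": "0"},
    {"patent_id": "9000003", "assignee_sequence": "0"},
]

total, matched = count_matching_ids(desc, assignees)
print(f"{matched} of {total} patent ids matched ({matched / total:.1%})")
# prints "2 of 3 patent ids matched (66.7%)"
```

Assuming the bulk files are tab-separated, real rows could come from csv.DictReader(f, delimiter="\t"); very long description fields may require raising csv.field_size_limit first. Counting distinct IDs with sets avoids inflating the count when a patent has several assignee rows.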

I could be mistaken, but is it possible that something went wrong with the recent December 2023 update to the g_assignee_disambiguated file? 

Thank you again!

Best wishes, 

Rory

rory_mullen
Last seen: 02/08/2024 - 06:15
Joined: 01/01/2024 - 15:16
My stupid mistake, sorry everyone!

My mistake: the g_assignee_disambiguated dataset is absolutely fine. I was filtering for assignee_sequence == 1 (as in the pre-grant "pg" data), but for the granted "g" data I should have been filtering for assignee_sequence == 0. Hopefully this helps somebody avoid my stupid mistake in the future :)
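The fix can be sketched as a filter whose starting sequence value depends on the file family. Per this thread, the first assignee has assignee_sequence 0 in the granted (g_) files and 1 in the pre-grant (pg_) files; the FIRST_SEQUENCE lookup and the first_assignees helper are hypothetical names, and the rows are toy data.

```python
from typing import Iterable, Mapping

# Hypothetical lookup: starting assignee_sequence per file prefix, as described in the thread.
FIRST_SEQUENCE = {"g": 0, "pg": 1}


def first_assignees(rows: Iterable[Mapping[str, str]],
                    prefix: str) -> list[Mapping[str, str]]:
    """Keep only rows for the first-listed assignee of each patent."""
    first = FIRST_SEQUENCE[prefix]
    return [row for row in rows if int(row["assignee_sequence"]) == first]


# Toy g_assignee_disambiguated rows: patent 9000001 has two assignees.
g_rows = [
    {"patent_id": "9000001", "assignee_sequence": "0"},
    {"patent_id": "9000001", "assignee_sequence": "1"},
    {"patent_id": "9000003", "assignee_sequence": "0"},
]

# Misapplying the pg convention keeps only second-listed assignees.
wrong = [r for r in g_rows if int(r["assignee_sequence"]) == 1]
right = first_assignees(g_rows, "g")
print(len(wrong), len(right))  # prints "1 2"
```

A design alternative that sidesteps the 0-versus-1 question entirely is to keep, for each patent_id, the row with the smallest assignee_sequence, which works regardless of where numbering starts.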

Best wishes, 

Rory