I have observed a sharp decrease in the number of successful matches between tables of the PatentsView Pre-Grant Publication database when pgpub_id is the linking key, starting with filing year 2014. The problem does not appear when I match tables of the Granted Patent database using patent_id as the linking key.
For instance, as shown below, the pg_published_application table contains 380,954 applications with filing year 2014 and 380,603 with filing year 2015. After merging with the pg_assignee_disambiguated table, however, the number of matched rows with assignee_sequence == 1 and assignee_type == 2 falls from 112,897 for 2013 to 75,180 for 2014 and 38,413 for 2015.
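To put these counts on a common scale, the share of each year's published applications that survive the merge and filter can be computed directly from the two tables. A minimal Python sketch (Python only for illustration; the counts are copied from the Stata output in this post, and the names all_apps, matched and match_rates are hypothetical):

```python
# Per-year counts copied from the Stata output in this post.
# all_apps: all records in pg_published_application (tab filing_year)
# matched: merged rows with assignee_sequence == 1 & assignee_type == 2
all_apps = {
    2012: 354_431, 2013: 374_230, 2014: 380_954, 2015: 380_603,
    2016: 381_641, 2017: 389_943, 2018: 392_668, 2019: 416_882,
}
matched = {
    2012: 111_572, 2013: 112_897, 2014: 75_180, 2015: 38_413,
    2016: 40_386, 2017: 42_896, 2018: 42_766, 2019: 46_426,
}


def match_rates(total, hits):
    """Return {year: hits[year] / total[year]} for every year in `total`."""
    return {year: hits[year] / total[year] for year in sorted(total)}


rates = match_rates(all_apps, matched)
for year, rate in rates.items():
    print(f"{year}: {rate:6.1%}")
```

On these figures the share is roughly 30% for 2013, 20% for 2014 and 10% for 2015, while the denominator stays near 380,000 applications per year, so the decline comes from fewer matched assignee rows rather than from fewer filings.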
The Stata code and its output are below:
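** load the pre-grant publication table
use "./dta/pg_published_application.dta", clear
** distinct is a user-written command (ssc install distinct)
distinct pgpub_id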
/*
Result
----------+----------------------
          |      total   distinct
----------+----------------------
 pgpub_id |    7369987    7369987
----------+----------------------
*/
** convert filing_date to a Stata daily date and extract the filing year
todatetime filing_date, replace datefmt(YMD)
format filing_date %td
gen filing_year = year(filing_date)
tab filing_year
/*
Result
filing_year |      Freq.     Percent        Cum.
------------+-----------------------------------
       1909 |          1        0.00        0.00
       1911 |          1        0.00        0.00
       1913 |          1        0.00        0.00
       1918 |          3        0.00        0.00
       1919 |          2        0.00        0.00
       1988 |          1        0.00        0.00
       1990 |          2        0.00        0.00
       1991 |          3        0.00        0.00
       1992 |          8        0.00        0.00
       1993 |         24        0.00        0.00
       1994 |         65        0.00        0.00
       1995 |        184        0.00        0.00
       1996 |        301        0.00        0.01
       1997 |      1,616        0.02        0.03
       1998 |      5,789        0.08        0.11
       1999 |      9,754        0.13        0.24
       2000 |     23,979        0.33        0.57
       2001 |    218,705        2.97        3.53
       2002 |    245,992        3.34        6.87
       2003 |    293,229        3.98       10.85
       2004 |    301,499        4.09       14.94
       2005 |    307,805        4.18       19.12
       2006 |    323,549        4.39       23.51
       2007 |    327,802        4.45       27.96
       2008 |    316,627        4.30       32.25
       2009 |    296,166        4.02       36.27
       2010 |    308,018        4.18       40.45
       2011 |    332,389        4.51       44.96
       2012 |    354,431        4.81       49.77
       2013 |    374,230        5.08       54.85
       2014 |    380,954        5.17       60.02
       2015 |    380,603        5.16       65.18
       2016 |    381,641        5.18       70.36
       2017 |    389,943        5.29       75.65
       2018 |    392,668        5.33       80.98
       2019 |    416,882        5.66       86.63
       2020 |    399,463        5.42       92.05
       2021 |    340,753        4.62       96.68
       2022 |    228,961        3.11       99.78
       2023 |     15,942        0.22      100.00
------------+-----------------------------------
      Total |  7,369,986      100.00
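*/
** note: tab omits missing values, so its total (7,369,986) is one short of the
** 7,369,987 records reported by distinct; one record has a missing filing_year
** (tab filing_year, missing would list it)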
** attach assignee records; 1:m because one pgpub_id can have several assignees
merge 1:m pgpub_id using "./dta/pg_assignee_disambiguated.dta"
/*
Result                      Number of obs
-----------------------------------------
Not matched                     4,064,588
    from master                 4,064,588
    from using                          0
Matched                         3,467,530
-----------------------------------------
*/
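** keep only applications that found at least one assignee record
keep if _merge == 3
drop _merge
** filing-year distribution of rows with assignee_sequence == 1 and assignee_type == 2
tab filing_year if assignee_sequence == 1 & assignee_type == 2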
/*
Result
filing_year |      Freq.     Percent        Cum.
------------+-----------------------------------
       1993 |          2        0.00        0.00
       1994 |          3        0.00        0.00
       1995 |          6        0.00        0.00
       1996 |          7        0.00        0.00
       1997 |         78        0.01        0.01
       1998 |        238        0.02        0.03
       1999 |        405        0.03        0.06
       2000 |      2,370        0.18        0.24
       2001 |     28,227        2.15        2.39
       2002 |     36,295        2.77        5.16
       2003 |     40,574        3.09        8.25
       2004 |     47,574        3.63       11.88
       2005 |     59,848        4.56       16.45
       2006 |     64,892        4.95       21.40
       2007 |     74,810        5.71       27.10
       2008 |     78,724        6.00       33.11
       2009 |     74,259        5.66       38.77
       2010 |     81,959        6.25       45.02
       2011 |     92,254        7.04       52.06
       2012 |    111,572        8.51       60.57
       2013 |    112,897        8.61       69.18
       2014 |     75,180        5.73       74.92  // count drops sharply from 2014 onward
       2015 |     38,413        2.93       77.85
       2016 |     40,386        3.08       80.93
       2017 |     42,896        3.27       84.20
       2018 |     42,766        3.26       87.46
       2019 |     46,426        3.54       91.00
       2020 |     45,352        3.46       94.46
       2021 |     43,317        3.30       97.76
       2022 |     27,015        2.06       99.82
       2023 |      2,299        0.18      100.00
------------+-----------------------------------
      Total |  1,311,044      100.00
*/
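Two effects are worth separating here. First, merge 1:m reports rows, not applications: the master file holds one row per pgpub_id, so 7,369,987 − 4,064,588 = 3,305,399 applications found at least one assignee record, spread over 3,467,530 matched rows. Second, the table above counts only rows with assignee_sequence == 1 and assignee_type == 2, so a drop there can come either from fewer applications having any assignee record or from changes in the values of those two fields. The Python sketch below computes both per-year counts side by side. It runs on hypothetical toy records (the field names follow the Stata variables; yearly_diagnostics and stata_filter are illustrative names). In Stata, a rough equivalent is tab filing_year assignee_type on the matched rows, plus a per-year count of distinct pgpub_id.

```python
from collections import Counter


def stata_filter(row):
    # Same condition as the Stata line: assignee_sequence == 1 & assignee_type == 2
    return row["assignee_sequence"] == 1 and row["assignee_type"] == 2


def yearly_diagnostics(applications, assignees):
    """Per filing year: all applications, applications with any assignee row,
    and assignee rows passing stata_filter.

    applications: {pgpub_id: filing_year}, one entry per application (master).
    assignees: list of dicts with pgpub_id, assignee_sequence, assignee_type (using).
    Years that are None (missing filing_year) are skipped, as tab does by default.
    """
    apps_per_year = Counter(applications.values())
    ids_with_assignee = {r["pgpub_id"] for r in assignees if r["pgpub_id"] in applications}
    matched_per_year = Counter(applications[i] for i in ids_with_assignee)
    filtered_per_year = Counter(
        applications[r["pgpub_id"]]
        for r in assignees
        if r["pgpub_id"] in applications and stata_filter(r)
    )
    return {
        year: {
            "applications": apps_per_year[year],
            "with_any_assignee": matched_per_year[year],
            "filtered_rows": filtered_per_year[year],
        }
        for year in sorted(y for y in apps_per_year if y is not None)
    }


# Hypothetical toy data, not taken from PatentsView.
applications = {"A": 2014, "B": 2014, "C": 2015, "D": 2015}
assignees = [
    {"pgpub_id": "A", "assignee_sequence": 1, "assignee_type": 2},
    {"pgpub_id": "A", "assignee_sequence": 2, "assignee_type": 2},
    {"pgpub_id": "B", "assignee_sequence": 1, "assignee_type": 3},
    {"pgpub_id": "C", "assignee_sequence": 1, "assignee_type": 2},
]
report = yearly_diagnostics(applications, assignees)
for year, cells in report.items():
    print(year, cells)
```

If with_any_assignee keeps roughly its pre-2014 share of applications while filtered_rows falls, the drop lies in the assignee_sequence or assignee_type values rather than in the pgpub_id link; if with_any_assignee falls as well, fewer assignee records are linked to those filing years in the first place.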
Since my goal is to study firms' patent application behavior from 2012 to 2019, I would appreciate guidance on how to handle, and explain, the sharp drop in matched records from filing year 2014 onward.
Thank you for your assistance!