I am new to patent data and would like to know whether there is a data dictionary that explains the variables. For example, the US citations data has a variable called category, which records whether a patent was cited by examiner, by applicant, by other, or is null. What does it mean when a citation's category is null? And what does "cited by other" mean?

If I use citations as a proxy for knowledge/technology flows, is it prudent to include citations whose category is null or "cited by other"? If I drop those two categories, I lose about 44% of the citations, which may bias my results. Here is a snapshot of the citations per category:
category                                     Freq.   Percent      Cum.
NULL                                    21,863,085     20.44     20.44
cited by applicant                      36,981,491     34.58     55.02
cited by examiner                       22,668,307     21.19     76.21
cited by other                          25,441,798     23.79    100.00
cited by third party                         2,845      0.00    100.00
imported from a related application            718      0.00    100.00
Total                                  106,958,244    100.00
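
To double-check the share that would be lost, the counts in the table can be summed directly. This is only a minimal sketch in Python: the counts are copied by hand from the tabulation above, and the helper name share_dropped is hypothetical, not part of any patent data tooling.

```python
# Citation counts per category, copied from the tabulation above.
counts = {
    "NULL": 21_863_085,
    "cited by applicant": 36_981_491,
    "cited by examiner": 22_668_307,
    "cited by other": 25_441_798,
    "cited by third party": 2_845,
    "imported from a related application": 718,
}


def share_dropped(counts, drop):
    """Return the percentage of all citations that fall in the `drop` categories."""
    total = sum(counts.values())
    dropped = sum(counts[category] for category in drop)
    return 100.0 * dropped / total


total = sum(counts.values())
lost = share_dropped(counts, ["NULL", "cited by other"])

print(f"total citations: {total:,}")   # total citations: 106,958,244
print(f"share lost: {lost:.2f}%")      # share lost: 44.23%
```

Passing a different list of categories to share_dropped gives the loss under other filtering rules, for example keeping only examiner citations.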