Uncertainty regarding entries in "citation_patent_id" column
I was wondering how the entry in the citation_patent_id column is determined. A given patent_id usually cites more than one other patent, yet each row holds only a single citation_patent_id. I also notice that when a patent_id occurs 4 times in the dataset, the citation_patent_id sometimes differs across those rows and sometimes does not. Why is that, and why is the column restricted to a single value?
On a deeper level, I also found the following:
Take a random citation_patent_id entry from the above, say 9715899.
Ground-level checks:
- This patent is cited by 33 patents (per Google Patents), but appears in the dataset as a citation_patent_id only once.
Next, take a random patent that cites 9715899 according to Google Patents, say 11381412.
Ground-level checks:
- Does 11381412 exist in our dataset as a patent_id? = YES
- Does the dataset show this patent citing 9715899? = NO
- Which patent does the dataset show it citing? = 8487996
Is 8487996 listed as a patent citation for our patent on Google Patents? = NO
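
For anyone who wants to reproduce these checks, here is a minimal pandas sketch. It assumes the dataset loads into a DataFrame with the columns patent_id and citation_patent_id; the helper names and the toy frame are made up for illustration (the row with HYPOTHETICAL_1 is a placeholder, not real data). IDs are compared as strings in case the columns are stored as integers in one place and text in another.

```python
import pandas as pd


def citing_patents(df: pd.DataFrame, cited_id) -> list:
    """patent_id values whose citation_patent_id equals cited_id."""
    mask = df["citation_patent_id"].astype(str) == str(cited_id)
    return sorted(df.loc[mask, "patent_id"].astype(str).unique())


def cited_by(df: pd.DataFrame, patent_id) -> list:
    """citation_patent_id values recorded for a given patent_id."""
    mask = df["patent_id"].astype(str) == str(patent_id)
    return sorted(df.loc[mask, "citation_patent_id"].astype(str).unique())


def distinct_citations_per_patent(df: pd.DataFrame) -> pd.Series:
    """Number of distinct citation_patent_id values per patent_id."""
    return df.groupby("patent_id")["citation_patent_id"].nunique()


# Toy stand-in for the real dataset (illustrative only, not actual rows).
toy = pd.DataFrame(
    {
        "patent_id": ["11381412", "11381412", "HYPOTHETICAL_1"],
        "citation_patent_id": ["8487996", "8487996", "9715899"],
    }
)

print(citing_patents(toy, "9715899"))      # who cites 9715899 in the data
print(cited_by(toy, "11381412"))           # what 11381412 cites in the data
print(distinct_citations_per_patent(toy))  # distinct citations per patent_id
```

Running the same three calls on the real DataFrame should show whether a patent_id ever maps to more than one citation_patent_id, and how many rows point at 9715899.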