I am working with the PAIR/PatEx data and using PatentsView to get the disambiguated inventor and assignee information. I filtered the data according to the data description, but there is still a discrepancy of about 170,000 applications.
Briefly, I did the following:
a) Dropped all observations before 2006 and after 2019
b) Dropped all non-utility non-provisional applications
c) Confirmed that there are no duplicates by application number in PatEx.
The resulting size was;
I looked up specific application numbers from PatEx for applications that were unsuccessful, and these were not in PatentsView.
Is the Pre-Grant Data only a subset of the published applications (utility patents after 2005)? How is this subset defined?
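
For reference, here is a minimal sketch of steps a) to c) and of the lookup against PatentsView, in plain Python on toy rows. The field names (`application_number`, `filing_date`, `application_type`), the use of the filing date for the 2006-2019 window, and the digit-only normalization of application numbers are all assumptions made for illustration; the actual PatEx and PatentsView files may use different names and formats.

```python
from collections import Counter
from datetime import date

# Hypothetical toy rows; real PatEx column names and values may differ.
patex_rows = [
    {"application_number": "11000001", "filing_date": date(2006, 3, 1), "application_type": "utility"},
    {"application_number": "11000002", "filing_date": date(2004, 5, 9), "application_type": "utility"},
    {"application_number": "60000003", "filing_date": date(2010, 1, 4), "application_type": "provisional"},
    {"application_number": "12000004", "filing_date": date(2012, 7, 7), "application_type": "utility"},
]
# Hypothetical set of application numbers found in the PatentsView pre-grant data.
patentsview_numbers = {"11000001"}


def normalize(number):
    """Keep digits only, in case the two sources format numbers differently."""
    return "".join(ch for ch in str(number) if ch.isdigit())


def filter_patex(rows, first_year=2006, last_year=2019):
    """Steps a) and b): keep utility applications filed within the year window."""
    return [
        r for r in rows
        if first_year <= r["filing_date"].year <= last_year
        and r["application_type"] == "utility"
    ]


def duplicated_numbers(rows):
    """Step c): application numbers that occur more than once."""
    counts = Counter(normalize(r["application_number"]) for r in rows)
    return sorted(n for n, c in counts.items() if c > 1)


def missing_from(rows, other_numbers):
    """Application numbers present in rows but absent from the other source."""
    ours = {normalize(r["application_number"]) for r in rows}
    theirs = {normalize(n) for n in other_numbers}
    return sorted(ours - theirs)


kept = filter_patex(patex_rows)
dups = duplicated_numbers(kept)
unmatched = missing_from(kept, patentsview_numbers)
print(len(kept), dups, unmatched)
```

Grouping the unmatched application numbers by filing year or by status would show whether the missing records cluster in a particular category.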