6 posts
hockey_and_pat…
Last seen: 12/20/2023 - 10:06
Joined: 08/16/2023 - 10:14
WKUs and Patent IDs

In the USPTO bulk files, patent documents are identified by WKUs. Are these the same as the patent_ids in g_patent.tsv, for example? I think they are often the same but not always (when merging WKU onto patent_id, most records are matched but not all).

Is there any PatentsView file that maps patent_ids back to WKUs?

PVTeam
Role: moderator
Last seen: 04/23/2024 - 10:43
Joined: 10/17/2017 - 10:47
WKUs and Patent ID formatting

Hello

One reason a patent_id might differ slightly from how it appears in the raw bulk data is leading zeros. Patent_ids with an alphabetic prefix (e.g. design patents begin with "D" and plant patents begin with "PP") are often printed in the bulk data with one or more leading zeros between the prefix and the numeric component when that component is small. PatentsView removes these leading zeros so that the IDs match how they are displayed in other USPTO products such as Patent Public Search. For example, patent D958511 might appear in some raw data products as D0958511.
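The normalization described above can be sketched in a few lines of Python. This is an illustrative approximation, not the PatentsView implementation: it treats any leading run of capital letters as the prefix (the post only names "D" and "PP"), and the function name is hypothetical.

```python
import re

# Assumption: the prefix is any run of capital letters; only "D" and "PP"
# are named in the post above.
_PREFIXED_ID = re.compile(r"^([A-Z]+)0*(\d+)$")


def normalize_patent_id(raw: str) -> str:
    """Drop leading zeros between an alphabetic prefix and the numeric part."""
    raw = raw.strip()
    match = _PREFIXED_ID.match(raw)
    if match is None:
        # No alphabetic prefix (or an unexpected shape): leave it unchanged.
        return raw
    prefix, digits = match.groups()
    return prefix + digits


print(normalize_patent_id("D0958511"))  # D958511
print(normalize_patent_id("PP012345"))  # PP12345 (made-up ID)
```

Applying the same function to the keys on both sides before a merge is one way to check whether unmatched records are explained by this formatting difference.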

We're happy to provide any additional information, but our team isn't familiar with the abbreviation "WKU" in this context. If the above information doesn't resolve your problem, would you mind expanding on what you mean by WKU and pointing to what you're referring to in the bulk files?

Thanks,
PVTeam

hockey_and_pat…
Last seen: 12/20/2023 - 10:06
Joined: 08/16/2023 - 10:14
Thank you so much for your…

Thank you so much for your reply; it was very helpful. By WKU I am referring to the identifier found in these files (as an example):

https://bulkdata.uspto.gov/data/patent/grant/redbook/fulltext/1980/

If you download, say, the first file pftaps19800101_wk01.zip and look at the text file pftaps19800101_wk01.txt within, you will see that patents are identified by something called a "WKU." This sounds quite similar to what you described above, so it is likely we are discussing the same thing. 
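Pulling the WKU values out of one of these weekly text files only needs a line scan. A minimal sketch, assuming each identifier sits on its own line that starts with the tag "WKU" followed by whitespace and the value; the function name and the sample lines are made up for illustration.

```python
import io
import re
from typing import Iterable, Iterator

# Assumed line layout: the tag "WKU", whitespace, then the identifier.
_WKU_LINE = re.compile(r"^WKU\s+(\S+)")


def iter_wkus(lines: Iterable[str]) -> Iterator[str]:
    """Yield the value of every WKU line in an APS-style text stream."""
    for line in lines:
        match = _WKU_LINE.match(line)
        if match:
            yield match.group(1)


# Made-up excerpt for illustration only; real records carry many more fields.
sample = io.StringIO("PATN\nWKU  D0000126\nPATN\nWKU  000000001\n")
print(list(iter_wkus(sample)))  # ['D0000126', '000000001']
```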

PVTeam
Role: moderator
Last seen: 04/23/2024 - 10:43
Joined: 10/17/2017 - 10:47
WKU and Patent Number

Thank you for the additional detail!

Yes, I can confirm that these are in fact the same data element. WKU was the tag used for the patent number in the Patent Office's APS text data files between 1976 and 2001. When the data format changed to XML in 2001, the "WKU" label was discontinued in favor of the more transparent "DNUM" (2001-2004) and "doc-number" (2005-present) labels.
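For the XML era, the same identifier can be read from the doc-number element. A minimal sketch, assuming a single grant document has already been isolated from the weekly bulk file and assuming the element nesting shown in the sample; the path, the function name, and the sample are illustrative and should be checked against the DTD of the specific release.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed element path inside one 2005-present grant document.
DOC_NUMBER_PATH = (
    "./us-bibliographic-data-grant/publication-reference/document-id/doc-number"
)


def grant_doc_number(xml_text: str) -> Optional[str]:
    """Return the publication doc-number of a single grant document, if present."""
    root = ET.fromstring(xml_text)
    node = root.find(DOC_NUMBER_PATH)
    if node is None or node.text is None:
        return None
    return node.text.strip()


# Made-up, heavily trimmed document for illustration only.
sample = (
    "<us-patent-grant><us-bibliographic-data-grant><publication-reference>"
    "<document-id><country>US</country><doc-number>D0958511</doc-number>"
    "</document-id></publication-reference></us-bibliographic-data-grant>"
    "</us-patent-grant>"
)
print(grant_doc_number(sample))  # D0958511
```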

The full specification of the APS text data tags is available in the Patent APS Greenbook document, with the specification of the Patent Number (WKU) on pages 30-31 of the pdf (pages 18-19 in the document's internal page numbering). These pages confirm the use of leading zeroes, which the PatentsView tables remove.

Glad we could help!
Best,
PVTeam

richk
Last seen: 02/27/2024 - 09:06
Joined: 10/06/2023 - 13:44
Is this link wrong?

Maybe I'm reading the reference below incorrectly, but it seems that the USPTO's guidance is to standardize on leading zeros to create 7 digits, while Patent Center and other products appear to remove those zeros?

https://www.uspto.gov/patents/apply/applying-online/patent-number

Design: (e.g., Dnnnnnnn, D0000126) must enter leading zeroes between "D" and number to create 7 digits.
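Converting between the two forms is mechanical: the 7-digit entry form can be rebuilt from an unpadded ID by zero-filling the numeric part. A minimal sketch; the 7-digit width comes from the quoted design-patent rule, while the function name and the idea of reusing the `width` parameter for other document types are hypothetical.

```python
import re


def pad_design_number(patent_id: str, width: int = 7) -> str:
    """Re-insert leading zeros after the "D" prefix up to `width` digits."""
    match = re.fullmatch(r"D(\d+)", patent_id.strip())
    if match is None:
        raise ValueError(f"not a design patent number: {patent_id!r}")
    return "D" + match.group(1).zfill(width)


print(pad_design_number("D126"))     # D0000126
print(pad_design_number("D958511"))  # D0958511
```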

richk
Last seen: 02/27/2024 - 09:06
Joined: 10/06/2023 - 13:44
Related

Related to this: looking at the generic_parser_1976_2001.py parser provided on the PatentsView GitHub, why are these substitutions done? And why does it strip out the last digit of the older APS-formatted patent numbers?

    for line in patent:
        if line.startswith("WKU"):
            patnum = re.search('WKU\s+(.*?)$',line).group(1)
            updnum = re.sub('^H0','H',patnum)[:8]
            updnum = re.sub('^RE0','RE',updnum)[:8]
            updnum = re.sub('^PP0','PP',updnum)[:8]
            updnum = re.sub('^PP0','PP',updnum)[:8]
            updnum = re.sub('^D0', 'D', updnum)[:8]
            updnum = re.sub('^T0', 'T', updnum)[:8]
            if len(patnum) > 7 and patnum.startswith('0'):
                updnum = patnum[1:8]