jcb1996
Last seen: 01/22/2020 - 10:35
Joined: 12/19/2019 - 09:41
Issue opening patent.tsv

Hi. I am able to download the patent.tsv table; however, when I try to open the tsv using the WinZip GUI, I get the following error:

'The size of the extracted file (5506544357) does not match the uncompressed size (1211577061) recorded in the Zip file.'

When I try to read and unzip the file using Python, I get: 'Bad CRC-32'

Is this issue just on my end?

Thanks,

Justin

Russ
Last seen: 03/21/2024 - 09:05
Joined: 11/14/2017 - 22:15
worked for me

Justin,

I'm able to download and unzip the file with both 7-Zip and Python. Locally the zipped file is 1,484,891,937 bytes and the uncompressed file is 5,506,544,357 bytes. Running 7z l patent.tsv.zip shows:

7-Zip 19.00 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2019-02-21

Scanning the drive for archives:
1 file, 1484891937 bytes (1417 MiB)

Listing archive: patent.tsv.zip

--
Path = patent.tsv.zip
Type = zip
Physical Size = 1484891937

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2019-12-10 14:33:11 .....   1211577061   1484891775  patent.tsv
------------------- ----- ------------ ------------  ------------------------
2019-12-10 14:33:11         1211577061   1484891775  1 files

Russ

jcb1996
Last seen: 01/22/2020 - 10:35
Joined: 12/19/2019 - 09:41
Still not working

Good Morning Russ,

I tried again today and it is still not working. In addition, one of my coworkers tried to unzip it and is getting the same error as me. The file downloads fine; the issue arises when trying to unzip it. We are both getting the following WinZip error: 'The size of the extracted file (5506544357) does not match the uncompressed size (1211577061) recorded in the Zip file.'

Russ
Last seen: 03/21/2024 - 09:05
Joined: 11/14/2017 - 22:15
Try 7zip?

Justin,

Does the downloaded file size match mine? Could you try 7-Zip from 7-zip.org, or opening the archive in Windows Explorer? I was able to extract the tsv from Perl and Python. Are you able to unzip other files from the downloads page?
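
For comparing sizes without installing anything, here is a minimal Python 3 sketch that reports roughly what 7z l shows: the size of the archive on disk and the sizes recorded for each member. The function name describe_zip is made up for illustration; it only reads the archive's directory and does not extract or verify anything.

```python
import os
import zipfile

def describe_zip(path):
    """Return (size of the archive on disk, [(name, recorded size, compressed size), ...])."""
    with zipfile.ZipFile(path) as zf:
        # infolist() reads the central directory only; no data is decompressed.
        members = [(info.filename, info.file_size, info.compress_size)
                   for info in zf.infolist()]
    return os.path.getsize(path), members

if __name__ == "__main__":
    archive = "patent.tsv.zip"  # assumes the download sits in the current directory
    if os.path.exists(archive):
        disk_size, members = describe_zip(archive)
        print("archive on disk:", disk_size, "bytes")
        for name, recorded, compressed in members:
            print(name, "recorded:", recorded, "compressed:", compressed)
```

If the on-disk size differs from 1,484,891,937 bytes, the download itself is probably incomplete.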

Something really odd seems to be going on. My best guess is that you don't have a complete file or it got corrupted somehow. Pvteam would need to fix the file if something is wrong with it, but it worked for me.

Here's a post about using Python to resume a download, in case you need it: https://stackoverflow.com/a/22894873 (note the first comment)
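
In case the link goes stale, the general idea of resuming can be sketched with the standard library alone. This is not the code from the linked answer, just a rough Python 3 illustration of an HTTP Range request; range_header and resume_download are hypothetical names, and it assumes the server supports range requests.

```python
import os
import urllib.request

def range_header(path):
    """Build a Range header asking only for the bytes not yet on disk."""
    have = os.path.getsize(path) if os.path.exists(path) else 0
    return {"Range": "bytes=%d-" % have} if have else {}

def resume_download(url, path, chunk_size=1 << 20):
    """Append the missing tail of a partial download to path."""
    request = urllib.request.Request(url, headers=range_header(path))
    # Note: if the file is already complete, a server may answer 416,
    # which urlopen raises as urllib.error.HTTPError.
    with urllib.request.urlopen(request) as response:
        # 206 means the range was honoured; anything else means the server
        # is sending the whole file, so start over instead of appending.
        mode = "ab" if response.status == 206 else "wb"
        with open(path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Resuming only helps with an incomplete download; it will not repair an archive that is already bad on the server.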

Russ

jcb1996
Last seen: 01/22/2020 - 10:35
Joined: 12/19/2019 - 09:41
Updates

I'm able to unzip other files from the downloads page, both using WinZip and through my Python script, which downloads and unzips ~15 tables. patent.tsv is the only one I am having trouble with.

Per your suggestion, I downloaded 7-Zip. With 7-Zip, the unzip works fine. I tried again with WinRAR and WinZip, and both of these still fail. This is strange.

I now need to figure out how to get this to work using my Python script. What Python libraries are you using to download and unzip the file?

Thanks,

Justin

Russ
Last seen: 03/21/2024 - 09:05
Joined: 11/14/2017 - 22:15
python libraries

Justin,

This works using python 2.7.17:

import zipfile
with zipfile.ZipFile('patent.tsv.zip', 'r') as zip_ref:
    zip_ref.extractall('.')

but python 3.8.0 throws an error:
raise BadZipFile("Bad CRC-32 for file %r" % self.name)

so maybe something is wrong with the file.  I'm more of a perl/LWP guy but it looks like requests would work, at least for patent_contractawardnumber.tsv.zip.

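import requests

# Download the archive and write it to disk in binary mode.
url = 'http://s3.amazonaws.com/data.patentsview.org/20191008/download/patent_contractawardnumber.tsv.zip'
r = requests.get(url, allow_redirects=True)
r.raise_for_status()  # fail on an HTTP error instead of saving an error page
with open('patent_contractawardnumber.tsv.zip', 'wb') as f:
    f.write(r.content)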

A python script unzips it without error using either version of python I have.
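
If you want your script to flag a bad archive before extracting, a minimal Python 3 sketch using zipfile's built-in check is below. The helper name first_bad_member is made up for illustration. ZipFile.testzip() reads every member and returns the name of the first one that fails its CRC or header check, or None if all pass.

```python
import os
import zipfile

def first_bad_member(path):
    """Return the name of the first member that fails its integrity check, or None."""
    with zipfile.ZipFile(path) as zf:
        # testzip() decompresses each member in chunks, so it is slow on
        # multi-gigabyte archives but does not write anything to disk.
        return zf.testzip()

if __name__ == "__main__":
    archive = "patent.tsv.zip"  # assumes the download sits in the current directory
    if os.path.exists(archive):
        bad = first_bad_member(archive)
        print("archive looks OK" if bad is None else "bad member: %s" % bad)
```

A script that loops over many tables could call this first and skip or re-download any archive that reports a bad member.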

Russ

Russ
Last seen: 03/21/2024 - 09:05
Joined: 11/14/2017 - 22:15
corrupt file

It looks like the zip file is corrupt. The unzipped file has 1.4M lines while the download page says it should have 7.1M. The unzipped file should be 5.1G while locally mine is 1.2G, which is smaller than the zip file itself.
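
One observation on the two numbers in the WinZip message: they differ by exactly 2**32, and the recorded size equals the extracted size modulo 2**32. That is consistent with a size that was stored in a 32-bit field and wrapped around, as can happen when a file over 4 GiB is zipped without ZIP64 size fields. Nothing in this thread confirms how the archive was actually built, so treat that as a guess. The arithmetic itself is easy to check:

```python
# Sizes copied from the WinZip error message quoted in this thread.
extracted_size = 5506544357   # bytes actually written during extraction
recorded_size = 1211577061    # uncompressed size recorded in the zip file

difference = extracted_size - recorded_size
wraps_to_recorded = (extracted_size % 2**32) == recorded_size

print("difference equals 2**32:", difference == 2**32)
print("extracted size mod 2**32 equals recorded size:", wraps_to_recorded)
```

This would also fit the 7z listing above, where the recorded size (1211577061) is smaller than the compressed size (1484891775).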

PVTeam
Role: moderator
Last seen: 04/23/2024 - 10:43
Joined: 10/17/2017 - 10:47
Hi all, Thank you for…

Hi all,

Thank you for bringing this to our attention. We have resolved the issue with this file and reposted it to the bulk downloads page. Please let us know if you encounter any further issues.

Best,

PVTeam