7 posts
bxb888
Last seen: 01/27/2025 - 13:07
Joined: 01/27/2025 - 00:27
Pagination Not Working
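import json
import time

import requests

# API_KEY is assumed to be defined elsewhere (your PatentsView API key).
assignee_type = [2,3]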
fields = [
    "patent_id",
    "patent_type",
    "application.filing_date",
    "assignees.assignee_organization",
    "assignees.assignee_type",
    "assignees.assignee_country",
    "assignees.assignee_sequence",
    "assignees.assignee_id",
    "ipcr.ipc_sequence",
    "ipcr.ipc_section",
    "patent_num_us_patents_cited",
    "inventors.inventor_country",
    "inventors.inventor_id",
]

query = {
    "_and": [
        {"patent_type":"utility"},
        {"assignees.assignee_type": assignee_type},
        {"_gte": {"application.filing_date": "1989-01-01"}},
        {"_lte": {"application.filing_date": "2023-12-31"}}
    ]
}

field_list = json.dumps(fields)
sort_param = json.dumps([{"patent_id": "asc"}])

#Initial URL
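# Options are built separately: nested double quotes inside an f-string are a syntax error before Python 3.12
options = json.dumps({"size": 1000})
url = f"https://search.patentsview.org/api/v1/patent/?q={json.dumps(query)}&f={field_list}&s={sort_param}&o={options}"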

REQUEST_LIMIT = 45
REQUEST_INTERVAL = 60
requests_made = 0
last_request_time = 0

def fetch_patent_data(url, api_key):
    global requests_made, last_request_time

    current_time = time.time()
    time_since_last_request = current_time - last_request_time

    if requests_made >= REQUEST_LIMIT and time_since_last_request < REQUEST_INTERVAL:
        sleep_time = REQUEST_INTERVAL - time_since_last_request
        print(f"Rate limit reached. Sleeping for {sleep_time:.2f} seconds...")
        time.sleep(sleep_time)
        requests_made = 0  # Reset counter after sleep

    headers = {"X-Api-Key": api_key}
    response = requests.get(url, headers=headers)
    requests_made += 1
    last_request_time = time.time()


    if response.status_code == 200:
        data = response.json()
        return data["patents"]
    else:
        # Error handling (same as before)
        status_reason = response.headers.get("X-Status-Reason")
        status_reason_code = response.headers.get("X-Status-Reason-Code")
        print(f"Error fetching data:")
        print(f"  Status Code: {response.status_code}")
        print(f"  X-Status-Reason: {status_reason}")
        print(f"  X-Status-Reason-Code: {status_reason_code}")
        print(f"  Response Text: {response.text}")
        return []

all_patent_data = []
iter = 0
while True:
    patent_data = fetch_patent_data(url, API_KEY)
    if not patent_data:
        break
    all_patent_data.extend(patent_data)
    iter += 1
    print(iter)
    if len(patent_data) < 1000: #Check if less than 1000 results were returned which indicates end of pagination
        print(len(all_patent_data))
        break

    # Prepare the URL for the next page using the last patent_id
    last_patent_id = patent_data[-1]["patent_id"]
    print(last_patent_id)
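    # Options are built separately: nested double quotes inside an f-string are a syntax error before Python 3.12
    options = json.dumps({"after": last_patent_id, "size": 1000})
    url = f"https://search.patentsview.org/api/v1/patent/?q={json.dumps(query)}&f={field_list}&s={sort_param}&o={options}"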

print("Patent data downloaded. Convert to csv file next")
bxb888
Last seen: 01/27/2025 - 13:07
Joined: 01/27/2025 - 00:27

Hello everyone, I'm trying to retrieve some variables for our research. My code returns information for only 1000 patents, which is one page, so there seems to be a problem with pagination. I haven't been able to find the cause, so I hope someone here can help. Last September we ran a retrieval with the same code (apart from the fields we requested). It worked then, but it no longer works.

Russ
Last seen: 01/27/2025 - 15:05
Joined: 11/14/2017 - 22:15
patent_id needs padding

Hey bxb888,

I had the same problem but in R!  A change around November makes you pad the patent_id in the `after` parameter and nowhere else.  I'm far from pythonic but here's my solution:

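import re

def zero_pad(patent_id):
    # Left-pad to 8 characters, then move any letter prefix (e.g. RE)
    # back in front of the zeros: "0RE36479" -> "RE036479"
    return re.sub(
        pattern=r'^(0+)([A-Z]+)(\d+)',
        repl='\\2\\1\\3',
        string=patent_id.zfill(8)
    )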
    
# usage in my code:
if primary_key == "patent_id":
    after = zero_pad(after)

Utility patent_ids get leading zeroes when necessary (padded to eight characters), and non-utility patent_ids get the numeric portion padded, e.g. RE036479.
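
To make the padding rule concrete, here is a small sketch that runs the same regex on a few IDs and builds the `o` parameter for the next page. The helper name `next_page_options` is hypothetical (it isn't part of the API or any package), and the sample IDs other than RE36479 are made up for illustration. It assumes, as described above, that only the `after` value needs padding.

```python
import json
import re


def zero_pad(patent_id):
    # Same logic as the function above: pad to 8 characters, then move any
    # letter prefix back in front of the inserted zeros.
    return re.sub(r'^(0+)([A-Z]+)(\d+)', r'\2\1\3', patent_id.zfill(8))


def next_page_options(last_patent_id, size=1000):
    # Hypothetical helper: only the "after" value is padded; the query and
    # the patent_id values in the results stay un-padded.
    return json.dumps({"size": size, "after": zero_pad(last_patent_id)})


for patent_id in ["9876543", "10000000", "RE36479", "D123456"]:
    print(patent_id, "->", zero_pad(patent_id))

print(next_page_options("9876543"))
```

In the original loop, this would mean passing the padded value of `patent_data[-1]["patent_id"]` as `after` and leaving everything else unchanged.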

On the plus side, the API now sorts patent_id more naturally: for example, ids below 10 million no longer come after ones above 10 million (a numeric-like sort instead of an alphabetic one).

You can just send in requests and sleep/retry when you get throttled if you want.  The patentsview team has a python wrapper for the original version of the API which also produces csv files.  I haven't committed my changes for the new version of the API but here's the code it uses:

    r = requests.post(url, headers=headers, json=params)

    # sleep then retry on a 429 Too many requests
    if 429 == r.status_code:
        print("Throttled response from the api, retrying in {} seconds".format(r.headers["Retry-After"]))
        time.sleep(int(r.headers["Retry-After"]))  # Number of seconds to wait before sending next request
        r = requests.post(url, headers=headers, json=params)
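
The snippet above retries once. A minimal sketch of the same idea as a bounded loop follows, assuming the server sends `Retry-After` as a whole number of seconds on a 429. `send_with_retry` and its parameters are hypothetical names, and `send` stands for any zero-argument callable that performs the request, for example `lambda: requests.post(url, headers=headers, json=params)`.

```python
import time


def send_with_retry(send, max_retries=3, sleep=time.sleep):
    # `send` performs one request and returns a response object exposing
    # `status_code` and `headers` (as requests.Response does).
    response = send()
    for _ in range(max_retries):
        if response.status_code != 429:
            break
        # Fall back to 60 seconds (an assumption) if the header is missing.
        wait = int(response.headers.get("Retry-After", 60))
        print("Throttled response from the api, retrying in {} seconds".format(wait))
        sleep(wait)
        response = send()
    return response
```

Passing the request in as a callable keeps the retry logic independent of whether the request is a GET or a POST, and makes it easy to try out without hitting the API.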

I hope this helps
Russ Allen

bxb888
Last seen: 01/27/2025 - 13:07
Joined: 01/27/2025 - 00:27
THANK YOU NOTE

Dear Russ,

Thank you so much for telling me about the padding, which completely solved my problem. I'd also like to ask whether there are any good ways to stay ahead of changes like this. Keeping track of changes to the database would be really helpful, given how useful it is to our research.

Many Thanks,

Steve

Russ
Last seen: 01/27/2025 - 15:05
Joined: 11/14/2017 - 22:15
no good way

Hi Steve,

What I do is make HEAD requests from time to time on the JSON object the API team's Swagger UI page is based on.  The timestamp seems to be updated when there's an API release (maybe a three or four week release cycle?).

curl -I https://search.patentsview.org/static/openapi.json

When I notice a change in Last-Modified: I run the test cases in the R package for the API (I'm a contributor).  If a test case breaks I try to figure out what changed or open an API bug if I can't figure it out.  If my python was better I'd try to port the R package!
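
For anyone who would rather do the timestamp check from Python than curl, here is a rough sketch using only the standard library. The function names are hypothetical, and it assumes the server keeps returning a `Last-Modified` header on that URL, as the curl command above shows.

```python
import urllib.request
from email.utils import parsedate_to_datetime

OPENAPI_URL = "https://search.patentsview.org/static/openapi.json"


def fetch_last_modified(url=OPENAPI_URL):
    # HEAD request: headers only, the same thing `curl -I` does.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.headers.get("Last-Modified")


def is_newer(current, previous):
    # Both arguments are raw Last-Modified header values. A missing
    # previous value counts as "changed" so the first run triggers a check.
    if current is None:
        return False
    if previous is None:
        return True
    return parsedate_to_datetime(current) > parsedate_to_datetime(previous)
```

A scheduled job could save the last value it saw to a file, call `fetch_last_modified()` on each run, and kick off the test cases whenever `is_newer` returns True.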

Russ

bxb888
Last seen: 01/27/2025 - 13:07
Joined: 01/27/2025 - 00:27
VERY HELPFUL!

Dear Russ,

Thank you for pointing out the timestamp check. It's definitely a good way to learn about minor version updates and maintenance changes on the API of the database. This is really helpful, so thanks again for letting me know.

Best,

Steve

PVTeam
Role: moderator
Last seen: 01/27/2025 - 15:45
Joined: 10/17/2017 - 10:47
API ID padding and update notifications.

Hi Steve

I'm glad you were able to find the resolution to your pagination problem!

As Russ described, in our 2.2.0 update, we modified the sorting behavior to operate on a zero-padded version of the patent_id with the goal of producing a more intuitive ordering of patents with IDs greater than and less than 10,000,000. 
We left the user-facing representation of the patent_ids un-padded for consistency with our other data resources, but we recognize that the resulting disconnect has produced less clarity and more frustration with pagination. Based on your feedback and others', we are working on resolving this disconnect in our next version release by allowing users to select whether to use the zero-padded or un-padded representation of patent_id consistently between the query parameters and results.
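
A small illustration of the ordering difference, using made-up IDs: a plain string sort puts eight-digit IDs ahead of seven-digit ones, while sorting on a zero-padded key gives numeric order. This only mimics the behavior in plain Python and is not the API's implementation.

```python
patent_ids = ["9999999", "10000001", "10000000", "8765432"]

# Plain string sort: anything starting with "1" sorts before "8..." and "9...".
alpha_order = sorted(patent_ids)

# Sort on the zero-padded form while keeping the un-padded value.
padded_order = sorted(patent_ids, key=lambda pid: pid.zfill(8))

print(alpha_order)
print(padded_order)
```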

To keep on top of updates and releases, you can subscribe to our newsletter and/or keep an eye on our API release notes page.

Thank you for using PatentsView and for your feedback.
Best,
PVTeam