nateapathy
Last seen: 12/03/2024 - 16:19
Joined: 01/23/2023 - 15:04
504 errors when querying large patent classes

I'm collecting patent data for a study and have run into new issues with the legacy API that I'm hoping someone knows how to solve. I created a package (https://github.com/nateapathy/patentsview2) to easily query the API by CPC group ID (cpc_group_id in the legacy API, which is the level of observation for my study). When I query relatively small CPC groups (e.g., A21C), I don't have any issues. But with larger ones (e.g., A01D) I get a 504 Gateway Timeout error, even though I've tried to make the pagination more manageable (1,000 results per page).

Any insights on how to get this to work for larger CPC groups would be much appreciated!

To reproduce this locally, here's the R code:

devtools::install_github("nateapathy/patentsview2")
library(patentsview2)

patents_view(cpc="A21C",from="2000-01-01") # should return the data frame without issue
patents_view(cpc="A01D",from="2000-01-01") # throws 504 gateway timeout error
Russ
Last seen: 12/04/2024 - 17:06
Joined: 11/14/2017 - 22:15
try again

Hi Nate,

I initially got a 504 on your second query using the legacy patentsview package from CRAN (disclosure: I'm a contributor). It worked in Swagger UI when I posted this to the patents endpoint:

{
 "q": {"_and":[{"_gte":{"patent_date":"2000-01-01"}},{"_eq":{"cpc_group_id":"A01D" }}]},
 "o": {"per_page":10000}
}

On a retry, it also succeeded using the R package:

library(patentsview)

fields <- c("patent_number","patent_date","patent_title","cpc_subgroup_id")

second_query <- with_qfuns(
  and(
    gte(patent_date = "2000-01-01"),
    eq(cpc_group_id = "A01D")
  )
)

search_pv(second_query, fields = fields, all_pages = TRUE)

#> $data
#> #### A list with a single data frame (with list column(s) inside) on a patent level:
#> 
#> List of 1
#>  $ patents:'data.frame': 10125 obs. of  4 variables:
#>   ..$ patent_number: chr [1:10125] "10004176" ...
#>   ..$ patent_date  : chr [1:10125] "2018-06-26" ...
#>   ..$ patent_title : chr [1:10125] "Weed seed destruction" ...
#>   ..$ cpcs         :List of 10125
#> 
#> $query_results
#> #### Distinct entity counts across all downloadable pages of output:
#> 
#> total_patent_count = 10,125
# Created on 2024-11-22 with reprex v2.1.1

I'll have to check out your repo to see what I can steal!
Russ Allen
 

nateapathy
Last seen: 12/03/2024 - 16:19
Joined: 01/23/2023 - 15:04
search_pv() error

Thanks Russ! 

I'm having trouble recreating your query using the patentsview package - it is throwing an error for me that seems to be related to the "_and" operator and the apply_checks() function within the check_query.R file.

And really, my patentsview2 is just a wrapper around patentsview, purpose-built for this specific project of downloading data by cpc_group_id :-) I really appreciate all of the development work y'all have done on it; it's a great resource! I suppose I should probably update my package to use the new API, huh?

thanks for your help on this!

fields <- c("patent_number", "patent_date", "patent_title", "cpc_subgroup_id")

second_query <- with_qfuns(
  and(
    gte(patent_date = "2000-01-01"),
    eq(cpc_group_id = "A01D")
  )
)

search_pv(second_query, fields = fields, all_pages = TRUE)
#> Error in names(x) %in% c("_not", "_and", "_or") || is.na(names(x)) :
#>   'length = 2' in coercion to 'logical(1)'
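For context on the message itself: in R 4.3.0 and later, calling `||` or `&&` with an operand of length greater than one is an error (older versions used only the first element). So one plausible cause is package check code written for older R running under R 4.3+; that is an assumption, not a confirmed diagnosis. A base-R illustration with a made-up list `x`:

```r
# Made-up query node whose names() has length 2.
x <- list("_and" = list(), other = list())
ops <- c("_not", "_and", "_or")

# Scalar `||` with a length-2 left-hand side: an error in R >= 4.3.0.
res <- tryCatch(
  names(x) %in% ops || is.na(names(x)),
  error = function(e) conditionMessage(e)
)
print(res)

# Vectorised alternatives: `|` works elementwise, and any()/all()
# collapse the result to a single TRUE/FALSE when a scalar is needed.
elementwise <- names(x) %in% ops | is.na(names(x))
print(elementwise)       # TRUE FALSE
print(any(elementwise))  # TRUE
```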
nateapathy
Last seen: 12/03/2024 - 16:19
Joined: 01/23/2023 - 15:04
update

I updated patents_view() in my package in a couple of ways, mainly to change the pagination to 1,000 results per page instead of 5,000. This seems to help with the A01D error, but now when I loop through more CPCs, I get a gateway timeout on B05C. When I run it outside the loop, it errors out as well, so the loop itself doesn't seem to be the problem. It also can't simply be an issue with "large" CPCs, since G16H works reliably and returns >33k patents.

devtools::install_github("nateapathy/patentsview2", force = TRUE)
library(patentsview2)

control_cpcs <- c("A01B", "A01D", "A01F", "A21C", "A21D", "A23C", "A23G", "A24D", "A44B", "B02C", "B05C")
# the first 11 of these

control_cpc_dat <- list()

for (i in seq_along(control_cpcs)) {
  control_cpc_dat[[i]] <- patentsview2::patents_view(cpc = control_cpcs[i], from = "2000-01-01")
}

# this runs for the first 10, but then gives another gateway error at control_cpcs[11], which is B05C
# control_cpc_dat is now 10 elements long (10 data frames for the first 10 CPCs it got through)
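One way to keep a loop like this going past a failing CPC, and to see afterwards which ones failed, is to wrap each call in tryCatch() and record the error instead of stopping. A minimal sketch: fetch_all() is a hypothetical helper, and `fetch` stands in for patentsview2::patents_view() so the loop can be exercised without the API.

```r
# Run `fetch(cpc)` for every CPC code, storing results by name and recording
# failures (e.g. a 504) instead of aborting the whole loop.
# Note: a fetch that legitimately returns NULL is also counted as a failure.
fetch_all <- function(cpcs, fetch) {
  results <- setNames(vector("list", length(cpcs)), cpcs)
  failed <- character(0)
  for (cpc in cpcs) {
    out <- tryCatch(fetch(cpc), error = function(e) {
      message(cpc, " failed: ", conditionMessage(e))
      NULL
    })
    if (is.null(out)) {
      failed <- c(failed, cpc)
    } else {
      results[[cpc]] <- out
    }
  }
  list(results = results, failed = failed)
}

# Usage with the real function (assumes patentsview2 is installed):
# res <- fetch_all(control_cpcs, function(cpc) {
#   patentsview2::patents_view(cpc = cpc, from = "2000-01-01")
# })
# res$failed  # CPCs to retry later
```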
Russ
Last seen: 12/04/2024 - 17:06
Joined: 11/14/2017 - 22:15
not trivial to update

Wow, I'm seeing that too. It's "just" a warning; the returned data is there if you assign the output. I didn't see the warning when I ran the reprex.

result <- search_pv(second_query, fields = fields, all_pages = TRUE)
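If you want to keep the result and also inspect the warnings explicitly, base R's withCallingHandlers() can collect each warning and let evaluation continue. A minimal sketch; collect_warnings() is a hypothetical helper name.

```r
# Evaluate `expr`, keeping its value and collecting any warnings it raises
# instead of letting them print at the end of the call.
collect_warnings <- function(expr) {
  warnings_seen <- character(0)
  value <- withCallingHandlers(
    expr,  # the promise is forced here, inside the handler
    warning = function(w) {
      warnings_seen <<- c(warnings_seen, conditionMessage(w))
      invokeRestart("muffleWarning")  # resume evaluation after the warning
    }
  )
  list(value = value, warnings = warnings_seen)
}

# Usage (assumes second_query and fields as defined earlier in the thread):
# out <- collect_warnings(search_pv(second_query, fields = fields, all_pages = TRUE))
# out$warnings
# result <- out$value
```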

You'll eventually need to switch to the new version of the API. The current shutdown date for the original version is February 12, 2025. Things you'll need to do before then:

  • request an API key if you don't have one yet
  • update your query: the CPC attributes etc. have new names, and some are now nested
  • update the fields you request, since what's available from each endpoint has changed. You may have to make multiple calls and joins to get the same data the original version returns in one call.
  • the API throttles requests to 45 per minute; a sleep/retry is super simple if you use httr2
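With httr2, req_throttle() and req_retry() can handle the rate limit for you. If you stay on httr, a hand-rolled limiter is also short. A minimal base-R sketch of a sliding-window limiter, assuming the 45-requests-per-60-seconds figure from the bullet above; throttle_wait() is a hypothetical helper name.

```r
# Sliding-window rate limiter: given the times (in seconds) of previous
# requests, return how long to wait before the next one so that no more
# than `limit` requests fall inside any `window`-second span.
throttle_wait <- function(past, now, limit = 45, window = 60) {
  recent <- past[past > now - window]  # requests still inside the window
  if (length(recent) < limit) {
    return(0)
  }
  # Wait until the limit-th most recent request leaves the window,
  # leaving at most limit - 1 requests counted when the next one is sent.
  oldest_counted <- sort(recent, decreasing = TRUE)[limit]
  oldest_counted + window - now
}

# Usage sketch inside a request loop (request_times starts as numeric(0)):
# wait <- throttle_wait(request_times, as.numeric(Sys.time()))
# if (wait > 0) Sys.sleep(wait)
# request_times <- c(request_times, as.numeric(Sys.time()))
```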

My fork has the beta-ish version of the new R package. There's a CSV in /data-raw that shows what's available from each endpoint, and the API team also has a Swagger UI page. The problem with the new version is the data: some groups, like the application data, currently seem to be sparsely populated. I tried and-ing gte(application.filing_date = "2000-01-01") into your query and saw very different results; just 5,196 rows were returned. I'll open it as an API bug. I think you can get by without checking the filing date; IIRC the app data is only available from 2001.

Apologies for the size of the reprex below, but I couldn't think of a better way to show the new fields than by requesting them all. Let me know if you take on upgrading. I have a vignette on upgrading but didn't have an actual use case.

Russ

# Using the new version of the R package
# with the environment variable PATENTSVIEW_API_KEY set

library(patentsview)

updated_query <- with_qfuns(
  and(
    eq("assignees.assignee_country" = "US"),
    eq(cpc_current.cpc_subclass_id = "A01D")
  )
)

updated_query 
#> {"_and":[{"_eq":{"assignees.assignee_country":"US"}},{"_eq":{"cpc_current.cpc_subclass_id":"A01D"}}]}

groups <- c("", "assignees","attorneys","botanic", "cpc_current",
  "examiners", "figures", "foreign_priority", "granted_pregrant_crosswalk",
  "inventors", "ipcr", "us_term_of_grant", "wipo"
)

# advertised groups that don't seem to be fully populated
# get 'numbers of columns of arguments do not match' on rbind of paged results
bad_groups <- c("applicants", "application", "cpc_at_issue",  
  "gov_interest_contract_award_numbers", "gov_interest_organizations",
  "pct_data", "us_related_documents","uspc_at_issue"
)

result <- search_pv(updated_query, fields = get_fields("patent", groups = groups), 
  method = "POST", all_pages = TRUE)

result$query_results
#> #### Distinct entity counts across all downloadable pages of output:
#> 
#> total_hits = 8,023, count = 8,023

# you can then do joins on patent_id in the unnested object to get the nested fields
# into your data frame
unnest_pv_data(result$data)
#> List of 12
#>  $ assignees                 :'data.frame':  8116 obs. of  10 variables:
#>   ..$ patent_id                     : chr [1:8116] "3931823" ...
#>   ..$ assignee                      : chr [1:8116] "https://search.patentsvie"..
#>   ..$ assignee_type                 : chr [1:8116] "2" ...
#>   ..$ assignee_individual_name_first: chr [1:8116] NA ...
#>   ..$ assignee_individual_name_last : chr [1:8116] NA ...
#>   ..$ assignee_organization         : chr [1:8116] "Chisholm-Ryder Company, I"..
#>   ..$ assignee_city                 : chr [1:8116] "Niagara Falls" ...
#>   ..$ assignee_state                : chr [1:8116] "NY" ...
#>   ..$ assignee_country              : chr [1:8116] "US" ...
#>   ..$ assignee_sequence             : int [1:8116] 0 0 ...
#>  $ attorneys                 :'data.frame':  9484 obs. of  6 variables:
#>   ..$ patent_id            : chr [1:9484] "3931823" ...
#>   ..$ attorney_id          : chr [1:9484] "d4e95e6106ee82924f6f8ab461d48620" ...
#>   ..$ attorney_sequence    : int [1:9484] 0 0 ...
#>   ..$ attorney_name_first  : chr [1:9484] "Joseph P." ...
#>   ..$ attorney_name_last   : chr [1:9484] "Gastel" ...
#>   ..$ attorney_organization: chr [1:9484] NA ...
#>  $ cpc_current               :'data.frame':  38180 obs. of  8 variables:
#>   ..$ patent_id      : chr [1:38180] "3931823" ...
#>   ..$ cpc_sequence   : int [1:38180] 0 0 ...
#>   ..$ cpc_class      : chr [1:38180] "https://search.patentsview.org/api/v1/c"..
#>   ..$ cpc_class_id   : chr [1:38180] "A01" ...
#>   ..$ cpc_subclass   : chr [1:38180] "https://search.patentsview.org/api/v1/c"..
#>   ..$ cpc_subclass_id: chr [1:38180] "A01D" ...
#>   ..$ cpc_group      : chr [1:38180] "https://search.patentsview.org/api/v1/c"..
#>   ..$ cpc_group_id   : chr [1:38180] "A01D46/00" ...
#>  $ examiners                 :'data.frame':  10521 obs. of  6 variables:
#>   ..$ patent_id          : chr [1:10521] "3931823" ...
#>   ..$ examiner_id        : chr [1:10521] "8hwbs3tyhp3033gw5wjz62edn" ...
#>   ..$ examiner_first_name: chr [1:10521] "Evon C." ...
#>   ..$ examiner_last_name : chr [1:10521] "Blunk" ...
#>   ..$ examiner_role      : chr [1:10521] "primary" ...
#>   ..$ art_group          : chr [1:10521] NA ...
#>  $ figures                   :'data.frame':  8017 obs. of  3 variables:
#>   ..$ patent_id  : chr [1:8017] "3931823" ...
#>   ..$ num_figures: int [1:8017] 5 6 ...
#>   ..$ num_sheets : int [1:8017] 3 6 ...
#>  $ granted_pregrant_crosswalk:'data.frame':  8029 obs. of  4 variables:
#>   ..$ patent_id             : chr [1:8029] "3931823" ...
#>   ..$ document_number       : chr [1:8029] NA ...
#>   ..$ pgpubs_document_number: chr [1:8029] NA ...
#>   ..$ application_number    : chr [1:8029] "05324545" ...
#>  $ inventors                 :'data.frame':  19181 obs. of  8 variables:
#>   ..$ patent_id          : chr [1:19181] "3931823" ...
#>   ..$ inventor           : chr [1:19181] "https://search.patentsview.org/api/"..
#>   ..$ inventor_name_first: chr [1:19181] "Charles G." ...
#>   ..$ inventor_name_last : chr [1:19181] "Burton" ...
#>   ..$ inventor_city      : chr [1:19181] "Lewiston" ...
#>   ..$ inventor_state     : chr [1:19181] "NY" ...
#>   ..$ inventor_country   : chr [1:19181] "US" ...
#>   ..$ inventor_sequence  : int [1:19181] 0 1 ...
#>  $ ipcr                      :'data.frame':  22114 obs. of  11 variables:
#>   ..$ patent_id                     : chr [1:22114] "3931823" ...
#>   ..$ ipc_sequence                  : int [1:22114] 0 0 ...
#>   ..$ ipc_action_date               : chr [1:22114] NA ...
#>   ..$ ipc_section                   : chr [1:22114] "A" ...
#>   ..$ ipc_class                     : chr [1:22114] "01" ...
#>   ..$ ipc_subclass                  : chr [1:22114] "D" ...
#>   ..$ ipc_main_group                : chr [1:22114] "46" ...
#>   ..$ ipc_subgroup                  : chr [1:22114] "00" ...
#>   ..$ ipc_symbol_position           : chr [1:22114] NA ...
#>   ..$ ipc_classification_data_source: chr [1:22114] NA ...
#>   ..$ ipc_classification_value      : chr [1:22114] NA ...
#>  $ wipo                      :'data.frame':  11867 obs. of  3 variables:
#>   ..$ patent_id    : chr [1:11867] "3931823" ...
#>   ..$ wipo_field_id: chr [1:11867] "29" ...
#>   ..$ wipo_sequence: int [1:11867] 0 0 ...
#>  $ foreign_priority          :'data.frame':  723 obs. of  6 variables:
#>   ..$ patent_id              : chr [1:723] "3938684" ...
#>   ..$ priority_claim_sequence: int [1:723] 0 1 ...
#>   ..$ priority_claim_kind    : chr [1:723] NA ...
#>   ..$ foreign_application_id : chr [1:723] "2400200" ...
#>   ..$ filing_date            : chr [1:723] "1974-01-03" ...
#>   ..$ foreign_country_filed  : chr [1:723] "DT" ...
#>  $ us_term_of_grant          :'data.frame':  3373 obs. of  5 variables:
#>   ..$ patent_id      : chr [1:3373] "3973377" ...
#>   ..$ term_grant     : chr [1:3373] NA ...
#>   ..$ term_extension : chr [1:3373] NA ...
#>   ..$ term_disclaimer: chr [1:3373] NA ...
#>   ..$ disclaimer_date: chr [1:3373] "1990-10-30" ...
#>  $ patents                   :'data.frame':  8023 obs. of  19 variables:
#>   ..$ patent_id                                                   : chr [1:80"..
#>   ..$ patent_title                                                : chr [1:80"..
#>   ..$ patent_type                                                 : chr [1:80"..
#>   ..$ patent_date                                                 : chr [1:80"..
#>   ..$ patent_year                                                 : int [1:802..
#>   ..$ patent_abstract                                             : chr [1:80"..
#>   ..$ patent_cpc_current_group_average_patent_processing_days     : int [1:802..
#>   ..$ patent_detail_desc_length                                   : int [1:802..
#>   ..$ patent_earliest_application_date                            : chr [1:80"..
#>   ..$ patent_num_foreign_documents_cited                          : int [1:802..
#>   ..$ patent_num_times_cited_by_us_patents                        : int [1:802..
#>   ..$ patent_num_total_documents_cited                            : int [1:802..
#>   ..$ patent_num_us_applications_cited                            : int [1:802..
#>   ..$ patent_num_us_patents_cited                                 : int [1:802..
#>   ..$ patent_processing_days                                      : int [1:802..
#>   ..$ patent_term_extension                                       : int [1:802..
#>   ..$ gov_interest_statement                                      : chr [1:802..
#>   ..$ patent_uspc_current_mainclass_average_patent_processing_days: int [1:802..
#>   ..$ wipo_kind                                                   : chr [1:80"..

Created on 2024-11-27 with reprex v2.1.1
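As the comment in the reprex says, the unnested data frames join back together on patent_id. A minimal sketch with base R's merge() on tiny made-up data frames (the IDs and values are illustrative, not API output). Joining a one-to-many group such as cpc_current repeats each patent once per CPC row, which is consistent with cpc_current having far more rows than patents above; aggregating first keeps one row per patent.

```r
# Made-up stand-ins for unnest_pv_data(result$data)$patents and $cpc_current.
patents <- data.frame(
  patent_id = c("p1", "p2"),
  patent_title = c("Harvester", "Mower")
)
cpc_current <- data.frame(
  patent_id = c("p1", "p1", "p2"),
  cpc_group_id = c("A01D46/00", "A01D41/12", "A01D34/00")
)

# Left join: keep every patent, repeating it once per matching CPC row.
joined <- merge(patents, cpc_current, by = "patent_id", all.x = TRUE)
print(joined)

# Or collapse to one row per patent, e.g. a ";"-separated list of groups.
cpc_by_patent <- aggregate(cpc_group_id ~ patent_id, data = cpc_current,
                           FUN = function(g) paste(g, collapse = ";"))
patent_level <- merge(patents, cpc_by_patent, by = "patent_id", all.x = TRUE)
print(patent_level)
```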

 

nateapathy
Last seen: 12/03/2024 - 16:19
Joined: 01/23/2023 - 15:04
thanks

This is super helpful to see, I really appreciate it! I'll probably end up doing this update sometime in January :-) I do have an API key for the new API, but for my use case I only have to download the data once and then I'm done, so the API is serving more as a static data-collection tool than anything with ongoing API calls. But good to know that I need to figure this out before February 12!

Russ
Last seen: 12/04/2024 - 17:06
Joined: 11/14/2017 - 22:15
wondering

I wonder if you're accidentally querying an unindexed field. I got a 504 on

{"_begins": {"cpc_subgroup_id":"B05C"}}

but building the query myself from the higher-order fields, hoping they're indexed, seemed to work (though YMMV)!

 {"_and":[{"cpc_section_id": "B"}, {"cpc_subsection_id": "B05"},{"cpc_group_id":"B05C"}]}
nateapathy
Last seen: 12/03/2024 - 16:19
Joined: 01/23/2023 - 15:04
updating pv_post()

I'm trying to update my pv_post() function to AND these three fields together instead of just matching cpc_group_id:X##Y, but I don't think I'm getting the logic right. I can troubleshoot this some more, but this is the "q" part of the httr POST construction in pv_post(). env$cpc is defined in patents_view() (pv_post()'s parent.frame()), so pv_post() inherits that object. But when I test this out, I get errors that the names attribute is too long (3 instead of 1).

Russ
Last seen: 12/04/2024 - 17:06
Joined: 11/14/2017 - 22:15
try this

I used the MSMD method (monkey see, monkey do; no offense intended!) to turn the single CPC in your _and into three lists of lists:

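# fragment: these three elements replace the single CPC element
# inside the c(...) that builds the "_and" array in pv_post()
list(
  list(
    cpc_section_id = substr(env$cpc, 1, 1)  # e.g. "B"
  )
),
list(
  list(
    cpc_subsection_id = substr(env$cpc, 1, 3)  # e.g. "B05"
  )
),
list( # CPC category
  list(
    cpc_group_id = env$cpc  # e.g. "B05C"
  )
),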

It then seemed to work for me (not that I don't want you to use the R package!):

library(patentsview2)


httr::with_verbose({
   patentsview2::patents_view(cpc="B05C", from="2000-01-01")
})
#> # A tibble: 3,331 × 29
#>    patent_id patent_nu…¹ paten…² paten…³ paten…⁴ paten…⁵ paten…⁶ paten…⁷ paten…⁸
#>    <chr>     <chr>       <chr>   <chr>   <chr>     <int> <chr>   <chr>     <dbl>
#>  1 10004647  10004647    Appara… An app… 2018-0…    2018 Euskir… <NA>       50.7
#>  2 10005094  10005094    Appara… An imp… 2018-0…    2018 Whitev… NC         34.3
#>  3 10005097  10005097    Die fo… A cent… 2018-0…    2018 Stillw… MN         45.1
#>  4 10005925  10005925    Articl… Articl… 2018-0…    2018 Burnsv… MN         44.8
#>  5 10010509  10010509    Appara… An app… 2018-0…    2018 Sunnyv… CA         37.4
#>  6 10010900  10010900    Automa… A mate… 2018-0…    2018 East G… RI         41.7
#>  7 10010904  10010904    Applic… An exe… 2018-0…    2018 Big Ra… MI         43.7
#>  8 10011399  10011399    Fabric… A herm… 2018-0…    2018 Sunnyv… TX         32.8
#>  9 10011399  10011399    Fabric… A herm… 2018-0…    2018 Sunnyv… TX         32.8
#> 10 10016777  10016777    Method… Aeroso… 2018-0…    2018 Menlo … CA         37.5
#> # … with 3,321 more rows, 20 more variables:
#> #   patent_firstnamed_inventor_longitude <dbl>,
#> #   patent_num_cited_by_us_patents <int>, patent_num_combined_citations <int>,
#> #   patent_processing_time <int>, patent_type <chr>,
#> #   patent_num_us_patent_citations <int>, patent_firstnamed_assignee_id <chr>,
#> #   patent_firstnamed_assignee_city <chr>,
#> #   patent_firstnamed_assignee_state <chr>, …

Created on 2024-11-27 with reprex v2.1.1

Way more fun than what I should be doing! Try with_verbose out for yourself; reprex isn't capturing the verbose output, which shows this as the q parameter:

"q":{"_and":[{"_gte":{"app_date":"2000-01-01"}},{"cpc_section_id":"B"},{"cpc_subsection_id":"B05"},{"cpc_group_id":"B05C"},{"assignee_lastknown_country":"US"}]}

 

nateapathy
Last seen: 12/03/2024 - 16:19
Joined: 01/23/2023 - 15:04
none taken!

No offense taken at all, I appreciate your help with this!

I'm getting these weird "2" and "3" prefixes in my query that don't appear in your logging. I've copied in here the entire list-based construction of the q argument from the pv_post() function script.

  request <- httr::POST(url="https://api.patentsview.org/patents/query",
                        body=list(
                          q=list(
                            "_and"=c(
                              list( 
                                list( 
                                  "_gte"=list(
                                    app_date=start_date
                                  )
                                )
                              ),
                              list(
                                list(
                                  "_and"=list(
                                    list(
                                      cpc_section_id=substr(env$cpc, 1, 1)
                                    )
                                  ),
                                  list(
                                    list(
                                      cpc_subsection_id=substr(env$cpc, 1, 3)
                                    )
                                  ),
                                  list( # CPC category
                                    list(
                                      cpc_group_id=env$cpc #
                                    )
                                  )
                                )
                              ),
                              list(
                                list(
                                  assignee_lastknown_country="US" # only from US applicants
                                )
                              )
                            )
                          ),
     ... continues with f= argument

Any idea why this q= argument construction is generating these numbers?

{"q":{"_and":[{"_gte":{"app_date":"2000-01-01"}},{"_and":[{"cpc_section_id":"B"}],"2":[{"cpc_subsection_id":"B05"}],"3":[{"cpc_group_id":"B05C"}]},{"assignee_lastknown_country":"US"}]}

 

Russ
Last seen: 12/04/2024 - 17:06
Joined: 11/14/2017 - 22:15
Correction and outcome

For the record (since I referenced this thread in the API bug I opened): I was wrong above when I said the application data only goes back to 2001 (that's true for the application XML data the USPTO produces, but not relevant here). It seems the application group is always populated from the grant XML, so it shouldn't be included in bad_groups. For the reprex, this is in the unnested output when "application" is requested as a group:

#>  $ application               :'data.frame':  8020 obs. of  7 variables:
#>   ..$ patent_id       : chr [1:8020] "3931823" ...
#>   ..$ application_id  : chr [1:8020] "05/324545" ...
#>   ..$ application_type: chr [1:8020] "05" ...
#>   ..$ filing_date     : chr [1:8020] "1973-01-17" ...
#>   ..$ series_code     : chr [1:8020] "05" ...
#>   ..$ rule_47_flag    : logi [1:8020] FALSE ...
#>   ..$ filing_type     : chr [1:8020] "05" ...

Also, not to leave anyone hanging: Nate corrected his script and completed a portion of his study. I'm expecting nothing less than co-authorship :-)