I want to use the https://api.c3.ai/covid/api/1/linelistrecord/fetch
API, but I can only get 2000 records. I know there are more than 2000 records. How do I get all of them?
Here is my code in R:
library(tidyverse)
library(httr)
library(jsonlite)
resp <- POST(
  "https://api.c3.ai/covid/api/1/linelistrecord/fetch",
  body = list(
    spec = {}
  ) %>% toJSON(auto_unbox = TRUE),
  accept("application/json")
)
length(content(resp)$objs)
I get 2000 records.
Answer 0 (score: 4)
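The spec you pass in has the following optional fields:
limit // maximum number of objects to return
offset // offset used for paged reads
The default value of limit is 2000.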
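The returned fetch result contains, along with the array of objects (objs), a boolean field called hasMore that indicates whether there are more records in the underlying data store.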
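As a rough illustration of the request and response shapes described here, the Python sketch below builds the body for one page and pulls the two relevant fields out of a parsed result. The helper names page_spec and split_result are hypothetical and not part of the API; the field names limit, offset, objs and hasMore are the ones used in this thread.

```python
import json

DEFAULT_LIMIT = 2000  # default page size mentioned in this answer


def page_spec(offset=0, limit=DEFAULT_LIMIT, **other_fields):
    """Build the request body for one page (hypothetical helper)."""
    spec = dict(other_fields)  # e.g. filter=..., include=...
    spec["limit"] = limit
    spec["offset"] = offset
    return {"spec": spec}


def split_result(result):
    """Return (objs, has_more) from a parsed fetch result (hypothetical helper)."""
    return result.get("objs", []), bool(result.get("hasMore", False))


first = page_spec()
second = page_spec(offset=2000, filter="contains(location, 'California')")
print(json.dumps(second))
```

The second body asks for the records that follow the first 2000; whether another request is needed after that depends on the hasMore value of the response.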
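You can write a loop that ends when hasMore is false. Start with an offset of 0 and a limit of n (for example, n = 2000), then increase the offset by n on each iteration.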
library(tidyverse)
library(httr)
library(jsonlite)
limit <- 2000
offset <- 0
hasMore <- TRUE
all_objs <- c()
while (hasMore) {
  resp <- POST(
    "https://api.c3.ai/covid/api/1/linelistrecord/fetch",
    body = list(
      spec = list(
        limit = limit,
        offset = offset,
        filter = "contains(location, 'California')" # just as an example, to cut down on the dataset
      )
    ) %>% toJSON(auto_unbox = TRUE),
    accept("application/json")
  )
  result <- content(resp) # parse the response body once per page
  hasMore <- result$hasMore
  offset <- offset + limit # move on to the next page
  all_objs <- c(all_objs, result$objs)
}
length(all_objs)
Answer 1 (score: 1)
You can do something similar in Python. Here is a snippet that does the same thing:
import requests
import pandas as pd

headers = {'Accept': 'application/json'}

def read_data(url, payload, headers=headers):
    df_list = []
    has_more = True
    payload['spec']['offset'] = 0
    while has_more:
        # Request the next page, starting at the current offset
        response = requests.post(url, json=payload, headers=headers)
        result = response.json()
        df = pd.DataFrame.from_dict(result['objs'])
        has_more = result['hasMore']
        # Advance by the number of records actually received
        payload['spec']['offset'] += df.shape[0]
        df_list.append(df)
    return pd.concat(df_list, ignore_index=True)

url = 'https://api.c3.ai/covid/api/1/linelistrecord/fetch'
payload = {
    "spec": {
        "filter": "exists(hospitalAdmissionDate)",
        "include": "caseConfirmationDate, outcomeDate, hospitalAdmissionDate, age"
    }
}
df = read_data(url, payload)
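The loop above keeps going for as long as hasMore is true. A slightly more defensive variation, sketched below, also stops on an empty page and after a maximum number of pages. This is only an illustration under assumptions: fetch_page is a hypothetical callable (for example a thin wrapper around requests.post(...).json()), and make_fake_fetch is a stand-in for the API so the loop can be tried offline.

```python
def fetch_all(fetch_page, limit=2000, max_pages=1000):
    """Collect objs from every page.

    fetch_page(offset, limit) is a hypothetical callable that returns a dict
    with 'objs' and 'hasMore', mirroring the fetch result described above.
    """
    objs, offset = [], 0
    for _ in range(max_pages):
        result = fetch_page(offset, limit)
        page = result.get("objs", [])
        objs.extend(page)
        # Stop when the server reports no more data, or when a page is empty
        if not result.get("hasMore", False) or not page:
            break
        offset += len(page)
    return objs


def make_fake_fetch(total):
    """Stand-in for the API: serves the integers 0..total-1 in pages."""
    def fake_fetch(offset, limit):
        page = list(range(offset, min(offset + limit, total)))
        return {"objs": page, "hasMore": offset + limit < total}
    return fake_fetch


records = fetch_all(make_fake_fetch(4500), limit=2000)
print(len(records))  # 4500
```

Passing the fetch function in as an argument keeps the paging logic separate from the HTTP call, so the same loop can be checked against fake data before it is pointed at the real endpoint.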