I am trying to import the following dataset and store it in a pandas DataFrame: https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh/data
I am using the following code:
import pandas as pd
import requests

r = requests.get('https://data.nasa.gov/resource/gh4g-9sfh.json')
meteor_data = r.json()
df = pd.DataFrame(meteor_data)
print(df.shape)
The resulting DataFrame has only 1,000 rows. I need it to have all 45,716 rows. How can I do that?
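Answer 0: (score: 0)
Check out the docs on the $limit parameter:
The $limit parameter controls the total number of rows returned, and it defaults to 1,000 records per request.
Note: The maximum value for $limit is 50,000 records, and if you exceed that limit you'll get a 400 Bad Request response.
So you are just getting the default number of records.
You cannot get more than 50,000 records in a single API call; that would take multiple calls using $limit and $offset. This dataset has 45,716 rows, so a single call with a higher $limit is enough.
Try: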
https://data.nasa.gov/resource/gh4g-9sfh.json?$limit=50000
See also: Why am I limited to 1,000 rows on SODA API when I have an App Key
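For datasets larger than the 50,000-record cap, the $limit/$offset approach mentioned above could look roughly like the sketch below. The helper names `fetch_page` and `fetch_all` are hypothetical, the standard library's `urllib` is used instead of `requests` only to keep the example dependency-free, and `$order=:id` is added on the assumption that a stable ordering is wanted while paging.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.nasa.gov/resource/gh4g-9sfh.json"


def fetch_page(limit, offset):
    """Fetch one page of records from the endpoint (performs a network call)."""
    # Assumption: ordering by the system field :id keeps pages stable.
    query = urlencode(
        {"$limit": limit, "$offset": offset, "$order": ":id"}, safe="$:"
    )
    with urlopen(f"{BASE_URL}?{query}") as response:
        return json.load(response)


def fetch_all(get_page=fetch_page, page_size=50000):
    """Collect every record by requesting pages until a short page comes back."""
    records = []
    offset = 0
    while True:
        page = get_page(page_size, offset)
        records.extend(page)
        # A page shorter than page_size means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return records
```

The result of `fetch_all()` is a list of dictionaries, so it can be passed to `pd.DataFrame(...)` exactly like `meteor_data` in the question.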
Answer 1: (score: 0)
Set the limit like this (raise the limit value to cover all the rows you need):
import pandas as pd
from sodapy import Socrata
# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.nasa.gov", None)
# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.nasa.gov",
#                  MyAppToken,
#                  username="user@example.com",
#                  password="AFakePassword")
# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("gh4g-9sfh", limit=2000)
# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)
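If a dataset exceeds the 50,000-record cap, the same `client.get` call can be repeated with an `offset`. The following is only a sketch: `get_all_records` is a hypothetical helper, and it assumes the client's `get` method accepts `limit`, `offset`, and `order` keyword arguments, as sodapy's `Socrata.get` does.

```python
def get_all_records(client, dataset_id, page_size=50000):
    """Page through a dataset using the client's limit and offset keywords."""
    records = []
    offset = 0
    while True:
        # Assumption: ordering by :id keeps the paging stable between calls.
        page = client.get(
            dataset_id, limit=page_size, offset=offset, order=":id"
        )
        records.extend(page)
        # A short page signals that the last record has been reached.
        if len(page) < page_size:
            return records
        offset += page_size
```

With the `client` created above, `pd.DataFrame.from_records(get_all_records(client, "gh4g-9sfh"))` should then hold every row. For this particular dataset, a single `client.get("gh4g-9sfh", limit=50000)` is also sufficient, because 45,716 rows fit under the 50,000 cap.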