Question

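I am trying to make an API call with the SodaPy library (https://github.com/xmunoz/sodapy) against the Consumer Complaints dataset available online (https://data.consumerfinance.gov/dataset/Consumer-Complaints/s6ew-h6mp). I just want to get the CSV data; the web page says it has 906182 rows.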

I followed the example on GitHub as closely as I could, but it just doesn't work. Here is the code:

from sodapy import Socrata

client = Socrata("data.consumerfinance.gov", "apptoken", username="myusername", password="mypassword")

results = client.get("s6ew-h6mp")

I want to fetch the entire dataset, but I keep getting the following error:

ReadTimeout: HTTPSConnectionPool(host='data.consumerfinance.gov', port=443): Read timed out. (read timeout=10)

Any clues on how to resolve this?

Answer 1

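The connection is probably timing out because the file is so large. You can try downloading a subset of the data with the limit option, for example: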

results = client.get("s6ew-h6mp", limit=1000)
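If you still need every row, one option is to page through the dataset with limit and offset so that each request stays small enough to finish before the timeout. The sketch below is only an illustration: fetch_all_rows and the page size of 5000 are my own choices, not part of sodapy, and it assumes client.get accepts the limit, offset and order keyword arguments.

```python
def fetch_all_rows(client, dataset_id, page_size=5000):
    """Collect every row of a dataset by requesting one page at a time.

    `client` is expected to be a sodapy.Socrata instance (or anything
    with a compatible `get` method). `page_size` is an arbitrary value.
    """
    rows = []
    offset = 0
    while True:
        # Order by the system field :id so pages do not overlap or skip rows.
        page = client.get(dataset_id, limit=page_size, offset=offset, order=":id")
        rows.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return rows
```

Usage would look like rows = fetch_all_rows(client, "s6ew-h6mp"). Newer sodapy releases also document a get_all helper that pages for you, so it may be worth checking whether your installed version has it.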

You can also query a subset of the data using SoQL keywords.
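As a rough sketch of what a SoQL query could look like through sodapy: the select, where, order and limit keyword arguments map onto the corresponding SoQL clauses. The column names date_received, product and company are assumptions on my part, so check the dataset's API documentation for the real field names. fetch_recent_complaints is a hypothetical helper, not something sodapy provides.

```python
def fetch_recent_complaints(client, dataset_id, since, limit=1000):
    """Fetch a filtered, sorted slice of the dataset instead of all rows.

    `since` is an ISO date string such as "2016-01-01".
    The column names below are assumed, not taken from the dataset schema.
    """
    return client.get(
        dataset_id,
        select="date_received, product, company",  # only the columns needed
        where=f"date_received >= '{since}'",       # server-side row filter
        order="date_received DESC",                # newest rows first
        limit=limit,
    )
```

For example, fetch_recent_complaints(client, "s6ew-h6mp", "2016-01-01") would ask the server for at most 1000 matching rows, which keeps the response small.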

Otherwise, the sodapy module is built on the requests module, so the requests documentation may be worth a look.

Answer 2

By default, the Socrata connection times out after 10 seconds.

You can raise the Socrata client's timeout limit by updating its 'timeout' instance variable, like so:

from sodapy import Socrata

client = Socrata("data.consumerfinance.gov", "apptoken", username="myusername", password="mypassword")

# change the timeout variable to an arbitrarily large number of seconds
client.timeout = 50

results = client.get("s6ew-h6mp")
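If you would rather not guess a single large value, a possible variation is to retry on a timeout and double client.timeout each time. This is only a sketch: get_with_growing_timeout is a name I made up, and it assumes the timeout surfaces as requests.exceptions.Timeout (ReadTimeout is a subclass of it), which matches the traceback in the question.

```python
try:
    from requests.exceptions import Timeout
except ImportError:
    # requests is normally installed alongside sodapy; this fallback only
    # keeps the sketch importable in an environment without it.
    Timeout = TimeoutError


def get_with_growing_timeout(client, dataset_id, attempts=3, **query):
    """Call client.get, doubling client.timeout after each timed-out attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return client.get(dataset_id, **query)
        except Timeout:
            if attempt == attempts:
                raise  # out of attempts, let the caller see the error
            client.timeout *= 2
```

With the default 10 second timeout and attempts=3, the calls would run with 10, 20 and then 40 seconds before giving up.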

Answer 3

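I think this may actually solve the problem: make sure you are requesting the data from the API endpoint. Its 4x4 ID is slightly different (when viewing the dataset here, click "Export", then "SODA API"). Try: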

results = client.get("jhzv-w97w")

Answer 4

Looking at the source code on GitHub, the Socrata constructor takes a timeout parameter. The code sample below raises the timeout from 10 seconds to 25 seconds:

from sodapy import Socrata
client = Socrata("data.consumerfinance.gov", "apptoken", timeout=25)
results = client.get("s6ew-h6mp")

ReadTimeout error when fetching API data with the Sodapy client
