我获得了一个包含一些JSON文本的URL。在文本中,有csv文件的URL。我正在尝试从URL解析JSON并下载CSV文件。我可以从URL中打印出JSON,但不知道如何从内部抓取CSV文件。
import urllib, json
import urllib.request
with urllib.request.urlopen("http://staging.test.com/api/reports/68.json?auth_token=test") as url:
s = url.read()
print(s)
上面的代码将从URL打印JSON(请参见下面的打印输出),其中有csv文件的URL,然后需要使用python下载。
{"id":68,"name":"Carrier Rates","state":"complete","user_id":166,"data_set_id":7,"bounding_date":{"id":101,"start_date":"2019-01-01T00:00:00.000-05:00","end_date":"2999-12-31T00:00:00.000-05:00","bounding_field_id":322,"related_id":68,"related_type":"Reports::Report"},"results":[{"id":68,"created_at":"2019-07-26T15:29:40.872-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.1dec2e6d-0c36-44b7-ab26-fd43fe710daf.csv"},{"id":67,"created_at":"2019-07-26T15:29:07.112-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.3b02195e-c0a2-4abe-88f7-27d20ac76e07.csv"},{"id":35,"created_at":"2019-06-26T11:01:26.900-04:00","version_name":"06/26/2019 11:01AM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.a488c58d-5e04-4c28-a429-7167e9e8edaa.csv"},{"id":34,"created_at":"2019-06-26T10:57:51.396-04:00","version_name":"06/26/2019 10:57AM","content":"https://cloudtestlogistics-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.bf73db19-5604-4a1d-bc31-da6cf25742cc.csv"}]}
答案 0 :(得分:0)
import json
from collections import namedtuple
#This is your "s" -- data = s
data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'
# Parse JSON into an object with attributes corresponding to dict keys.
x = json.loads(data, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))
print x.name, x.hometown.name, x.hometown.id
此答案来自:How to convert JSON data into a Python object将Json加载到对象中。现在,通过json中传递的密钥来访问它。
print x.content
当然,您必须扭动一下代码才能使其完全按照您想要的方式工作。我不是一个真正的python专家,没有什么可测试的。但想法是将其加载到Tuple对象中,然后通过键访问它。
答案 1 :(得分:0)
import urllib, json
import urllib.request
with urllib.request.urlopen("http://staging.test.com/api/reports/68.json?auth_token=test") as url:
s = url.read()
# assuming here you got that json content
s='{"id":68,"name":"Carrier Rates","state":"complete","user_id":166,"data_set_id":7,"bounding_date":{"id":101,"start_date":"2019-01-01T00:00:00.000-05:00","end_date":"2999-12-31T00:00:00.000-05:00","bounding_field_id":322,"related_id":68,"related_type":"Reports::Report"},"results":[{"id":68,"created_at":"2019-07-26T15:29:40.872-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.1dec2e6d-0c36-44b7-ab26-fd43fe710daf.csv"},{"id":67,"created_at":"2019-07-26T15:29:07.112-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.3b02195e-c0a2-4abe-88f7-27d20ac76e07.csv"},{"id":35,"created_at":"2019-06-26T11:01:26.900-04:00","version_name":"06/26/2019 11:01AM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.a488c58d-5e04-4c28-a429-7167e9e8edaa.csv"},{"id":34,"created_at":"2019-06-26T10:57:51.396-04:00","version_name":"06/26/2019 10:57AM","content":"https://cloudtestlogistics-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.bf73db19-5604-4a1d-bc31-da6cf25742cc.csv"}]}'
d=json.loads(s)
for f in d['results']:
# manage download here
csv_url= f['content']
答案 2 :(得分:0)
以下代码可以为您提供帮助。
import json
import urllib.request
with urllib.request.urlopen("http://staging.test.com/api/reports/68.json?auth_token=test") as url:
s = url.read()
loadJson = json.load(s)
results = loadJson["results"]
csvLinks = []
for object in results:
csvlinks.append(object["content"])
现在,您有了指向CSV文件的链接列表。使用urllib下载它们。