I am trying to write a CSV file from JSON data returned by the PushShift API, but I am running into a TypeError. My code is below:
import requests
import csv
import json
from urllib.request import urlopen
url = 'https://api.pushshift.io/reddit/comment/search/?subreddit=science&filter=parent_id,id,author,created_utc,subreddit,body,score,permalink'
page = requests.get(url)
page_json = json.loads(page.text)
print(page.text)
f = csv.writer(open("test.csv",'w+', newline=''))
f.writerow(["id", "parent_id", "author", "created_utc","subreddit", "body", "score"])
for x in page_json:
    f.writerow([x["data"]["id"],
                x["data"]["parent_id"],
                x["data"]["author"],
                x["data"]["created_utc"],
                x["data"]["subreddit"],
                x["data"]["body"],
                x["data"]["score"]])
The error I get is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-82784a93576b> in <module>()
11 f.writerow(["id", "parent_id", "author", "created_utc","subreddit", "body", "score"])
12 for x in page:
---> 13 f.writerow([x["data"]["id"],
14 x["data"]["parent_id"],
15 x["data"]["author"],
TypeError: byte indices must be integers or slices, not str
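I tried the solution here: How can I convert JSON to CSV? That may or may not be the actual problem I am facing. Any suggestions would be greatly appreciated!
Answer 0 (score: 1)
The "data" key of the response holds an array of entries, one per CSV row. The response is not an array of objects that each have a "data" key. So you need to access "data" first: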
page_json = json.loads(page.text)['data']
Then iterate over it:
for x in page_json:
    f.writerow([x["id"],
                x["parent_id"],
                x["author"],
                x["created_utc"],
                x["subreddit"],
                x["body"],
                x["score"]])
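Note that you need to iterate over the parsed JSON object (page_json), not over the request's response (page). The traceback shows "for x in page", which is what produces the TypeError.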
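To see where the error message comes from, here is a minimal sketch that reproduces it without any network call. It assumes that iterating over a requests Response yields chunks of raw bytes; the `chunk` value below is a made-up stand-in for one such chunk, not real PushShift output.

```python
# Hypothetical stand-in for one chunk yielded by "for x in page".
chunk = b'{"data": [{"id": "abc"}]}'

try:
    # bytes can only be indexed by integers or slices, so a str key fails.
    chunk["data"]
except TypeError as exc:
    message = str(exc)

print(message)
```

Indexing the parsed dictionary instead (the result of json.loads) accepts string keys, which is why the loop has to run over page_json.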
You could also refactor the code like this:
columns = ["id", "parent_id", "author", "created_utc", "subreddit", "body", "score"]
f.writerow(columns)
for x in page_json:
    f.writerow([x[column] for column in columns])
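As a further variation, the same refactoring can be sketched with csv.DictWriter and a with block so the file is closed reliably. This is only one possible approach, not the original poster's code: the function name `write_comments_csv` and the `payload` argument are hypothetical, and it assumes `payload` is the already-parsed JSON dictionary with the entries under its "data" key.

```python
import csv

COLUMNS = ["id", "parent_id", "author", "created_utc", "subreddit", "body", "score"]


def write_comments_csv(payload, path, columns=COLUMNS):
    """Write the entries under payload["data"] to a CSV file (hypothetical helper)."""
    rows = payload["data"]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" drops keys such as "permalink" that are not in columns.
        writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

With the code from the question, it would be called as write_comments_csv(json.loads(page.text), "test.csv"). Entries that lack one of the columns get an empty cell instead of raising a KeyError, which may or may not be what you want.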