Question

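I'm trying to save every "author" entry from the JSON linked below into a list, but I'm new to Python. Can someone point me in the right direction?

JSON: https://codebeautify.org/jsonviewer/cb0d0a91

I'm trying to scrape a Reddit thread: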

import requests
import json

url ="https://www.reddit.com/r/easternshoremd/comments/72u501/going_to_be_in_the_easton_area_for_work_next_week.json"

r = requests.get(url, headers={'User-agent': 'Chrome'})
d = r.json()

scrapedids = []

for child in d['data']['children']:
    scrapedids.append(child['data']['author'])

print (scrapedids)

If I switch the URL from a Reddit post to a subreddit, it works. For example, if I set

url = ("https://www.reddit.com/r/easternshoremd.json")

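I think the problem is that I don't understand the directory/tree structure (or whatever it's called) of the JSON. I've been stuck on this for a few hours and would appreciate any help.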

Error:

Traceback (most recent call last):
  File "/home/usr/PycharmProjects/untitled/delete.py", line 14, in <module>
    for child in d['data']['children']:
TypeError: list indices must be integers or slices, not str

Answer 1

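It's good that you included a link to the JSON. It shows that the root is an array, not an object, which is why indexing it with d['data'] raises the TypeError.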
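If you'd rather confirm the shape from Python than from a JSON viewer, a small check like the one below can help. This is only a sketch: subreddit_response and thread_response are hypothetical, trimmed-down stand-ins for what r.json() returns in each case, and describe is a helper name made up for this example.

```python
# Hypothetical stand-ins for the parsed JSON (not real Reddit data).
# A subreddit URL gives back a single object ...
subreddit_response = {"kind": "Listing", "data": {"children": []}}
# ... while a thread URL gives back an array of such objects.
thread_response = [
    {"kind": "Listing", "data": {"children": []}},
    {"kind": "Listing", "data": {"children": []}},
]


def describe(payload):
    """Return a short description of the top-level JSON shape."""
    if isinstance(payload, list):
        return f"list of {len(payload)} items"
    if isinstance(payload, dict):
        return f"dict with keys {sorted(payload)}"
    return type(payload).__name__


print(describe(subreddit_response))  # a dict: d['data'] works
print(describe(thread_response))     # a list: d['data'] raises TypeError
```

In your own script you would call describe(d) right after d = r.json().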

So your code should look more like this:

import requests

url = "https://www.reddit.com/r/easternshoremd/comments/72u501/going_to_be_in_the_easton_area_for_work_next_week.json"

r = requests.get(url, headers={'User-agent': 'Chrome'})
listings = r.json()  # the root is a list, so there is no top-level 'data' key

scrapedids = []

# Walk every listing in the root array, then every child inside it.
for listing in listings:
    for child in listing['data']['children']:
        scrapedids.append(child['data']['author'])

print(scrapedids)

Note that I renamed d to listings, to match the kind attribute ('Listing') of each element in the array.
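The loop above only visits the direct children of each listing. If the thread has nested replies and you want those authors too, a recursive walk is one option. The following is a minimal sketch, not a definitive implementation: collect_authors is a hypothetical helper, sample is made-up data, and the code assumes that replies sit under a 'replies' key holding another listing (or an empty string when there are none) and that some children may have no 'author' at all.

```python
def collect_authors(node, found=None):
    """Walk a list/listing/comment tree and gather every 'author' value."""
    if found is None:
        found = []
    if isinstance(node, list):
        # Root array (or any list of nodes): visit each element.
        for item in node:
            collect_authors(item, found)
    elif isinstance(node, dict):
        data = node.get("data")
        if not isinstance(data, dict):
            return found
        # Posts and comments carry an author; listings do not.
        if "author" in data:
            found.append(data["author"])
        # Listings keep their items under 'children'.
        for child in data.get("children", []):
            collect_authors(child, found)
        # Assumed layout: 'replies' is a nested listing, or "" if empty.
        replies = data.get("replies")
        if isinstance(replies, dict):
            collect_authors(replies, found)
    # Anything else (e.g. a bare id string) is skipped.
    return found


# Made-up sample shaped like a thread response.
sample = [
    {"kind": "Listing", "data": {"children": [
        {"kind": "t3", "data": {"author": "poster"}},
    ]}},
    {"kind": "Listing", "data": {"children": [
        {"kind": "t1", "data": {"author": "alice", "replies": {
            "kind": "Listing", "data": {"children": [
                {"kind": "t1", "data": {"author": "bob", "replies": ""}},
            ]},
        }}},
        {"kind": "more", "data": {"children": ["abc123"]}},
    ]}},
]

print(collect_authors(sample))  # ['poster', 'alice', 'bob']
```

With the real response you would call collect_authors(listings) in place of the nested for loops. Using .get and the "author" in data check means a child without an author is skipped rather than raising a KeyError.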

How do I correctly scrape a JSON response from Reddit?
