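So I'm using the Reddit API and, for reasons unrelated to this case, I want to work without a Reddit wrapper here. The code is actually quite simple: it extracts the comments and the first-level replies from a specific post inside a subreddit.
Here is the code for the function: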
import requests
import pandas as pd

def getcommentsforpost(subredditname, postid):
    # here we make the request to reddit, and create a python dictionary
    # from the resulting json code
    reditpath = '/r/' + subredditname + '/comments/' + postid
    redditusual = 'https://www.reddit.com'
    parameters = '.json?'
    totalpath = redditusual + reditpath + parameters
    p = requests.get(totalpath, headers={'User-agent': 'Chrome'})
    result = p.json()
    # we are going to be looping a lot through dictionaries to extract
    # the comments and their replies, thus a list where we will insert
    # them.
    totallist = []
    # the result object is a list with two dictionaries, one with info
    # on the post, and the second one with all the info regarding the
    # comments and their respective replies; because of this, we first
    # process the post's info located in result[0]
    a = result[0]["data"]["children"][0]["data"]
    abody = a["selftext"]
    aauthor = a["author"]
    ascore = a["score"]
    adictionary = {"commentauthor": aauthor, "comment": abody, "Type": "Post",
                   "commentscore": ascore}
    totallist.append(adictionary)
    # and now we start processing the comments, located in result[1]
    for i in result[1]["data"]["children"]:
        ibody = i["data"]["body"]
        iauthor = i["data"]["author"]
        iscore = i["data"]["score"]
        idictionary = {"commentauthor": iauthor, "comment": ibody, "Type": "post_comment",
                       "commentscore": iscore}
        totallist.append(idictionary)
        # to clarify, up to here the code works perfectly. No problem
        # whatsoever; it's exactly in the following section where the
        # error happens.
        # we create a new object, called replylists, that contains
        # a list of dictionaries in every iteration of the loop.
        replylists = i["data"]["replies"]["data"]["children"]
        # we loop through them, extracting the replies of every comment
        for j in replylists:
            jauthor = j["data"]["author"]
            jbody = j["data"]["body"]
            jscore = j["data"]["score"]
            jdictionary = {"commentauthor": jauthor, "comment": jbody, "Type": "comment_reply",
                           "commentscore": jscore}
            # just like we did with the post info and the normal comments,
            # we extract it and put it in totallist.
            totallist.append(jdictionary)
    finaldf = pd.DataFrame(totallist)
    return finaldf

getcommentsforpost("Python", "a7zss0")
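However, the code fails when it runs the reply loop. It raises the error "string indices must be integers" and points at the replylists variable. Yet when I run the same lookup outside the loop, like this:
result[1]["data"]["children"][4]["data"]["replies"]["data"]["children"][0]
it works perfectly, and it should have the same effect. I believe it is treating replylists as a string rather than as a list (which is its class).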
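Things I have tried:
I checked with type() that replylists is a list. It does return "list", but only for 5 iterations of the loop; after that it fails with the same error.
I tried looping over the list with for ja in range(0, len(replylists)) and then setting the j variable to replylists[ja]. It raised the same error.
I have been debugging for two hours. Without that snippet the function runs fine (of course it doesn't return the replies in the final data frame, but it runs). Why does this happen? replylists is a list of dictionaries, not a string, yet it gives this strange error.
Here is the Reddit documentation for the endpoint we are using: https://www.reddit.com/dev/api#GET_comments_{article}
Libraries to import: requests, pandas as pd, json
I repeat: suggesting a wrapper is not a solution; I want to work with the JSON and REST directly.
Working on: 'Python version 3.6.5 | Anaconda version 5.2.0, Jupyter Notebook 5.5.0'
Thanks in advance. I hope you find it interesting; I'll keep working on it from here.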
Answer 0 (score: 1)
I did some digging: I copied your code into a local environment and did some debugging, mainly like this:
try:
    replylists = i["data"]["replies"]["data"]["children"]
except TypeError:
    # the lookup failed: print every key of this comment, then stop
    for point in i['data']:
        print(point)
    exit()
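That way I found that i["data"] does have values (57 of them, in fact), and one of those 57 keys is replies. Looking further, though, I found that the replies content was empty:
'replies': ''
is what I saw when I printed the failing value of i directly.
All hope is not lost, though: you simply forgot to skip the iterations where the replies content is empty (''). I also checked how many iterations actually fail: some work and some fail, for the reason above.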
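The failure can be reproduced without calling Reddit at all. The sketch below is only an illustration: sample_children is a hand-made list shaped like result[1]["data"]["children"] (not real API output), and count_reply_failures is a hypothetical helper name. It shows that indexing the empty string with "data" raises the same TypeError, and it counts how many items succeed and how many fail.

```python
# Hand-made stand-in for result[1]["data"]["children"]; not real Reddit output.
sample_children = [
    {"data": {"author": "alice", "body": "first", "score": 3,
              "replies": {"data": {"children": [
                  {"data": {"author": "bob", "body": "reply", "score": 1}}]}}}},
    {"data": {"author": "carol", "body": "second", "score": 2,
              "replies": ""}},  # a comment without replies
]


def count_reply_failures(children):
    """Return (ok, failed) counts for the nested replies lookup."""
    ok = failed = 0
    for child in children:
        try:
            child["data"]["replies"]["data"]["children"]
            ok += 1
        except TypeError:
            # ""["data"] indexes a str with a str, which raises TypeError
            failed += 1
    return ok, failed


print(count_reply_failures(sample_children))
```

With this sample, one lookup succeeds and one fails, which matches the "some work, some fail" behaviour described above.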
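With that in mind, I suggest using try and except to debug errors like this one (it's a useful skill) and, more to the point of the question, deciding what you want to do when the replies content is empty.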
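One possible decision, sketched here under the assumption that a comment without replies should simply contribute nothing, is to normalise the replies field into a list before looping. reply_children is a hypothetical helper name, not part of the original code.

```python
def reply_children(comment_data):
    """Return the list of reply objects, or [] when there are none.

    Assumes "replies" is a dict when replies exist and an empty
    string otherwise, as observed in the debugging output.
    """
    replies = comment_data.get("replies")
    if isinstance(replies, dict):
        return replies.get("data", {}).get("children", [])
    return []


# Usage with hand-made data (not real Reddit output)
no_replies = {"replies": ""}
with_reply = {"replies": {"data": {"children": [{"data": {"body": "hi"}}]}}}
print(reply_children(no_replies))
print(len(reply_children(with_reply)))
```

The loop over replies can then run unconditionally, because an empty list just means zero iterations.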
Good luck, and I hope this helps.
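Answer 1 (score: 0)
This is my solution: I added an if statement that checks whether i["data"]["replies"] is a dictionary. If it is, the reply code runs; if not, the loop moves on.
This is how it looks. Thanks again to Aditya and Goyo: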
import requests
import pandas as pd

def getcommentsforpost(subredditname, postid):
    reditpath = '/r/' + subredditname + '/comments/' + postid
    redditusual = 'https://www.reddit.com'
    parameters = '.json?'
    totalpath = redditusual + reditpath + parameters
    p = requests.get(totalpath, headers={'User-agent': 'Chrome'})
    result = p.json()
    totallist = []
    # the result object is a list with two dictionaries, one with info on the post, and the second one
    # with all the info regarding the comments and their respective replies
    a = result[0]["data"]["children"][0]["data"]
    abody = a["selftext"]
    aauthor = a["author"]
    ascore = a["score"]
    adictionary = {"commentauthor": aauthor, "comment": abody, "Type": "Post",
                   "commentscore": ascore}
    totallist.append(adictionary)
    for i in result[1]["data"]["children"]:
        ibody = i["data"]["body"]
        iauthor = i["data"]["author"]
        iscore = i["data"]["score"]
        idictionary = {"commentauthor": iauthor, "comment": ibody, "Type": "post_comment",
                       "commentscore": iscore}
        totallist.append(idictionary)
        # "replies" is a dictionary when the comment has replies and an
        # empty string when it has none
        if isinstance(i["data"]["replies"], dict):
            replylists = i["data"]["replies"]["data"]["children"]
            for j in replylists:
                jauthor = j["data"]["author"]
                jbody = j["data"]["body"]
                jscore = j["data"]["score"]
                jdictionary = {"commentauthor": jauthor, "comment": jbody, "Type": "comment_reply",
                               "commentscore": jscore}
                totallist.append(jdictionary)
        elif isinstance(i["data"]["replies"], str):
            # no replies: move on to the next comment
            continue
    finaldf = pd.DataFrame(totallist)
    return finaldf
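For testing without network access, the parsing can be separated from the request. The sketch below is a variation on the function above, not a replacement for it: flatten_thread and make_row are hypothetical names, and the sample data is hand-made. It also skips children whose kind is "more", on the assumption (not covered in this thread) that Reddit listings can end with such placeholder entries that carry no body or author.

```python
def make_row(data, row_type):
    """Build one output row; posts carry "selftext" instead of "body"."""
    return {"commentauthor": data.get("author"),
            "comment": data.get("body", data.get("selftext")),
            "Type": row_type,
            "commentscore": data.get("score")}


def flatten_thread(result):
    """Turn the parsed JSON (post listing, comment listing) into rows."""
    rows = [make_row(result[0]["data"]["children"][0]["data"], "Post")]
    for comment in result[1]["data"]["children"]:
        if comment.get("kind") == "more":  # assumed placeholder entry
            continue
        data = comment["data"]
        rows.append(make_row(data, "post_comment"))
        replies = data.get("replies")
        if not isinstance(replies, dict):  # "" means no replies
            continue
        for reply in replies["data"]["children"]:
            if reply.get("kind") == "more":
                continue
            rows.append(make_row(reply["data"], "comment_reply"))
    return rows


# Hand-made sample shaped like the parsed response; not real Reddit output.
sample = [
    {"data": {"children": [
        {"kind": "t3", "data": {"author": "op", "selftext": "question", "score": 10}}]}},
    {"data": {"children": [
        {"kind": "t1", "data": {"author": "alice", "body": "answer", "score": 3,
                                "replies": {"data": {"children": [
                                    {"kind": "t1", "data": {"author": "bob", "body": "thanks",
                                                            "score": 1, "replies": ""}},
                                    {"kind": "more", "data": {"count": 2}}]}}}},
        {"kind": "t1", "data": {"author": "carol", "body": "another", "score": 2,
                                "replies": ""}},
        {"kind": "more", "data": {"count": 5}},
    ]}},
]

rows = flatten_thread(sample)
print([r["Type"] for r in rows])
```

The returned list can be passed to pd.DataFrame(rows) exactly as totallist is above, and result can still come from requests.get(...).json().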