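I want to extract all the comments from a reddit post and eventually put the author name, comment text, and upvotes into a DataFrame. I'm very new to programming, so I'm struggling with this...
Right now I'm using PRAW to pull the stickied post and trying to iterate over its comments with a for loop to build a list of dictionaries, each holding an author and a comment. For some reason, it just adds the first author/comment dictionary pair to the list and repeats it. This is what I have: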
import praw
import pandas as pd
import pprint

reddit = praw.Reddit(xxx)
sub = reddit.subreddit('ethtrader')
hot_python = sub.hot(limit=1)

for submission in hot_python:
    if submission.stickied:
        print('Title: {}, ups: {}, downs: {}'.format(submission.title, submission.ups, submission.downs))
        post = {}
        postlist = []
        submission.comments.replace_more(limit=0)
        for comment in submission.comments:
            post['Author'] = comment.author
            post['Comment'] = comment.body
            postlist.append(post)
Any ideas? Apologies for the ugly code, I'm new at this. Thanks!
Answer 0 (score: 1)
for submission in hot_python:
    if submission.stickied:
        print('Title: {}, ups: {}, downs: {}'.format(submission.title, submission.ups, submission.downs))
        postlist = []
        submission.comments.replace_more(limit=0)
        for comment in submission.comments:
            post = {}  # put this here: a new dict for each comment
            post['Author'] = comment.author
            post['Comment'] = comment.body
            postlist.append(post)
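You should create a new post dict inside the for loop. When you append post to the list, you are actually appending a reference to the dict, not a copy. You then overwrite that same dict with new data, so every reference to it in the list shows the change. Your final list is just a list of references to one and the same dictionary.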
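The reference behaviour can be seen without touching Reddit at all. Below is a minimal sketch that contrasts the two patterns; the `SimpleNamespace` objects are hypothetical stand-ins for PRAW comments, and the `Upvotes` key and the sample values are assumptions made only for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for PRAW comment objects (not real Reddit data).
comments = [
    SimpleNamespace(author="alice", body="first", score=3),
    SimpleNamespace(author="bob", body="second", score=7),
]

# Pattern from the question: one dict created outside the loop and reused.
shared = {}
aliased = []
for comment in comments:
    shared["Author"] = comment.author
    shared["Comment"] = comment.body
    aliased.append(shared)  # appends a reference to the same dict every time

# Pattern from the answer: a fresh dict on every iteration.
rows = []
for comment in comments:
    rows.append({
        "Author": comment.author,
        "Comment": comment.body,
        "Upvotes": comment.score,  # assumed column name
    })

print(aliased[0] is aliased[1])  # both entries are the same object
print(rows[0] is rows[1])        # each entry is a distinct dict
```

Since the stated goal is a DataFrame, a list of distinct dicts like `rows` can then be passed to `pd.DataFrame(rows)`. With real PRAW comments you would probably want `str(comment.author)` for the name and `comment.score` for the upvotes, though that is a suggestion beyond what the answer itself covers.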