Pulling reddit comments with Python PRAW and building a DataFrame from the results

Date: 2018-01-20 17:07:07

Tags: python pandas dataframe reddit praw

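I'm looking to pull all the comments from a reddit post and ultimately get the author name, comment, and upvotes into a DataFrame. I'm very new to programming, so I'm having a hard time...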

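Right now I'm using PRAW to pull the comments from the stickied post, and I'm trying to use a for loop to iterate over the comments and build a list of dictionaries holding the author and comment. For some reason, it just adds the first author/comment dictionary pair to the list and repeats it. This is what I have: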

import praw
import pandas as pd
import pprint

reddit = praw.Reddit(xxx)
sub = reddit.subreddit('ethtrader')
hot_python = sub.hot(limit=1)

for submission in hot_python:
    if submission.stickied:
        print('Title: {}, ups: {}, downs: {}'.format(submission.title, submission.ups, submission.downs))
        post = {}
        postlist = []
        submission.comments.replace_more(limit=0)
        for comment in submission.comments:
            post['Author'] = comment.author
            post['Comment'] = comment.body
            postlist.append(post)

Any ideas? Apologies for the ugly code, I'm new at this. Thanks!

1 Answer:

Answer 0: (score: 1)

for submission in hot_python:
    if submission.stickied:
        print('Title: {}, ups: {}, downs: {}'.format(submission.title, submission.ups, submission.downs))
        postlist = []
        # Drop the unexpanded "load more comments" placeholders
        submission.comments.replace_more(limit=0)
        for comment in submission.comments:
            post = {}  # put this here: a fresh dict for each comment
            post['Author'] = comment.author
            post['Comment'] = comment.body
            postlist.append(post)

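You should create a new post dict inside the for loop. When you append post to the list, you are appending a reference to the dict, not a copy of it. You then overwrite that same dict with new data, and the change shows up through every reference to it. The list you end up with is just a list of references to one and the same dict.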
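To see the reference behaviour without hitting the reddit API, here is a minimal sketch. `FakeComment` is a hypothetical stand-in for PRAW's comment objects, and its `score` field is assumed to correspond to the upvote count the question asks about. It is only an illustration of the aliasing, not the asker's actual data.

```python
from collections import namedtuple

# Hypothetical stand-in for PRAW comment objects so the example runs offline.
FakeComment = namedtuple("FakeComment", ["author", "body", "score"])

comments = [
    FakeComment("alice", "first comment", 3),
    FakeComment("bob", "second comment", 7),
]

# Buggy pattern: one dict is created outside the loop and appended repeatedly.
shared = {}
buggy = []
for comment in comments:
    shared["Author"] = comment.author
    shared["Comment"] = comment.body
    buggy.append(shared)  # appends another reference to the same dict

# Both entries are the very same object, so they show identical data.
print(buggy[0] is buggy[1])
print(buggy)

# Fixed pattern: build a new dict for every comment.
fixed = [
    {"Author": str(c.author), "Comment": c.body, "Upvotes": c.score}
    for c in comments
]
print(fixed)
```

A list of distinct dicts like `fixed` can then be passed straight to `pd.DataFrame(fixed)`, which produces one row per comment with `Author`, `Comment`, and `Upvotes` columns. Wrapping the author in `str()` is a choice made here so the column holds plain names rather than author objects.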