I am scraping the content of a subreddit. The subreddit is AR.
I need to get the post ID, title, post content, author, post date, score, comments and comment IDs, and then write them to a txt file.
These are the problems I am facing:
(1) Can I put the comments and comment IDs into the same file as the post data, so that each record is: post ID, title, post content, author, post date, score, comments, and comment ID?
(2) The selftext I get contains line breaks, so in my output.txt it shows up like this:
blablabla
blablabla
blablabla
For example, [this reddit][1] has several line breaks. I want everything on a single line, because the data will be moved into CSV/Excel for later analysis.
My code:
import datetime

import praw

reddit = praw.Reddit('bot1')
subreddit = reddit.subreddit('AR')

for submission in subreddit.top(limit=1):
    date = datetime.datetime.utcfromtimestamp(submission.created_utc)

    # Drop "load more comments" placeholders, which have no author or body
    submission.comments.replace_more(limit=0)
    # Append one line per top-level comment; "with" closes the file afterwards
    with open('path' + 'index_comments.txt', 'a+') as indexFile_comment:
        for comment in submission.comments:
            print("Comment author: ", comment.author)
            print("Comments: ", comment.body)
            indexFile_comment.write('"' + str(comment.author) + '"' + ', ' + '"' + str(comment.body) + '"' + '\n')

    print("Post ID: ", submission.id)
    print("Title: ", submission.title)
    print("Post Content: ", submission.selftext)
    print("User Name: ", submission.author)
    print("Post Date: ", date)
    print("Point: ", submission.score)

    # Append one line per submission to index.txt
    with open('path' + 'index.txt', 'a+') as indexFile:
        indexFile.write('"' + str(submission.id) + '"' + ', ' + '"' + str(submission.title) + '"' + ', ' + '"' + str(submission.selftext) + '"' + ', ' + '"' + str(submission.author) + '"' + ', ' + '"' + str(date) + '"' + ', ' + '"' + str(submission.score) + '"' + '\n')
    print("Successfully writing in file")
Answer (score: 0)
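To get the submission text onto one line, you can use st.replace("\n", " ") in your code, where st is submission.selftext. Note that str.replace returns a new string instead of modifying st, so assign the result: st = st.replace("\n", " ").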
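A slightly more thorough variant, offered as a sketch rather than as part of the original answer: replace("\n", " ") leaves carriage returns, tabs and runs of spaces behind. The hypothetical helper flatten below collapses every run of whitespace into a single space using only built-in str methods.

```python
def flatten(text):
    """Collapse every run of whitespace (newlines, carriage returns, tabs, spaces) into one space."""
    # str.split() with no argument splits on any whitespace and drops empty pieces,
    # so joining with a single space gives one line with no leading/trailing blanks.
    return " ".join(text.split())
```

For example, selftext = flatten(submission.selftext) before writing the record. It also squeezes repeated spaces, which is usually what you want in a CSV cell.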
To get a comment's ID, use comment.id; its text is comment.body. Both are available inside the for loop.
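One caveat, as a sketch assuming PRAW's CommentForest.replace_more() and CommentForest.list(): for comment in submission.comments only visits top-level comments, and the forest may contain MoreComments placeholders, which have no author or body. The hypothetical helper collect_comments expands nothing, drops those placeholders, and returns an (id, author, body) tuple for every comment, nested replies included. It only relies on the submission's duck type, so it does not import praw itself.

```python
def collect_comments(submission):
    """Return (comment_id, author, body) for every comment on a PRAW submission."""
    # limit=0 removes all "load more comments" placeholders without extra API calls;
    # a larger limit (or None) would fetch them instead, at one request each.
    submission.comments.replace_more(limit=0)
    rows = []
    # .list() flattens the comment forest, so replies to replies are included.
    for comment in submission.comments.list():
        # author is None for deleted accounts; str() turns that into "None".
        rows.append((comment.id, str(comment.author), comment.body))
    return rows
```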
Edit:
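In the first line I only added submission.id and submission.title, but you can add the remaining fields the same way. The loop appends each comment to the end of the same string. After the for loop, I replace every newline character with a space. You can then write record to the text file; when you move on to the next submission, append its record to the text file on a new line.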
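# Post fields first; add author, selftext, date and score the same way
record = str(submission.id) + " " + str(submission.title) + " "
for comment in submission.comments:
    # comment.author is a Redditor object (or None), so convert it with str()
    record = record + str(comment.id) + " " + str(comment.author) + " " + comment.body + " "
# str.replace returns a new string, so reassign the result
record = record.replace("\n", " ")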
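Since the data is headed for CSV/Excel anyway, here is a sketch (not the answer's method) that writes a real CSV with Python's csv module instead of building quoted strings by hand, which also handles titles or comments that contain quotes or commas. It writes one row per comment and repeats the post fields on each row, in the column order from question (1). The names write_rows, one_line and HEADER, and the shapes of post and comments, are hypothetical.

```python
import csv
import os

# Column order requested in the question
HEADER = ["post_id", "title", "post_content", "author", "post_date",
          "score", "comment", "comment_id"]


def one_line(value):
    """Convert to str and collapse all whitespace, newlines included, to single spaces."""
    return " ".join(str(value).split())


def write_rows(path, post, comments):
    """Append one CSV row per comment, repeating the post fields on each row.

    Assumed shapes (hypothetical): post is a dict keyed by the first six
    HEADER names; comments is an iterable of (comment_id, body) pairs.
    """
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module control line endings, as its docs advise
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(HEADER)  # header only once, on a new/empty file
        for comment_id, body in comments:
            post_cells = [one_line(post[key]) for key in HEADER[:6]]
            writer.writerow(post_cells + [one_line(body), comment_id])
```

Inside the submission loop you could build post = {"post_id": submission.id, "title": submission.title, "post_content": submission.selftext, "author": submission.author, "post_date": date, "score": submission.score} and call write_rows("index.csv", post, [(c.id, c.body) for c in submission.comments]). One row per comment is easy to filter in Excel or pandas; note that a post with no comments produces no rows, so add a row with empty comment columns if you need those posts too.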