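I have been desperately looking for a way to retrieve all comments and their corresponding replies for my research. Building a data frame that holds the comment data in the correct, corresponding order is proving very difficult.
I am sharing my code here so that you professionals can take a look and give me some insight.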
import pandas as pd

def get_video_comments(service, **kwargs):
    comments = []
    results = service.commentThreads().list(**kwargs).execute()

    while results:
        for item in results['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            comment2 = item['snippet']['topLevelComment']['snippet']['publishedAt']
            comment3 = item['snippet']['topLevelComment']['snippet']['authorDisplayName']
            comment4 = item['snippet']['topLevelComment']['snippet']['likeCount']
            if 'replies' in item.keys():
                for reply in item['replies']['comments']:
                    rauthor = reply['snippet']['authorDisplayName']
                    rtext = reply['snippet']['textDisplay']
                    rtime = reply['snippet']['publishedAt']
                    rlike = reply['snippet']['likeCount']
                    data = {'Reply ID': [rauthor], 'Reply Time': [rtime], 'Reply Comments': [rtext], 'Reply Likes': [rlike]}
                    print(rauthor)
                    print(rtext)
            data = {'Comment': [comment], 'Date': [comment2], 'ID': [comment3], 'Likes': [comment4]}
            result = pd.DataFrame(data)
            result.to_csv('youtube.csv', mode='a', header=False)
            print(comment)
            print(comment2)
            print(comment3)
            print(comment4)
            print('==============================')
            comments.append(comment)

        # Check if another page exists
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.commentThreads().list(**kwargs).execute()
        else:
            break

    return comments
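When I run this, my crawler collects the comments, but it misses some of the replies under certain comments.
How can I make it collect the comments together with their corresponding replies and put them in a single data frame?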
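So, I somehow managed to get the information I wanted into the output section of Jupyter Notebook. All I have to do now is append the results to a data frame.
Here is my updated code: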
def get_video_comments(service, **kwargs):
    comments = []
    results = service.commentThreads().list(**kwargs).execute()

    while results:
        for item in results['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            comment2 = item['snippet']['topLevelComment']['snippet']['publishedAt']
            comment3 = item['snippet']['topLevelComment']['snippet']['authorDisplayName']
            comment4 = item['snippet']['topLevelComment']['snippet']['likeCount']
            if 'replies' in item.keys():
                for reply in item['replies']['comments']:
                    rauthor = reply['snippet']['authorDisplayName']
                    rtext = reply['snippet']['textDisplay']
                    rtime = reply['snippet']['publishedAt']
                    rlike = reply['snippet']['likeCount']
                    print(rtext)
                    print(rtime)
                    print(rauthor)
                    print('Likes: ', rlike)
            print(comment)
            print(comment2)
            print(comment3)
            print("Likes: ", comment4)
            print('==============================')
            comments.append(comment)

        # Check if another page exists
        if 'nextPageToken' in results:
            kwargs['pageToken'] = results['nextPageToken']
            results = service.commentThreads().list(**kwargs).execute()
        else:
            break

    return comments
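The result is:
As you can see, the lines grouped under each ======== line are a comment together with its corresponding replies.
What would be a good way to append the results to a data frame?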
Answer 0 (score: 1)
According to the official documentation, the property replies.comments[] of the CommentThreads resource has the following specification:
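replies.comments[] (list)
A list of one or more replies to the top-level comment. Each item in the list is a comment resource. The list contains a limited number of replies, and unless the number of items in the list equals the value of the snippet.totalReplyCount property, the list of replies is only a subset of the total number of replies available for the top-level comment. To retrieve all of the replies for the top-level comment, you need to call the Comments.list method and use the parentId request parameter to identify the comment for which you want to retrieve replies.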
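Consequently, if you want to obtain all reply entries associated with a given top-level comment, you will have to use the Comments.list API endpoint, queried appropriately.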
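The rule quoted above can be turned into a small check. What follows is only a minimal sketch: it assumes `thread` is a CommentThreads resource already parsed into a Python dict and requested with both `snippet` and `replies` in `part`. The helper name `has_all_replies` and the sample dicts are made up for illustration and are not part of the API.

```python
def has_all_replies(thread):
    """Return True when the replies embedded in a comment thread are complete."""
    total = thread['snippet']['totalReplyCount']
    # 'replies' is absent when the thread carries no embedded replies.
    embedded = thread.get('replies', {}).get('comments', [])
    return len(embedded) == total


# Hand-written sample data (hypothetical, not real API output).
complete = {
    'snippet': {'totalReplyCount': 1},
    'replies': {'comments': [{'id': 'r1'}]},
}
truncated = {
    'snippet': {'totalReplyCount': 7},
    'replies': {'comments': [{'id': 'r1'}, {'id': 'r2'}]},
}
no_replies = {'snippet': {'totalReplyCount': 0}}

for name, thread in [('complete', complete),
                     ('truncated', truncated),
                     ('no_replies', no_replies)]:
    print(name, has_all_replies(thread))
```

Only the threads for which this check fails would need the extra Comments.list call with parentId.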
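I recommend reading my answer to a very much related question; it has three sections, one of which covers the property nextPageToken and the parameter pageToken.
First of all, you will have to acknowledge that the API (as currently implemented) does not allow obtaining all top-level comments associated with a given video when the number of those comments exceeds a certain (unspecified) upper bound.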
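To make the interplay of nextPageToken and pageToken concrete, here is a minimal sketch of the pagination loop, run against fake in-memory pages rather than the real API. The names `iter_items`, `fake_fetch` and `FAKE_PAGES` are hypothetical; the only assumption about a response is that it is a dict with an `items` list and, when more pages exist, a `nextPageToken` string.

```python
def iter_items(fetch_page):
    """Yield every item of a paginated listing.

    fetch_page is a hypothetical callable: it receives a page token
    (None for the first page) and returns one response dict.
    """
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get('items', [])
        # The token of the next page is sent back as 'pageToken'.
        token = page.get('nextPageToken')
        if not token:
            break


# Fake pages standing in for API responses (made up for illustration).
FAKE_PAGES = {
    None: {'items': ['a', 'b'], 'nextPageToken': 'p2'},
    'p2': {'items': ['c'], 'nextPageToken': 'p3'},
    'p3': {'items': ['d']},
}


def fake_fetch(token):
    return FAKE_PAGES[token]


all_items = list(iter_items(fake_fetch))
print(all_items)
```

With the real client, `fetch_page` would wrap a `list(...).execute()` call and add `pageToken` to the request only when a token is present.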
As for the Python implementation, I would suggest structuring your code as follows:
def get_video_comments(service, video_id):
    request = service.commentThreads().list(
        videoId=video_id,
        part='id,snippet,replies',
        maxResults=50
    )
    comments = []

    while request:
        response = request.execute()

        for comment in response['items']:
            reply_count = comment['snippet']['totalReplyCount']
            replies = comment.get('replies')
            # The embedded reply list may be truncated; if so,
            # replace it with the full list from Comments.list.
            if replies is not None and \
               reply_count != len(replies['comments']):
                replies['comments'] = get_comment_replies(
                    service, comment['id'])

            # 'comment' is a 'CommentThreads Resource' that has its
            # 'replies.comments' an array of 'Comments Resource'

            # Do fill in the 'comments' data structure
            # to be provided by this function:
            ...

        # list_next returns None when there are no more pages
        request = service.commentThreads().list_next(
            request, response)

    return comments

def get_comment_replies(service, comment_id):
    request = service.comments().list(
        parentId=comment_id,
        part='id,snippet',
        maxResults=50
    )
    replies = []

    while request:
        response = request.execute()
        replies.extend(response['items'])
        request = service.comments().list_next(
            request, response)

    return replies
Note that the ellipsis (...) above has to be replaced with actual code that fills in the array of structures that get_video_comments returns to its caller.
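One possible way to fill that structure is to flatten each thread into one row per comment, so that a top-level comment is immediately followed by its replies. This is only a sketch under assumptions: the helper names `thread_to_rows` and `_row` and the row keys are hypothetical, the sample thread is hand-written, and the field names read from each snippet are the ones already used in the question's code.

```python
def thread_to_rows(thread):
    """Flatten one comment thread into a list of row dicts.

    The first row is the top-level comment; the following rows are its
    replies, each carrying the ID of the comment it answers.
    """
    top = thread['snippet']['topLevelComment']
    rows = [_row(top, kind='comment', parent_id=None)]
    for reply in thread.get('replies', {}).get('comments', []):
        rows.append(_row(reply, kind='reply', parent_id=top['id']))
    return rows


def _row(comment, kind, parent_id):
    snippet = comment['snippet']
    return {
        'id': comment['id'],
        'parent_id': parent_id,
        'kind': kind,
        'author': snippet['authorDisplayName'],
        'text': snippet['textDisplay'],
        'published_at': snippet['publishedAt'],
        'likes': snippet['likeCount'],
    }


# Hand-written sample thread (hypothetical, not real API output).
sample_thread = {
    'id': 'c1',
    'snippet': {
        'totalReplyCount': 1,
        'topLevelComment': {
            'id': 'c1',
            'snippet': {
                'authorDisplayName': 'Alice',
                'textDisplay': 'Great video',
                'publishedAt': '2020-01-01T00:00:00Z',
                'likeCount': 3,
            },
        },
    },
    'replies': {
        'comments': [{
            'id': 'c1.r1',
            'snippet': {
                'authorDisplayName': 'Bob',
                'textDisplay': 'Agreed',
                'publishedAt': '2020-01-02T00:00:00Z',
                'likeCount': 0,
            },
        }],
    },
}

rows = thread_to_rows(sample_thread)
for row in rows:
    print(row['kind'], row['author'], row['text'])
```

Inside get_video_comments, the `...` could then become `comments.extend(thread_to_rows(comment))`, and the returned list of dicts could be passed to `pd.DataFrame(comments)` to obtain a single data frame.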
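The simplest way (useful for quick testing) is to replace ... with comments.append(comment), and then have the caller of get_video_comments simply pretty-print (using json.dump) the object obtained from that function.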
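For that quick test, the pretty-printing step might look like the sketch below. The helper name `pretty_print` and the sample list are made up for illustration; in practice the object passed in would be whatever get_video_comments returned.

```python
import json
import sys


def pretty_print(obj, stream=sys.stdout):
    """Write obj to stream as indented JSON, keeping non-ASCII text readable."""
    json.dump(obj, stream, indent=2, ensure_ascii=False)
    stream.write('\n')


# Hand-written stand-in for the list returned by get_video_comments.
sample_comments = [
    {'id': 'c1', 'snippet': {'totalReplyCount': 0}},
    {'id': 'c2', 'snippet': {'totalReplyCount': 2}},
]

pretty_print(sample_comments)
```

Passing `ensure_ascii=False` is a design choice here: comment text often contains non-ASCII characters, which would otherwise be printed as escape sequences.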