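I am trying to scrape YouTube video comments together with their replies, comment likes, comment dislikes, the comment count, and the reply count.
First, I tried to scrape text data such as comments and their replies with Python Selenium and ChromeDriver, locating elements by ID.
I can only scrape the comments that are loaded on the page; I have not been able to get the replies.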
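Before touching Selenium, it can help to pin down what one scraped comment should look like. The sketch below is a hypothetical schema (`CommentRecord` and `write_records` are illustrative names, not part of any library) that stores the comment text, like count, date, and reply texts, and writes one CSV row per comment with the standard `csv` module. It has no dislike column, on the assumption that the page renders only a like count next to each comment and no dislike count.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommentRecord:
    """One top-level comment (hypothetical schema for the fields in the question)."""
    text: str
    likes: int
    date: str
    replies: List[str] = field(default_factory=list)

    @property
    def reply_count(self) -> int:
        # Counts only the replies that were actually loaded and scraped
        return len(self.replies)


def write_records(records, out):
    """Write one CSV row per comment; the reply texts share one cell, joined by ' | '."""
    writer = csv.DictWriter(out, fieldnames=["text", "likes", "date", "reply_count", "replies"])
    writer.writeheader()
    for rec in records:
        writer.writerow({
            "text": rec.text,
            "likes": rec.likes,
            "date": rec.date,
            "reply_count": rec.reply_count,
            "replies": " | ".join(rec.replies),
        })


buf = io.StringIO()
write_records([CommentRecord("Great video", 12, "2 weeks ago", ["Agreed", "Thanks"])], buf)
print(buf.getvalue())
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""` and `encoding="utf-8"` so the `csv` module controls line endings and non-ASCII comments are preserved.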
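With my code I can only scrape the comments. How can I use Selenium in Python to scrape the replies, likes, dislikes, and dates of these comments?
Can anyone point out where I'm going wrong?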
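Like counts are usually rendered as abbreviated text rather than plain integers. Assuming English-formatted labels such as `987`, `1,234`, `1.2K`, or `3M` (an assumption about how the page displays counts), a small hypothetical helper `parse_count` can turn them into integers. It treats an empty label as 0, on the assumption that no number is shown when a comment has no likes.

```python
import re

# Multipliers for the abbreviated suffixes assumed to appear in count labels
_SUFFIXES = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(label: str) -> int:
    """Convert an abbreviated count such as '1.2K' or '3M' to an int.

    Assumes English formatting; an empty label is treated as 0.
    """
    label = label.strip().replace(",", "")
    if not label:
        return 0
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([KMB]?)", label, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized count: {label!r}")
    number, suffix = match.groups()
    return round(float(number) * _SUFFIXES[suffix.upper()])
```

Because abbreviated labels are already rounded for display, a value such as `1.2K` only tells you the count is roughly 1200.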
Updated code (the result is an empty array):
import time
import csv
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
chrome_path = "/Users/Downloads/chromedriver"
page_url = "https://www.youtube.com/watch?v=AJesAlohO6I&t="
# Selenium 4 passes the driver path through a Service object
driver = webdriver.Chrome(service=Service(executable_path=chrome_path))
driver.get(page_url)
time.sleep(2)
title = driver.find_element(By.XPATH, '//*[@id="container"]/h1/yt-formatted-string').text
print(title)
SCROLL_PAUSE_TIME = 2
CYCLES = 100
html = driver.find_element(By.TAG_NAME, 'html')
html.send_keys(Keys.PAGE_DOWN)
html.send_keys(Keys.PAGE_DOWN)
time.sleep(SCROLL_PAUSE_TIME * 3)
# Press End repeatedly so YouTube lazy-loads more comments
for i in range(CYCLES):
    html.send_keys(Keys.END)
    time.sleep(SCROLL_PAUSE_TIME)
comment_elems = driver.find_elements(By.XPATH, '//*[@id="content-text"]')
all_comments = [elem.text for elem in comment_elems]
print(all_comments)
write_file = "output_testing.csv"
# csv.writer quotes commas and newlines inside comments correctly
with open(write_file, "w", newline="", encoding="utf-8") as output:
    writer = csv.writer(output)
    for line in all_comments:
        writer.writerow([line])
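Pressing End a fixed 100 times either wastes time on short comment sections or stops too early on long ones. A common alternative, sketched below under the assumption that the comments lazy-load as the main document scrolls, is to scroll with `driver.execute_script` until `document.documentElement.scrollHeight` stops growing. `scroll_to_bottom` is a hypothetical helper name.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_cycles=100):
    """Scroll until the document height stops growing or max_cycles is reached.

    Returns the number of scroll steps performed.
    """
    get_height = "return document.documentElement.scrollHeight"
    last_height = driver.execute_script(get_height)
    for cycle in range(1, max_cycles + 1):
        driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
        time.sleep(pause)  # give the page time to lazy-load the next batch
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            return cycle  # nothing new was loaded, assume we reached the end
        last_height = new_height
    return max_cycles
```

With this helper, `scroll_to_bottom(driver, pause=SCROLL_PAUSE_TIME)` would take the place of the `for i in range(CYCLES)` loop. An unchanged height can also mean the next batch was simply slow to arrive, so a larger `pause` trades speed for completeness.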
My actual output:
import time
import csv
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
chrome_path = "/Users/Downloads/chromedriver"
page_url = "https://www.youtube.com/watch?v=qBp1rCz_yQU"
driver = webdriver.Chrome(service=Service(executable_path=chrome_path))
driver.get(page_url)
time.sleep(2)
title = driver.find_element(By.XPATH, '//*[@id="container"]/h1/yt-formatted-string').text
print(title)
SCROLL_PAUSE_TIME = 2
CYCLES = 100
html = driver.find_element(By.TAG_NAME, 'html')
html.send_keys(Keys.PAGE_DOWN)
html.send_keys(Keys.PAGE_DOWN)
time.sleep(SCROLL_PAUSE_TIME * 3)
for i in range(CYCLES):
    html.send_keys(Keys.END)
    time.sleep(SCROLL_PAUSE_TIME)
# Locates the "View replies" buttons; the result is discarded, so nothing is clicked
driver.find_elements(By.XPATH, '//div[@id="replies"]/ytd-comment-replies-renderer/ytd-expander/paper-button[@id="more"]')
comment_elems = driver.find_elements(By.XPATH, '//div[@id="loaded-replies"]//yt-formatted-string[@id="content-text"]')
all_comments = [elem.text for elem in comment_elems]
print(all_comments)
write_file = "output_31may.csv"
with open(write_file, "w", newline="", encoding="utf-8") as output:
    writer = csv.writer(output)
    for line in all_comments:
        writer.writerow([line])
I expect the output to be the reply messages, but I can only get the reply count.
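Querying the whole page for `#content-text` returns comment and reply texts in one flat list, with no way to tell which reply belongs to which comment. One way to keep them together, sketched here, is to find each comment thread first and then run the other lookups relative to that thread element with `WebElement.find_elements`. The selectors `ytd-comment-thread-renderer`, `#comment`, `#vote-count-middle`, and `.published-time-text a` are assumptions based on YouTube's markup of that period and should be checked in the browser's developer tools; `extract_threads` and `first_text` are hypothetical names. The string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR`, used directly so the helpers can be exercised without a browser.

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR

# Hypothetical selectors; YouTube's markup changes, so inspect the page and adjust
THREAD = "ytd-comment-thread-renderer"
TEXT = "#comment #content-text"
LIKES = "#comment #vote-count-middle"
DATE = "#comment .published-time-text a"
REPLY_TEXT = "#loaded-replies #content-text"


def first_text(parent, selector):
    """Text of the first match under parent, or '' if nothing matches."""
    found = parent.find_elements(CSS, selector)
    return found[0].text.strip() if found else ""


def extract_threads(driver):
    """Return one dict per comment thread, keeping replies attached to their parent."""
    threads = []
    for thread in driver.find_elements(CSS, THREAD):
        threads.append({
            "text": first_text(thread, TEXT),
            "likes": first_text(thread, LIKES),
            "date": first_text(thread, DATE),
            # Only replies already expanded with "View replies" are present here
            "replies": [r.text for r in thread.find_elements(CSS, REPLY_TEXT)],
        })
    return threads
```

The `likes` field is left as raw text here; it can be converted to a number separately once the display format on the page is known.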
Answer (score: 0)
You need to click "View replies" to load the replies before you can scrape them.
To click those buttons, you can do the following:
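from selenium.webdriver.common.by import By
for button in driver.find_elements(By.XPATH, '//div[@id="replies"]/ytd-comment-replies-renderer/ytd-expander/paper-button[@id="more"]'):
    button.click()
Then scrape the replies:
replies = [elem.text for elem in driver.find_elements(By.XPATH, '//div[@id="loaded-replies"]//yt-formatted-string[@id="content-text"]')]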
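Calling `.click()` directly can fail when a button is off-screen or covered by another element. A more defensive version of the answer's loop is sketched below, assuming the same XPath still matches the "View replies" buttons: it scrolls each button into view and clicks it through JavaScript with `driver.execute_script`, pausing so the replies can load. `expand_all_replies` is a hypothetical name, and `"xpath"` is the string value of Selenium's `By.XPATH`, used directly so the function can be tested without a browser.

```python
import time

XPATH = "xpath"  # same string value as selenium's By.XPATH

# Selector taken from the answer; YouTube's markup may have changed since
MORE_REPLIES = ('//div[@id="replies"]/ytd-comment-replies-renderer'
                '/ytd-expander/paper-button[@id="more"]')


def expand_all_replies(driver, pause=1.0):
    """Click every 'View replies' button; return how many clicks were issued."""
    clicked = 0
    for button in driver.find_elements(XPATH, MORE_REPLIES):
        # Bring the button into view, then click through JavaScript, which
        # still works when another element overlaps the button
        driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", button)
        driver.execute_script("arguments[0].click();", button)
        clicked += 1
        time.sleep(pause)  # let the replies load before the next click
    return clicked
```

This would run after the scrolling loop and before the replies are collected. Threads with many replies may show a further "Show more replies" control, which this sketch does not handle.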