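I have a script that works for a single URL and saves the results to a file. I want to scrape multiple URLs and save everything to one file. Please help.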
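"""
Main script to scrape the comments of any Youtube video.

Example:
    $ python main.py YOUTUBE_VIDEO_URL
"""

from selenium import webdriver
from selenium.common import exceptions
import sys
import time
import pandas as pd


def scrape(url):
    """
    Extracts the comments from the Youtube video given by the URL.

    Args:
        url (str): The URL to the Youtube video

    Raises:
        selenium.common.exceptions.NoSuchElementException:
        When certain elements to look for cannot be found
    """
    url = "https://www.youtube.com/watch?v=9hDe2kbCI4g&list=PLzivuVVbLcnqDasWGJSCg2euVWlpSf4S0&index=3"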
    # Note: replace argument with absolute path to the driver executable.
    #driver = webdriver.Chrome('C:\webdrivers\chromedriver')
    driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")

    # Navigates to the URL, maximizes the current window, and
    # then suspends execution for (at least) 5 seconds (this
    # gives time for the page to load).
    driver.get(url)
    driver.maximize_window()
    time.sleep(5)
    try:
        # Extract the elements storing the video title and
        # comment section.
        title = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
        comment_section = driver.find_element_by_xpath('//*[@id="comments"]')
    except exceptions.NoSuchElementException:
        # Note: Youtube may have changed their HTML layouts for
        # videos, so report an error for sanity's sake in case the
        # elements provided cannot be found anymore.
        error = "Error: Double check selector OR "
        error += "element may not yet be on the screen at the time of the find operation"
        print(error)

    # Scroll into view the comment section, then allow some time
    # for everything to be loaded as necessary.
    driver.execute_script("arguments[0].scrollIntoView();", comment_section)
    time.sleep(7)

    # Scroll all the way down to the bottom in order to get all the
    # elements loaded (since Youtube dynamically loads them).
    last_height = driver.execute_script("return document.documentElement.scrollHeight")

    while True:
        # Scroll down 'til "next load".
        driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")

        # Wait to load everything thus far.
        time.sleep(2)

        # Calculate new scroll height and compare with last scroll height.
        new_height = driver.execute_script("return document.documentElement.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height

    # One last scroll just in case.
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")

    try:
        # Extract the elements storing the usernames and comments.
        username_elems = driver.find_elements_by_xpath('//*[@id="author-text"]')
        comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
    except exceptions.NoSuchElementException:
        error = "Error: Double check selector OR "
        error += "element may not yet be on the screen at the time of the find operation"
        print(error)

    print("> VIDEO TITLE: " + title + "\n")
    #print("> USERNAMES & COMMENTS:")
    '''for username, comment in zip(username_elems, comment_elems):
        print(username.text + ":")
        print(comment.text + "\n")'''
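    # Collect one row per comment, then build the DataFrame in one step
    # (DataFrame.append was removed in pandas 2.0).
    rows = []
    for username, comment in zip(username_elems, comment_elems):
        rows.append({"Text": username.text, "Comment": comment.text})
    df = pd.DataFrame(rows, columns=["Text", "Comment"])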
    print("> SAVING THE DATA TO CSV FILE:\n")
    filename = "10-G-3.csv"
    df.to_csv(filename)  # writes the CSV file, overwriting any existing one
    print("> SAVE SUCCESSFULLY: " + filename + "\n")
    driver.close()


if __name__ == "__main__":
    scrape(sys.argv[1])
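
For reference, here is a minimal sketch of the behaviour I am after, not a working solution. It assumes scrape() could be changed to return a list of (username, comment) pairs instead of writing its own CSV. The names scrape_many, scrape_one, fake_scrape, the "Url" column and the output filename all_comments.csv are made up for illustration; fake_scrape only stands in for the Selenium code so the sketch runs without a browser.

```python
import csv


def scrape_many(urls, scrape_one, filename="all_comments.csv"):
    """Scrape every URL in `urls` and write all rows to one CSV file.

    `scrape_one` is a hypothetical callable that takes a URL and returns a
    list of (username, comment) pairs, e.g. a version of scrape() that
    returns its results instead of saving them.
    """
    total = 0
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # The header is written once; the extra "Url" column records which
        # video each comment came from.
        writer.writerow(["Url", "Text", "Comment"])
        for url in urls:
            try:
                rows = scrape_one(url)
            except Exception as err:
                # Skip a video that fails rather than losing the whole run.
                print("> SKIPPING " + url + ": " + str(err))
                continue
            for username, comment in rows:
                writer.writerow([url, username, comment])
                total += 1
    return total


def fake_scrape(url):
    """Stand-in for the Selenium scraper (assumption, for illustration only)."""
    return [("user1", "first comment on " + url), ("user2", "second comment")]
```

With something like this, the entry point could call scrape_many(sys.argv[1:], scrape) so that every URL given on the command line ends up in the same file.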