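I am currently building a scraper. I can manage the scraping code itself, but I need some coding help or tutorial suggestions for the following steps:
1) Read a list of links from a .csv file and open each one with the web driver
2) Run the same scraping code for every link in the list
3) Append the output to a .csv file
Basic structure of the code: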
from selenium import webdriver
import time
import csv
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome
#driver.get("...link from csv file..."), e.g. with open('links.csv', 'r') as file: etc...
time.sleep(5)
elements = driver.find_elements_by_class_name('data-xl')
csvfile = "output.csv"
with open(csvfile, "w", newline="") as output:
    writer = csv.writer(output)
    # column headers
    writer.writerow(["Reads", "Average Time Spent", "Impressions", "Read Time", "Likes", "Publication Shares", "Times Stacked", "Link-Outs"])
driver.quit()
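Questions:
1) How do I open a .csv file in Python and visit the links with Selenium one row at a time (1, +1, +1 ...)?
2) How do I code the loop so that it runs for every link visited in step (1) and, even if an error occurs (for example "element not found"), continues with the next item in the .csv?
3) How do I create the header row in the .csv? (Note: the code structure above is not accurate.)
4) How do I write the output to the .csv by appending, without rows overlapping?
Any tips on how to implement the steps above would be helpful.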
Answer 0 (score: 0)
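First, you have to change driver = webdriver.Chrome
to driver = webdriver.Chrome()
so that the driver class is actually instantiated rather than just referenced.
Here is the complete code.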
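from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import WebDriverException
import time
import csv

# link.csv holds one URL per line, for example:
# https://google.com
# https://google.com
# https://google.com

driver = webdriver.Chrome()

with open('link.csv', 'r', encoding='utf-8') as f, \
        open('output.csv', 'w', newline='', encoding='utf-8') as w:
    reader = csv.reader(f)
    writer = csv.writer(w)
    # Write the header row once, before the loop.
    writer.writerow(["Reads", "Average Time Spent", "Impressions", "Read Time", "Likes",
                     "Publication Shares", "Times Stacked", "Link-Outs"])
    for line in reader:
        if not line:
            continue  # skip blank lines in link.csv
        try:
            driver.get(line[0])
            time.sleep(5)
            # Example lookup only; replace it with the selectors for your page.
            element = driver.find_element(By.XPATH, '//img[@alt="Google"]')
            # Placeholder row: build the real data row from the scraped values here.
            writer.writerow([element.get_attribute("alt")])
        except WebDriverException as e:
            # Covers "element not found" and similar errors: report and move on.
            print(f"Skipping {line[0]}: {e}")

driver.quit()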
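The question also asks for the output to be appended without overlapping rows and for the loop to survive errors. One possible way to handle that part, independent of Selenium, is sketched below. The helper names read_links and scrape_all and the scrape_one callback are hypothetical and not part of the original code; scrape_one is assumed to take a URL and return one list of values in header order, raising an exception when scraping fails.

```python
import csv
import os

HEADER = ["Reads", "Average Time Spent", "Impressions", "Read Time", "Likes",
          "Publication Shares", "Times Stacked", "Link-Outs"]


def read_links(path):
    """Return the first column of every non-empty row in the links CSV."""
    with open(path, "r", newline="", encoding="utf-8") as f:
        return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]


def scrape_all(links, scrape_one, out_path, header=HEADER):
    """Append one row per link to out_path and return the links that failed.

    scrape_one is a hypothetical callable: it takes a URL and returns a list
    of values in header order, raising an exception when scraping fails.
    """
    # Write the header only when the file is missing or empty, so repeated
    # runs keep appending rows instead of duplicating the header.
    need_header = not os.path.exists(out_path) or os.path.getsize(out_path) == 0
    failed = []
    with open(out_path, "a", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        if need_header:
            writer.writerow(header)
        for url in links:
            try:
                row = scrape_one(url)
            except Exception as exc:  # broad on purpose: keep going on any failure
                print(f"Skipping {url}: {exc}")
                failed.append(url)
                continue
            writer.writerow(row)
    return failed
```

With Selenium, scrape_one could be a small function that calls driver.get(url), waits, and returns something like [e.text for e in driver.find_elements(By.CLASS_NAME, 'data-xl')], assuming those elements appear in the same order as the headers. Opening the file in "a" mode is what keeps earlier rows from being overwritten, and the returned list of failed links makes it easy to retry them later.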