Python > Selenium + CSV: How do I open links from a list in a .csv file, loop the code, and append data to a csv?

Date: 2019-12-28 08:51:41

Tags: python selenium csv web-scraping

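I am currently building a scraper. I can manage the scraping code itself, but I need some coding help or tutorial suggestions for the following procedure:

1) Open a list of links stored in a .csv file with the web driver
2) Run the same scraping code for every link in the list
3) Append the output to a .csv file

Basic structure of the code: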

from selenium import webdriver
import time
import csv
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome

# driver.get("...link from csv file..."), e.g. with open('links.csv', 'r') as file:   etc...
time.sleep(5)

elements = driver.find_elements_by_class_name('data-xl')
csvfile = "output.csv"

with open(csvfile, "w", newline="") as output:
    writer = csv.writer(output)
    # column headers
    writer.writerow(["Reads", "Average Time Spent", "Impressions", "Read Time", "Likes", "Publication Shares", "Times Stacked", "Link-Outs"])

driver.quit()

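Questions:

1) How do I open the .csv with Python Selenium and visit the links one row at a time (1, +1, +1 ...)?
2) How do I loop the code over all links visited in step (1) so that, even when an error occurs (e.g. "element not found"), it continues with the next item in the .csv?
3) How do I create the headers in the .csv? (Note: the code structure above is not accurate.)
4) How do I write the output to the .csv by appending, without overlap?

Any tips on how to implement the above steps would be helpful.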

1 Answer:

Answer 0 (score: 0)

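First, you have to change driver = webdriver.Chrome to driver = webdriver.Chrome(). Without the parentheses, driver refers to the Chrome class itself rather than to a running browser instance.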
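To see why the parentheses matter, here is a minimal sketch that runs without a browser. FakeDriver is a hypothetical stand-in class invented for this illustration; it is not part of Selenium, and it only mimics the shape of a driver with a get method.

```python
class FakeDriver:
    """Hypothetical stand-in for a browser driver class (not part of Selenium)."""

    def __init__(self):
        self.visited = []

    def get(self, url):
        # Record the URL instead of opening a real browser
        self.visited.append(url)


not_called = FakeDriver   # the class object itself; no instance is created
called = FakeDriver()     # an actual instance

print(isinstance(not_called, FakeDriver))  # False
print(isinstance(called, FakeDriver))      # True

called.get("https://google.com")
print(called.visited)

try:
    # Fails: the string is bound to `self`, so the `url` argument is missing
    not_called.get("https://google.com")
except TypeError as exc:
    print("TypeError:", exc)
```

The same thing happens with webdriver.Chrome: calling methods on the class instead of an instance fails before any page is loaded.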

Here is the full code.

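from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
import time
import csv

# link.csv contains one URL per line, for example:
# https://google.com
# https://google.com
# https://google.com

driver = webdriver.Chrome()

with open('link.csv', 'r', encoding='utf-8') as f, \
        open('output.csv', 'w', newline="", encoding="utf-8") as w:
    reader = csv.reader(f)
    writer = csv.writer(w)
    # Write the header row once, before looping over the links
    writer.writerow(
        ["Reads", "Average Time Spent", "Impressions", "Read Time", "Likes", "Publication Shares", "Times Stacked",
         "Link-Outs"])

    for line in reader:
        if not line:
            continue  # skip blank rows in link.csv
        driver.get(line[0])
        time.sleep(5)
        try:
            # Locator for the Google example only; replace it with your own
            element = driver.find_element(By.XPATH, '//img[@alt="Google"]')
        except NoSuchElementException:
            continue  # element not found: move on to the next link
        # Example row only; replace with the scraped values, in header order
        writer.writerow([element.get_attribute("alt")])

driver.quit()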
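The code above opens output.csv in 'w' mode, which overwrites the file on every run, so it does not yet cover question 4 (appending without overlap). A minimal sketch of one way to handle that, assuming the scraping step is wrapped in a function: scrape_links, scrape_one, and the HEADER columns are hypothetical names chosen for this example, not part of Selenium or of the original code. The file is opened in append mode, the header is written only when the file is new or empty, and any error for one link is recorded before moving on to the next.

```python
import csv
import os

HEADER = ["Link", "Reads", "Likes"]  # hypothetical columns; use your own


def scrape_links(links_path, output_path, scrape_one, header=HEADER):
    """Visit every link in links_path and append one row per link to output_path.

    scrape_one is a hypothetical callable: it takes a URL and returns a list
    of values, or raises an exception if the page cannot be scraped.
    Returns the list of links that failed.
    """
    failed = []
    # Write the header only when the output file is new or empty
    need_header = (not os.path.exists(output_path)
                   or os.path.getsize(output_path) == 0)

    with open(links_path, "r", newline="", encoding="utf-8") as links_file, \
            open(output_path, "a", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        if need_header:
            writer.writerow(header)

        for row in csv.reader(links_file):
            if not row or not row[0].strip():
                continue  # skip blank lines
            url = row[0].strip()
            try:
                values = scrape_one(url)
            except Exception as exc:  # broad on purpose: keep going on any error
                print(f"Skipping {url}: {exc}")
                failed.append(url)
                continue
            writer.writerow([url, *values])

    return failed
```

With Selenium, scrape_one could call driver.get(url), wait for the page, and return the text of each element found by driver.find_elements(By.CLASS_NAME, 'data-xl'). Running the function twice against the same output file adds new rows below the existing ones and leaves the single header row at the top.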