Question

I've run into a strange problem and I'm not sure how to solve it!

When I add

options = webdriver.ChromeOptions()
options.add_argument('headless')

to my code, the while loop that follows is ignored.

Here is my complete code, through to the end of the script (with the URL redacted):

import csv
import time

from bs4 import BeautifulSoup
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('headless')

# Pass the options to the driver, otherwise the headless flag is never applied.
driver = webdriver.Chrome(options=options)
driver.get("SomeURL")
button = driver.find_element_by_id('show_more')

# Click "show more" repeatedly, pausing so the new results can load.
count = 1
while count > 0:
    button.click()
    count = count + 1
    time.sleep(2)
    if count == 50000:
        break

soup = BeautifulSoup(driver.page_source, 'html.parser')

# Collect the alt text and source URL of every image on the page.
img_data = []
for img_tag in soup.find_all('img'):
    data_dict = dict()
    data_dict['image_name'] = img_tag['alt']
    data_dict['image_url'] = img_tag['src']
    img_data.append(data_dict)

# Write one CSV row per image.
with open('osprey.csv', 'w', newline='') as birddata:
    fieldnames = ['image_name', 'image_url']
    writer = csv.DictWriter(birddata, fieldnames=fieldnames)
    writer.writeheader()
    for data in img_data:
        writer.writerow(data)

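The code above runs headless, but it only returns 30 results and writes them to the CSV (the while loop is not processed). When

options = webdriver.ChromeOptions()
options.add_argument('headless')

is removed and driver = webdriver.Chrome(options=options) is changed to driver = webdriver.Chrome(), the process works correctly: it returns 10,000+ results and writes them to the CSV, but it is not headless, and loading the page's images takes a long time.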

I'm scraping millions of images, so I really need this to be efficient. Any tips for staying headless while keeping the loop running would be great.

TIA! Cheers!

Answer 1

You should pass the options to the driver like this:

driver = webdriver.Chrome(chrome_options=options)
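To show where that line fits, here is a minimal sketch of the whole flow, not a definitive fix. It uses the options= keyword that the question itself mentions, which newer Selenium releases prefer over the older chrome_options= spelling. The helper names (chrome_arguments, click_repeatedly, scrape_page_source) and the --window-size switch are my own assumptions, not something from the question. The window size is included because headless Chrome may start with a smaller viewport than a visible window, which can change how a page lays out its "show more" button.

```python
import time


def chrome_arguments(headless=True, window_size=(1920, 1080)):
    """Build the Chrome command-line switches for the session."""
    args = []
    if headless:
        args.append("--headless")
    # Assumption: a desktop-sized viewport keeps the layout close to what
    # a visible browser window would show.
    args.append("--window-size={},{}".format(*window_size))
    return args


def click_repeatedly(click, max_clicks, pause=2.0, sleep=time.sleep):
    """Call click() up to max_clicks times, stopping at the first failure.

    Returns the number of successful clicks.
    """
    done = 0
    while done < max_clicks:
        try:
            click()
        except Exception:
            # Broad on purpose for this sketch: any failure (button gone,
            # not clickable, stale) ends the loop instead of crashing.
            break
        done += 1
        sleep(pause)
    return done


def scrape_page_source(url, max_clicks=50000):
    """Open url in headless Chrome, click 'show_more', return the HTML."""
    # Imported here so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments():
        options.add_argument(arg)

    # The options object only takes effect if it is passed to the driver.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        button = driver.find_element(By.ID, "show_more")
        click_repeatedly(button.click, max_clicks)
        return driver.page_source
    finally:
        driver.quit()
```

Calling scrape_page_source("SomeURL") would return the HTML to hand to BeautifulSoup as in the question. Because the loop stops at the first failed click, the script ends once the button disappears instead of sleeping through all 50,000 iterations.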
