Python Selenium doesn't download the data when I click the link

Posted: 2019-06-21 06:08:37

Tags: python selenium

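I wrote a script that reaches a download link through a series of clicks: first on the settings gear icon, then on the "Export data" tab, and finally on the "Click here to download your data file." link.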

However, when I click that last link, the data is not downloaded into the default directory I specified.

Ideally, I would like to download the data straight into a variable, but I can't even figure out why the regular download isn't working.

I also tried grabbing the href from the download link and opening that URL in a new tab, but that still didn't give me the file.
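For the "straight into a variable" goal, one option is to skip the browser download entirely: take the link's href and fetch it with a plain HTTP request that carries the browser's session cookies. This is a minimal sketch using only the standard library, and it assumes (not confirmed for this EPA app) that the cookies returned by `driver.get_cookies()` are enough to authorize the export URL. The helper names `cookie_header`, `build_download_request` and `fetch_bytes` are hypothetical.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Join cookies shaped like driver.get_cookies() output into one Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def build_download_request(url, selenium_cookies):
    """Build a GET request for `url` that reuses the browser session's cookies."""
    request = urllib.request.Request(url)
    request.add_header("Cookie", cookie_header(selenium_cookies))
    return request


def fetch_bytes(url, selenium_cookies, timeout=30):
    """Download `url` into memory and return the raw bytes.

    Assumption: the session cookies alone authorize the download.
    """
    request = build_download_request(url, selenium_cookies)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With the question's objects, usage would look like `data = fetch_bytes(download_data.get_attribute("href"), web_driver.get_cookies())`. If the server needs more than cookies (for example a CSRF header), this request will be rejected and the browser download route is still needed.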

```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

URL = 'https://edap.epa.gov/public/single/?appid=73b2b6a5-70c6-4820-b3fa-186ac094f10d&sheet=1e76b65b-dd6c-41fd-9143-ba44874e1f9d'
DELAY = 10


def init_driver(url):
    options = webdriver.ChromeOptions()
    path = '/Users/X/Applications/chromedriver'
    options.add_argument("--headless")
    # download.default_directory is a Chrome preference, not a command-line switch,
    # so it must be passed through "prefs" (and the path needs its leading slash)
    options.add_experimental_option(
        "prefs", {"download.default_directory": "/Users/X/Python/data_scraper/epa_data"})
    driver = webdriver.Chrome(options=options, executable_path=path)
    driver.implicitly_wait(20)
    driver.get(url)
    return driver


def find_settings(web_driver):
    # wait for the app to render, then find and click the settings gear
    try:
        driver_wait = WebDriverWait(web_driver, DELAY)
        driver_wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "ng-scope")))
        settings = web_driver.find_element_by_css_selector("span.cl-icon.cl-icon--cogwheel.cl-icon-right-align")
        print(settings)
        settings.click()
    except Exception as e:
        print(e)
        print(web_driver.page_source)


def get_settings_list(web_driver):
    # open the settings menu and map each menu entry's text to its element
    menu_item_list = {}

    find_settings(web_driver)

    try:
        time.sleep(8)
        print("got menu_items")
        menu_items = web_driver.find_elements_by_css_selector("span.lui-list__text.ng-binding")
        for i in menu_items:
            print(i.text)
            menu_item_list[i.text] = i
    except Exception as e:
        print(e)

    return menu_item_list


def get_export_data(web_driver):
    menu_items = get_settings_list(web_driver)
    print(menu_items)
    export_data = menu_items['Export data']
    export_data.click()

    # open an empty second tab, then switch back to the app tab
    web_driver.execute_script("window.open();")
    print(web_driver.window_handles)
    main_window = web_driver.window_handles[0]
    temp_window = web_driver.window_handles[1]
    web_driver.switch_to.window(main_window)

    time.sleep(8)

    download_data = web_driver.find_element_by_xpath("//a[contains(text(), 'Click here to download your data file.')]")
    # get_attribute('href') returns the resolved, absolute URL
    download_href = download_data.get_attribute('href')

    print(download_href)
    download_data.click()
    web_driver.switch_to.window(temp_window)
    # load the absolute URL as-is; prefixing "https://edap.epa.gov" would duplicate the host
    web_driver.get(download_href)
    print(web_driver.page_source)


driver = init_driver(URL)
get_export_data(driver)
```
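A quick way to tell whether the click produced a download at all is to watch the target directory. The sketch below is a hypothetical helper (`wait_for_download` is not part of Selenium): it snapshots the directory before the click and then polls for a new file. It relies on Chrome writing in-progress downloads with a `.crdownload` suffix; treating Firefox's `.part` files the same way is an assumption.

```python
import os
import time

PARTIAL_SUFFIXES = (".crdownload", ".part")


def wait_for_download(directory, before, timeout=60.0, poll=0.5):
    """Return the path of a new, fully written file in `directory`.

    `before` is the set of file names present before the download was started.
    Raises TimeoutError if nothing finished appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_files = set(os.listdir(directory)) - set(before)
        partial = {f for f in new_files if f.endswith(PARTIAL_SUFFIXES)}
        finished = sorted(new_files - partial)
        # wait until no partial file is left, so the returned file is complete
        if finished and not partial:
            return os.path.join(directory, finished[0])
        time.sleep(poll)
    raise TimeoutError(f"no finished download appeared in {directory} within {timeout} s")
```

Usage: take `before = set(os.listdir(download_dir))`, call `download_data.click()`, then `path = wait_for_download(download_dir, before)`. A `TimeoutError` here in headless mode is consistent with the download being blocked rather than saved somewhere else.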

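I want this code to reproduce the manual steps: click the settings gear icon, then "Export data", then the link that downloads the data as a CSV file. (Ideally I would skip the file and load the data straight into a pandas DataFrame, but that's a separate question.)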
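Once the export is available as bytes in memory, it never has to touch the disk. A minimal sketch with the standard library's `csv` module, assuming the export really is CSV as the question says and that it is UTF-8 encoded (both assumptions); `rows_from_csv_bytes` is a hypothetical name.

```python
import csv
import io


def rows_from_csv_bytes(data, encoding="utf-8-sig"):
    """Parse CSV bytes held in memory into a list of dicts keyed by the header row.

    "utf-8-sig" also strips a leading byte-order mark if the export includes one.
    """
    text = data.decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))
```

With pandas the equivalent one-liner is `pd.read_csv(io.BytesIO(data))`, which returns a DataFrame directly.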

1 answer:

Answer 0 (score: 0)

For security reasons, Chrome does not allow downloads when it runs headless. Here's a link with more information and possible workarounds.
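One commonly cited workaround is to explicitly allow downloads for the headless session through the DevTools command `Page.setDownloadBehavior`, in addition to setting the download directory as a Chrome preference. The sketch below assumes Selenium 4, whose Chrome driver exposes `execute_cdp_cmd`; whether this is needed at all depends on the Chrome version. The helper names are hypothetical, and Selenium is imported inside `make_headless_chrome` only so the two pure helpers can be used without it.

```python
import os


def chrome_download_prefs(download_dir):
    """Chrome profile preferences that save downloads to `download_dir` without a prompt."""
    return {
        "download.default_directory": os.path.abspath(download_dir),
        "download.prompt_for_download": False,
    }


def cdp_download_behavior(download_dir):
    """Parameters for the DevTools command Page.setDownloadBehavior."""
    return {"behavior": "allow", "downloadPath": os.path.abspath(download_dir)}


def make_headless_chrome(download_dir):
    # imported here so the helpers above work without Selenium installed
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    # preferences go through add_experimental_option("prefs", ...), not add_argument
    options.add_experimental_option("prefs", chrome_download_prefs(download_dir))
    driver = webdriver.Chrome(options=options)
    # assumption: Selenium 4's execute_cdp_cmd; explicitly allows downloads in this session
    driver.execute_cdp_cmd("Page.setDownloadBehavior", cdp_download_behavior(download_dir))
    return driver
```

Passing an absolute path matters: Chrome does not resolve a relative `downloadPath` against the script's working directory.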

Unless you specifically need Chrome, Firefox does allow downloads in headless mode, although it takes some configuration.
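The configuration is a handful of Firefox preferences that send downloads to a fixed folder without showing a save dialog. A sketch, assuming Selenium 4's `FirefoxOptions.set_preference`; the MIME type `text/csv` is an assumption and should be replaced with (or extended by) whatever `Content-Type` the export server actually sends. Helper names are hypothetical.

```python
import os


def firefox_download_prefs(download_dir, mime_types=("text/csv",)):
    """Firefox preferences that save matching downloads to `download_dir` without a dialog."""
    return {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.dir": os.path.abspath(download_dir),
        "browser.download.useDownloadDir": True,
        # content types to save directly instead of asking what to do
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def make_headless_firefox(download_dir, mime_types=("text/csv",)):
    # imported here so firefox_download_prefs works without Selenium installed
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    for name, value in firefox_download_prefs(download_dir, mime_types).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

If the download still prompts, the server is probably sending a different type (for example `application/octet-stream`), which can be passed in `mime_types`.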