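I wrote a script that reaches a download link through a series of clicks: first on the settings gear icon, then on the "Export data" tab, and finally on the "Click here to download your data file." link.
However, when I click that last link, the data is not downloaded to the default directory I specified.
Ideally I would like to download the data directly into a variable, but I can't even figure out why the regular download isn't working.
I tried getting the href from the download link and opening that URL in a new tab, but that still does not give me the data.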
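import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

URL = 'https://edap.epa.gov/public/single/?appid=73b2b6a5-70c6-4820-b3fa-186ac094f10d&sheet=1e76b65b-dd6c-41fd-9143-ba44874e1f9d'
DELAY = 10


def init_driver(url):
    # Start headless Chrome and open the page.
    options = webdriver.chrome.options.Options()
    path = '/Users/X/Applications/chromedriver'
    options.add_argument("--headless")
    # Chrome preferences are set through the "prefs" experimental option, not add_argument.
    options.add_experimental_option("prefs", {"download.default_directory": "/Users/X/Python/data_scraper/epa_data"})
    driver = webdriver.Chrome(options=options, executable_path=path)
    driver.implicitly_wait(20)
    driver.get(url)
    return driver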


def find_settings(web_driver):
    # Find the settings gear and click it.
    # time.sleep(10)
    try:
        driver_wait = WebDriverWait(web_driver, DELAY)
        # Wait until the page content is visible before looking for the icon.
        driver_wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "ng-scope")))
        settings = web_driver.find_element(By.CSS_SELECTOR, "span.cl-icon.cl-icon--cogwheel.cl-icon-right-align")
        print(settings)
        settings.click()
        # export_data = web_driver.find_elements_by_css_selector("span.lui-list__text.ng-binding")
        # print(web_driver.page_source)
    except Exception as e:
        print(e)
        print(web_driver.page_source)


def get_settings_list(web_driver):
    # Open the settings menu and map each menu item's text to its element.
    menu_item_list = {}
    find_settings(web_driver)
    # print(web_driver.page_source)
    try:
        time.sleep(8)
        menu_items = web_driver.find_elements(By.CSS_SELECTOR, "span.lui-list__text.ng-binding")
        print("got menu_items")
        for i in menu_items:
            print(i.text)
            menu_item_list[i.text] = i
    except Exception as e:
        print(e)
    return menu_item_list


def get_export_data(web_driver):
    # Click "Export data", then follow the download link.
    menu_items = get_settings_list(web_driver)
    print(menu_items)
    export_data = menu_items['Export data']
    export_data.click()
    # Open a second, empty tab to load the download URL in.
    web_driver.execute_script("window.open();")
    print(web_driver.window_handles)
    main_window = web_driver.window_handles[0]
    temp_window = web_driver.window_handles[1]
    web_driver.switch_to.window(main_window)
    time.sleep(8)
    download_data = web_driver.find_element(By.XPATH, "//a[contains(text(), 'Click here to download your data file.')]")
    download_href = download_data.get_attribute('href')
    print(download_href)
    download_data.click()
    web_driver.switch_to.window(temp_window)
    # Prepending the host assumes the href is a site-relative path.
    web_driver.get("https://edap.epa.gov" + download_href)
    print(web_driver.page_source)


driver = init_driver(URL)
# get_settings_list(driver)
get_export_data(driver)
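I want this code to simulate the manual steps of clicking the settings gear icon, then "Export data", and then the link that downloads the data as a CSV (ideally I would skip the file and load the data straight into a pandas DataFrame, but that is a separate question).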
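
For the "download into a variable" part, one possible approach is to skip the browser download and fetch the link's URL directly, reusing the cookies from the Selenium session. The sketch below uses only the standard library; `cookie_header`, `fetch_bytes`, and `parse_csv` are hypothetical helper names, and it assumes the server accepts a plain cookie-authenticated GET for the export URL and that the file is UTF-8 CSV, neither of which has been verified for this site.

```python
import csv
import io
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from the list of dicts returned by driver.get_cookies()."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def fetch_bytes(url, selenium_cookies, timeout=30):
    """Download url into memory, sending the browser session's cookies (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"Cookie": cookie_header(selenium_cookies)})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def parse_csv(raw, encoding="utf-8"):
    """Parse CSV bytes into a list of row dicts without writing a file (assumes UTF-8)."""
    text = raw.decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))
```

With the script above this might be called as `rows = parse_csv(fetch_bytes(download_href, driver.get_cookies()))`. If pandas is available, `pandas.read_csv(io.BytesIO(raw))` accepts the same in-memory bytes and returns a DataFrame.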
Answer 0 (score: 0)
For security reasons, Chrome does not allow downloads when running headless. Here's a link to more information and possible workarounds.
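
One workaround that is often described for this is to send the DevTools command `Page.setDownloadBehavior` to the running browser; whether that is the one the linked page recommends is not known here. A minimal sketch, assuming a Selenium version whose Chrome driver provides `execute_cdp_cmd` and a Chrome build that still honours this command; `allow_headless_downloads` is a hypothetical helper name:

```python
import os


def allow_headless_downloads(driver, download_dir):
    """Ask Chrome, over the DevTools protocol, to save downloads into download_dir."""
    params = {
        "behavior": "allow",
        # Chrome expects an absolute path here.
        "downloadPath": os.path.abspath(download_dir),
    }
    # execute_cdp_cmd is a method of Selenium's Chrome driver, not of every WebDriver.
    driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
    return params
```

It would be called once after the driver is created, for example `allow_headless_downloads(driver, "/Users/X/Python/data_scraper/epa_data")`, before clicking the download link.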
Unless you need to use Chrome, Firefox will allow headless downloads, although it takes a few tweaks.
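
The tweaks are not spelled out here; the ones usually needed are Firefox preferences that choose a download folder and suppress the save dialog for the file's MIME type. A minimal sketch, assuming geckodriver is on the PATH and that the server sends the export as `text/csv` (an unverified assumption); `firefox_download_prefs` and `make_headless_firefox` are hypothetical helper names:

```python
def firefox_download_prefs(download_dir, mime_types="text/csv"):
    """Return Firefox preferences that save matching downloads without a dialog."""
    return {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.dir": download_dir,
        "browser.helperApps.neverAsk.saveToDisk": mime_types,
    }


def make_headless_firefox(download_dir):
    """Create a headless Firefox driver with the download preferences applied."""
    # Imported here so the preference helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    for name, value in firefox_download_prefs(download_dir).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

The returned driver could replace the one from `init_driver`; if the server reports a different content type, the `mime_types` string has to list it, otherwise Firefox still shows the save dialog.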