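Clicking a download link with Python, Selenium, and the Firefox WebDriver

Date: 2018-10-03 19:11:26

Tags: python-3.x selenium selenium-webdriver

I am trying to sequentially download historical stock data from a link that is shown on the page after hovering the cursor over it. Currently I have the following code, which does not seem to find the css_selector and does not download the .csv file.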

#!/usr/bin/env python3.6

## Import Libraries
import os, sys
import time

from selenium import webdriver
import selenium.webdriver.firefox.options
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC 

## Declare Variables
ticker = 'CAT'
period1 = '1262332800'
period2 = '1537945200'
download_path = os.getcwd()
css_selector = r"a.Fl\(end\):nth-child(1)"  # raw string keeps the backslashes that escape the parentheses

## Configure Firefox Options
profile = webdriver.FirefoxProfile()
profile.set_preference("browser.download.folderList", 2) # 0 means to download to the desktop, 1 means to download to the default "Downloads" directory, 2 means to use the directory 
profile.set_preference("browser.download.dir", download_path)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/x-gzip,text/csv")  # comma-separated list of MIME types

## Firefox driver loads historical data page
driver = webdriver.Firefox(firefox_profile=profile)
driver.get("https://finance.yahoo.com/quote/{}/history?period1={}&period2={}&interval=1d&filter=history&frequency=1d"
           .format(ticker, period1, period2))

## Click on 'Download Data' Link
try:
    input_element = driver.find_element_by_css_selector(css_selector).click()
    print('Success!')

except:
    print('Failed!!!!!')

finally:
    driver.quit()
    print('Kill Driver!')

The example page is: https://finance.yahoo.com/quote/CAT/history?period1=1262332800&period2=1538118000&interval=1d&filter=history&frequency=1d

The css_selector, "a.Fl(end):nth-child(1)", was found in this part of the HTML:

<svg class="Va(m)! Mend(5px) Stk($c-fuji-blue-1-b)! Fill($c-fuji-blue-1-b)! Cur(p)" width="15" height="15" viewBox="0 0 48 48" data-icon="download" style="fill: rgb(0, 129, 242); stroke: rgb(0, 129, 242); stroke-width: 0; vertical-align: bottom;"><path d="M43.002 43.002h-38c-1.106 0-2.002-.896-2.002-2v-11c0-1.105.896-2 2.002-2 1.103 0 1.998.895 1.998 2v9h34.002v-9c0-1.105.896-2 2-2s2 .895 2 2v11c0 1.103-.896 2-2 2m-19-8L11.57 23.307c-.75-.748-.75-1.965 0-2.715.75-.75 1.965-.75 2.715 0l7.717 7.716V2h4v26.308l7.717-7.716c.75-.75 1.964-.75 2.714 0s.75 1.967 0 2.715L24.002 35.002z"></path></svg><span>Download Data</span>

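My questions are:

  • Is there an easier way to click the link? xpath? partial_link?
  • Am I clicking the right css_selector?
  • Do I need to hover over the text before I can click the Download Data link?
  • How can I find the element while the site is still loading? The site never finishes loading and makes continuous calls to ad servers.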

Using the method .find_element_by_link_text() results in a TimeoutException:

TimeoutException                          Traceback (most recent call last)
 in <module>()
     21 ## Go to the main page for historical data
     22 driver.get("https://finance.yahoo.com/quote/{}/history?period1={}&period2={}&interval=1d&filter=history&frequency=1d"
---> 23            .format(ticker, period1, period2))
     24
     25 print('.get() completed!')

~/virtualenvs/demo/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in get(self, url)
    331         Loads a web page in the current browser session.
    332         """
--> 333         self.execute(Command.GET, {'url': url})
    334
    335     @property

~/virtualenvs/demo/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in execute(self, driver_command, params)
    319         response = self.command_executor.execute(driver_command, params)
    320         if response:
--> 321             self.error_handler.check_response(response)
    322             response['value'] = self._unwrap_value(
    323                 response.get('value', None))

~/virtualenvs/demo/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
    240                 alert_text = value['alert'].get('text')
    241             raise exception_class(message, screen, stacktrace, alert_text)
--> 242         raise exception_class(message, screen, stacktrace)
    243
    244     def _value_or_default(self, obj, key, default):

TimeoutException: Message: Timeout loading page after 300000ms

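My interpretation is that the site never finishes loading, so driver.get() never returns and the try/except/finally logic is never executed.

2 Answers:

Answer 0: (score: 2)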

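  • Is there an easier way to click the link?

    Selecting by link text should work fine:

    driver.find_element_by_link_text('Download Data').click()
    
  • Am I clicking the right css_selector?

    Yes, the selector appears to be correct.

  • Do I need to hover over the text before I can click the Download Data link?

    No, you do not need to hover over the link.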

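Update

If you need to stop the page from loading, try the following solution:

driver.find_element_by_link_text('Download Data').click()

If the page has not loaded within 10 seconds, loading is forcibly stopped.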

Answer 1: (score: 0)

Could you please try the following options:

 1. download = driver.find_element_by_xpath(".//*[@id='Col1-1-HistoricalDataTable-Proxy']/section/div[1]/div[2]/span[2]/a")
    download.click()
 2. download = driver.find_element_by_link_text('Download Data')
    download.click()
 3. download = driver.find_element_by_partial_link_text('Download')
    download.click()