Failed to scrape web data with Selenium

Time: 2017-06-29 04:03:04

Tags: javascript python selenium web-scraping

I'm trying to get the data from the table on the front page of https://icostats.com/, but something isn't clicking.

from selenium import webdriver

browser = webdriver.Chrome(executable_path=r'C:\Scrapers\chromedriver.exe')
browser.get("https://icostats.com")
browser.find_element_by_xpath("""//*[@id="app"]/div/div[2]/div[2]/div[2]/div[2]/div[8]/span/span""").s()
posts = browser.find_element_by_class_name("tdPrimary-0-75")
for post in posts:
    print(post.text)

The error I get:

C:\Python36\python.exe C:/.../PycharmProjects/PyQtPS/ICO_spyder.py
Traceback (most recent call last):
  File "C:/.../PycharmProjects/PyQtPS/ICO_spyder.py", line 5, in <module>
    browser.find_element_by_xpath("""//*[@id="app"]/div/div[2]/div[2]/div[2]/div[1]/div[2]""").click()
  File "C:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 313, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 791, in find_element
    'value': value})['value']
  File "C:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 256, in execute
    self.error_handler.check_response(response)
  File "C:\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="app"]/div/div[2]/div[2]/div[2]/div[1]/div[2]"}
  (Session info: chrome=59.0.3071.115)
  (Driver info: chromedriver=2.30.477700 (0057494ad8732195794a7b32078424f92a5fce41), platform=Windows NT 6.1.7600 x86_64)

EDIT

Finally got it working:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait

browser = webdriver.Chrome(executable_path=r'C:\Scrapers\chromedriver.exe')
browser.get("https://icostats.com")
wait(browser, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#app > div > div.container-0-16 > div.table-0-20 > div.tbody-0-21 > div:nth-child(2) > div:nth-child(8)")))

posts = browser.find_elements_by_class_name("thName-0-55")
for post in posts:
    print(post.text)

posts = browser.find_elements_by_class_name("tdName-0-73")
for post in posts:
    print(post.text)

Is there a way to iterate over each header/column and export it to a CSV file without having to loop over every class like this?

2 Answers:

Answer 0 (score: 1)

  1. There is no s() method at the end of this line:

     browser.find_element_by_xpath("""//*[@id="app"]/div/div[2]/div[2]/div[2]/div[2]/div[8]/span/span""").s()

     What you probably want is

     browser.find_element_by_xpath("""//*[@id="app"]/div/div[2]/div[2]/div[2]/div[2]/div[8]/span/span""").text

  2. Since you want to iterate over the results, this line:

     posts = browser.find_element_by_class_name("tdPrimary-0-75")

     should be

     posts = browser.find_elements_by_class_name("tdPrimary-0-75")
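
Putting both fixes from this answer together, a minimal untested sketch could look like the following. It keeps the original XPath and class name, and it still assumes the table has already rendered, so the explicit wait shown in the answer below may also be needed:

from selenium import webdriver

browser = webdriver.Chrome(executable_path=r'C:\Scrapers\chromedriver.exe')
browser.get("https://icostats.com")

# read the cell text instead of calling the non-existent .s()
print(browser.find_element_by_xpath("""//*[@id="app"]/div/div[2]/div[2]/div[2]/div[2]/div[8]/span/span""").text)

# find_elements (plural) returns a list that can be iterated
posts = browser.find_elements_by_class_name("tdPrimary-0-75")
for post in posts:
    print(post.text)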
      

Answer 1 (score: 1)

The required data is generated dynamically by JavaScript. You need to wait until it appears on the page:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait

browser = webdriver.Chrome(executable_path=r'C:\Scrapers\chromedriver.exe')
browser.get("https://icostats.com")
wait(browser, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div#app>div")))
posts = browser.find_elements_by_class_name("tdPrimary-0-75")  # find_elements (plural) returns an iterable list
for post in posts:
    print(post.text)
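
As for the follow-up question about exporting every header and column to a CSV file without looping over each class by hand, here is a rough, untested sketch. It assumes the generated class names seen above (thName-0-55 for the header cells, tbody-0-21 for the table body) are still present, and that each direct child div of the body is one row and each direct child div of a row is one cell; those class names may change whenever the site regenerates its styles.

import csv

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait

browser = webdriver.Chrome(executable_path=r'C:\Scrapers\chromedriver.exe')
browser.get("https://icostats.com")

# wait until the table body has rendered at least one row
wait(browser, 10).until(EC.presence_of_element_located(
    (By.CSS_SELECTOR, "div.tbody-0-21 > div")))

# header texts, one per column
headers = [h.text for h in browser.find_elements_by_class_name("thName-0-55")]

# treat each direct child div of the body as one row
rows = browser.find_elements_by_css_selector("div.tbody-0-21 > div")

with open("icostats.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    for row in rows:
        # treat each direct child div of the row as one cell
        cells = row.find_elements_by_xpath("./div")
        writer.writerow([cell.text for cell in cells])

browser.quit()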