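Problem scraping a web page with selenium in Python

Asked: 2018-03-06 14:57:46

Tags: python selenium web-scraping

I have a working web scraper, built from a template, that runs successfully against one website. However, when I change it to collect data from a second website, it keeps returning an error. I am not sure whether there is a mistake in my code or whether the website is rejecting my requests. Could you take a look and tell me where the problem is? Any help is much appreciated!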

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()

try:
    driver.get("http://www.caiso.com/TodaysOutlook/Pages/supply.aspx") # load the page
    WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, '.highcharts-legend-item highcharts-pie-series highcharts-color-0'))) # wait till relevant elements are on the page
except:
     driver.quit() # quit if there was an error getting the page or we've waited 15 seconds and the stats haven't appeared.
stat_elements = driver.find_elements_by_css_selector('.highcharts-legend-item highcharts-pie-series highcharts-color-0')
for el in stat_elements: 
    print(el.find_element_by_css_selector('b').text)
    print(el.find_element_by_css_selector('br').text)
driver.quit()

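1 answer:

Answer 0 (score: 0)

First, the CSS selector you are passing is wrong. All three classes belong to the same element, so they must be chained with dots and no spaces, like this: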

.highcharts-legend-item.highcharts-pie-series.highcharts-color-0

and not separated by spaces as in your code.
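To make the difference concrete: a space in a CSS selector means "descendant", so the original string asks for an element with tag name highcharts-color-0 nested inside an element with tag name highcharts-pie-series, which is not what the class attribute describes. The following is only a minimal sketch; classes_to_selector is a hypothetical helper (not part of Selenium) that turns a class attribute value into a compound class selector.

```python
def classes_to_selector(class_attr: str) -> str:
    """Turn a class attribute value into a compound CSS class selector.

    Hypothetical helper for illustration; it is not part of Selenium.
    """
    # split() with no arguments drops repeated and surrounding whitespace.
    return "".join("." + name for name in class_attr.split())


selector = classes_to_selector(
    "highcharts-legend-item highcharts-pie-series highcharts-color-0"
)
print(selector)
# .highcharts-legend-item.highcharts-pie-series.highcharts-color-0
```

The resulting string can be passed as the By.CSS_SELECTOR value.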

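Second, when the wait fails you close the browser in the except block, and the code after it then keeps using the closed driver and tries to close it again, which raises an error: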

try:
    driver.get("http://www.caiso.com/TodaysOutlook/Pages/supply.aspx") # load the page
    WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, '.highcharts-legend-item.highcharts-pie-series.highcharts-color-0'))) # wait till relevant elements are on the page
except:
     driver.quit()
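One way to guarantee that the browser is closed exactly once, and never used after it has been closed, is to wrap it in a context manager. This is a sketch under assumptions, not the only way to structure it: managed_driver is a hypothetical helper, and _DemoDriver is a stand-in used here so the example runs without a browser. With Selenium installed, the factory could be webdriver.Chrome instead.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and quit it exactly once on exit.

    Hypothetical helper; works with any object that has a quit() method.
    """
    driver = factory()
    try:
        yield driver
    finally:
        # Runs whether the body finished normally or raised.
        driver.quit()


class _DemoDriver:
    """Stand-in for a real WebDriver so the sketch runs without a browser."""

    def __init__(self):
        self.visited = []
        self.quit_calls = 0

    def get(self, url):
        self.visited.append(url)

    def quit(self):
        self.quit_calls += 1


demo = _DemoDriver()
with managed_driver(lambda: demo) as driver:
    driver.get("http://www.caiso.com/TodaysOutlook/Pages/supply.aspx")
    # All waiting and element lookups belong inside this block.

print(demo.quit_calls)  # 1
```

Because every use of the driver sits inside the with block, there is no path on which the driver is used or closed after quit() has already run.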

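Next, you are fetching the text from each list item with a single-element lookup, which raises NoSuchElementException if the item has no such child: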

print(el.find_element_by_css_selector('b').text)

Here is the debugged code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()

try:
    driver.get("http://www.caiso.com/TodaysOutlook/Pages/supply.aspx") # load the page
    WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, '.highcharts-legend-item.highcharts-pie-series.highcharts-color-0'))) # wait till relevant elements are on the page
    # Do not call driver.quit() here; the browser is closed once in the finally block.
except TimeoutException:
    pass
finally:
    try:
        stat_elements = driver.find_elements_by_css_selector('.highcharts-legend-item.highcharts-pie-series.highcharts-color-0')
        for el in stat_elements:
            for i in el.find_elements_by_tag_name('b'):
                print(i.text)
            for i in el.find_elements_by_tag_name('br'):
                print(i.text)
    except NoSuchElementException:
        print("No Such Element Found")
    driver.quit()

I hope this solves your problem; if not, let me know.