Python Selenium: scraping an entire table

Time: 2018-09-21 16:54:16

Tags: python-3.x selenium selenium-webdriver web-scraping webdriver

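The purpose of this code is to scrape a data table from a list of links and then convert it into a pandas DataFrame.

The problem is that the code only scrapes the first 7 rows, which sit on the first page of the table, whereas I want to capture the whole table. When I try to loop over the table pages, I get an error.

Here is the code: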

from selenium import webdriver

urls = open(r"C:\Users\Sayed\Desktop\script\sample.txt").readlines()
for url in urls:
    driver = webdriver.Chrome(r"D:\Projects\Tutorial\Driver\chromedriver.exe")
    driver.get(url)
    for item in driver.find_element_by_xpath('//*[contains(@id,"showMoreHistory")]/a'):
        driver.execute_script("arguments[0].click();", item)

    for table in driver.find_elements_by_xpath('//*[contains(@id,"eventHistoryTable")]//tr'):
        data = [item.text for item in table.find_elements_by_xpath(".//*[self::td or self::th]")]
        print(data)

Here is the error:

Traceback (most recent call last):
  File "D:/Projects/Tutorial/ff.py", line 8
    for item in driver.find_element_by_xpath('//*[contains(@id,"showMoreHistory")]/a'):
TypeError: 'WebElement' object is not iterable

2 answers:

Answer 0 (score: 1)

Check out the following script to get the whole table from that webpage. I used a hardcoded delay in the script, which is not good practice. However, you can always define an Explicit Wait to make the code more robust:

import time
from selenium import webdriver

url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'

driver = webdriver.Chrome()
driver.get(url)

# find_element_* returns a single WebElement, so click it directly instead of looping over it
item = driver.find_element_by_xpath('//*[contains(@id,"showMoreHistory")]/a')
driver.execute_script("arguments[0].click();", item)
time.sleep(2)

for table in driver.find_elements_by_xpath('//*[contains(@id,"eventHistoryTable")]//tr'):
    data = [item.text for item in table.find_elements_by_xpath(".//*[self::td or self::th]")]
    print(data)

driver.quit()

To get all the data, keep clicking the show more button until it is exhausted, and use an Explicit Wait instead of the fixed delay.
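As a rough, Selenium-free sketch of that idea: click, poll until the row count grows, and stop once a click adds nothing within the timeout. The names `wait_until`, `expand_fully`, `click_show_more` and `count_rows` are hypothetical and not part of the answer. With Selenium, the two callables would wrap the click on the `showMoreHistory` link and `len(driver.find_elements_by_xpath(...))`, and `WebDriverWait(...).until(...)` would normally provide the polling.

```python
import time


def wait_until(predicate, timeout=10.0, poll=0.5):
    """Poll predicate() until it is truthy or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def expand_fully(click_show_more, count_rows, timeout=10.0, poll=0.5):
    """Click until a click no longer adds rows; return the final row count."""
    seen = count_rows()
    while True:
        click_show_more()
        # Wait on a condition (more rows than before) instead of a fixed sleep.
        if not wait_until(lambda: count_rows() > seen, timeout, poll):
            return seen
        seen = count_rows()


class FakePage:
    """Stand-in for the real page: 7 rows, three clicks add 6 rows each."""

    def __init__(self):
        self.rows = 7
        self.clicks_left = 3

    def click(self):
        if self.clicks_left:
            self.clicks_left -= 1
            self.rows += 6


page = FakePage()
total = expand_fully(page.click, lambda: page.rows, timeout=0.05, poll=0.01)
print(total)
```

The timeout decides how long to wait before concluding that no more rows will arrive, so it should be comfortably longer than the page's usual loading time.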

Answer 1 (score: 0)

Based on your question and the URL https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155, you can scrape the entire table with the following solution:

  • Code block:

    # -*- coding: UTF-8 -*-
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException
    
    table_rows = []
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get("https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155")
    # Selector for the data rows of the history table (event id 1155).
    row_selector = "table.genTbl.openTbl.ecHistoryTbl#eventHistoryTable1155 tr[event_attr_id='1155']"
    # Wait for the table header to become visible, then scroll it into view.
    table_header = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.genTbl.openTbl.ecHistoryTbl#eventHistoryTable1155 tr>th.left.symbol")))
    driver.execute_script("arguments[0].scrollIntoView(true);", table_header)
    # Rows present before any click, so the result is not empty if the first click times out.
    table_rows = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, row_selector)))
    myLength = len(table_rows)
    while True:
        try:
            # Click "Show more", then wait until the row count has grown.
            WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div#showMoreHistory1155>a"))).click()
            WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements_by_css_selector(row_selector)) > myLength)
            table_rows = driver.find_elements_by_css_selector(row_selector)
            myLength = len(table_rows)
        except TimeoutException:
            # No new rows within 20 seconds: the table is fully expanded.
            break
    for row in table_rows:
        print(row.text)
    driver.quit()
    
  • Console output:

    Sep 24, 2018 01:30
    Sep 17, 2018 01:30 53.1%   55.3%
    Sep 10, 2018 01:30 55.3%   49.0%
    Sep 03, 2018 01:30 49.0%   43.3%
    Aug 27, 2018 01:30 43.3%   49.7%
    Aug 20, 2018 01:30 49.7%   52.5%
    Aug 13, 2018 01:30 52.5%   59.9%
    Aug 06, 2018 01:30 59.9%   62.6%
    Jul 30, 2018 01:30 62.6%   52.8%
    Jul 23, 2018 01:30 52.8%   52.7%
    Jul 16, 2018 01:30 52.7%   46.2%
    Jul 10, 2018 01:30 46.2%   55.3%
    Jul 02, 2018 01:30 55.3%   53.1%
    Jun 25, 2018 01:30 53.1%   66.2%
    Jun 18, 2018 01:30 66.2%   65.2%
    Jun 11, 2018 01:30 65.2%   61.2%
    Jun 04, 2018 01:30 61.2%   63.9%
    May 28, 2018 01:30 63.9%   67.0%
    May 21, 2018 01:30 67.0%   63.2%
    May 14, 2018 01:30 63.2%   61.3%
    May 07, 2018 01:30 61.3%   57.6%
    Apr 30, 2018 01:30 57.6%   64.8%
    Apr 23, 2018 01:30 64.8%   65.2%
    Apr 16, 2018 01:30 65.2%   60.4%
    Apr 09, 2018 01:30 60.4%   63.3%
    Apr 02, 2018 01:30 63.3%   62.1%
    Mar 26, 2018 01:30 62.1%   65.7%
    Mar 19, 2018 02:30 65.7%   56.0%
    Mar 12, 2018 02:30 56.0%   62.3%
    Mar 05, 2018 02:30 62.3%   59.1%
    Feb 26, 2018 02:30 59.1%   52.8%
    Feb 19, 2018 02:30 52.8%   55.8%
    Feb 12, 2018 02:30 55.8%   51.7%
    Feb 05, 2018 02:30 51.7%   56.8%
    Jan 29, 2018 02:30 56.8%   52.2%
    Jan 22, 2018 02:30 52.2%   56.1%
    Jan 15, 2018 02:30 56.1%   60.2%
    Jan 08, 2018 02:30 60.2%   54.6%
    Jan 01, 2018 02:30 54.6%   48.4%
    Dec 25, 2017 02:30 48.4%   66.4%
    Dec 18, 2017 02:30 66.4%   58.9%
    Dec 11, 2017 02:30 58.9%   53.8%
    Dec 04, 2017 02:30 53.8%   55.9%
    Nov 28, 2017 02:30 55.9%   53.7%
    Nov 20, 2017 02:30 53.7%   58.6%
    Nov 14, 2017 02:30 58.6%   52.8%
    Nov 06, 2017 02:30 52.8%   57.6%
    Oct 30, 2017 01:30 57.6%   54.7%
    Oct 23, 2017 01:30 54.7%   58.9%
    Oct 16, 2017 01:30 58.9%   57.3%
    Oct 09, 2017 01:30 57.3%   64.0%
    Oct 02, 2017 01:30 64.0%   47.5%
    Sep 25, 2017 01:30 47.5%   52.2%
    Sep 18, 2017 01:30 52.2%   55.5%
    Sep 11, 2017 01:30 55.5%   54.3%
    Sep 04, 2017 01:30 54.3%   54.2%
    Aug 28, 2017 01:30 54.2%   51.4%
    Aug 21, 2017 01:30 51.4%   57.4%
    Aug 14, 2017 01:30 57.4%   51.2%
    Aug 07, 2017 01:30 51.2%   51.3%
    Jul 31, 2017 01:30 51.3%   52.8%
    Jul 24, 2017 01:30 52.8%   53.3%
    Jul 17, 2017 01:30 53.3%   54.1%
    Jul 10, 2017 01:30 54.1%   51.9%
    Jul 03, 2017 01:30 51.9%   40.6%
    Jun 26, 2017 01:30 40.6%   52.6%
    Jun 19, 2017 01:30 52.6%   51.0%
    Jun 12, 2017 01:30 51.0%   52.1%
    Jun 05, 2017 01:30 52.1%   59.1%
    May 29, 2017 01:30 59.1%   46.9%
    May 22, 2017 01:30 46.9%   53.0%
    May 15, 2017 01:30 53.0%   44.9%
    May 08, 2017 01:30 44.9%   37.0%
    May 01, 2017 01:30 37.0%   43.0%
    Apr 24, 2017 01:30 43.0%   52.4%
    Apr 10, 2017 01:30 52.4%   55.1%
    Apr 03, 2017 01:30 55.1%   43.5%
    Mar 27, 2017 02:30 43.5%   36.0%
    Mar 20, 2017 02:30 36.0%   32.3%
    Mar 13, 2017 02:30 32.3%   42.8%
    Mar 06, 2017 02:30 42.8%   39.1%
    Feb 27, 2017 02:30 39.1%   41.7%
    Feb 20, 2017 02:30 41.7%   43.2%
    Feb 13, 2017 02:30 43.2%   36.6%
    Feb 06, 2017 02:30 36.6%   39.7%
    Jan 30, 2017 02:30 39.7%   33.5%
    Jan 23, 2017 02:30 33.5%   36.8%
    Jan 16, 2017 03:30 36.8%   37.0%
    Jan 09, 2017 02:30 37.0%   41.6%
    Jan 02, 2017 02:30 41.6%   35.8%
    Dec 26, 2016 02:30 35.8%   42.3%
    Dec 19, 2016 02:30 42.3%   39.7%
    Dec 12, 2016 04:15 39.7%   33.8%
    Dec 05, 2016 02:30 33.8%   37.1%
    Nov 29, 2016 02:30 37.1%   41.9%
    Nov 21, 2016 02:30 41.9%   39.1%
    Nov 15, 2016 02:00 39.1%   20.5%
    Nov 07, 2016 02:30 20.5%   27.4%
    Oct 31, 2016 02:30 27.4%   33.4%
    Oct 25, 2016 02:30 33.4%   30.8%
    Oct 18, 2016 02:30 30.8%   26.6%
    Oct 10, 2016 02:30 26.6%   28.6%
    Oct 05, 2016 02:00 28.6%   26.2%
    Sep 26, 2016 02:30 26.2%   34.8%
    Sep 19, 2016 02:30 34.8%   21.2%
    Sep 13, 2016 02:30 21.2%   27.0%
    Sep 05, 2016 02:30 27.0%   32.7%
    Aug 29, 2016 02:30 32.7%   23.9%
    Aug 22, 2016 02:30 23.9%   28.8%
    Aug 15, 2016 02:30 28.8%   30.8%
    Aug 08, 2016 02:30 30.8%   20.3%
    Aug 01, 2016 02:30 20.3%   30.2%
    Jul 25, 2016 02:30 30.2%   29.5%
    Jul 18, 2016 02:30 29.5%   26.2%
    Jul 11, 2016 02:30 26.2%   27.5%
    Jul 04, 2016 02:30 27.5%   26.8%
    Jun 27, 2016 02:30 26.8%   35.1%
    Jun 20, 2016 02:30 35.1%   22.8%
    Jun 13, 2016 02:30 22.8%   32.5%
    Jun 06, 2016 02:30 32.5%   35.6%
    May 30, 2016 02:30 35.6%   39.5%
    May 23, 2016 02:30 39.5%   37.8%
    May 16, 2016 03:30 37.8%   39.5%
    May 09, 2016 02:30 39.5%   30.3%
    May 02, 2016 02:30 30.3%   32.9%
    Apr 25, 2016 02:30 32.9%   29.6%
    Apr 18, 2016 06:00 29.6%   30.5%
    Apr 11, 2016 02:30 30.5%   22.7%
    Apr 04, 2016 03:30 22.7%   32.1%
    Mar 28, 2016 03:30 32.1%   23.2%
    Mar 21, 2016 03:30 23.2%   26.7%
    Mar 14, 2016 03:30 26.7%   22.6%
    Mar 07, 2016 03:30 22.6%   33.7%
    Feb 29, 2016 03:30 33.7%   34.8%
    Feb 22, 2016 03:30 34.8%   33.3%
    Feb 15, 2016 03:30 33.3%   33.3%
    Feb 08, 2016 03:30 33.3%   34.3%
    Feb 01, 2016 03:30 34.3%   33.2%
    Jan 25, 2016 03:30 33.2%   27.0%
    Jan 18, 2016 03:30 27.0%   27.2%
    Jan 11, 2016 03:30 27.2%   30.0%
    Jan 05, 2016 03:30 30.0%   24.0%
    Dec 29, 2015 03:30 24.0%   33.3%
    Dec 21, 2015 03:30 33.3%   31.2%
    Dec 14, 2015 04:30 31.2%   27.1%
    Dec 07, 2015 03:00 27.1%   29.8%
    Dec 01, 2015 03:00 29.8%   27.5%
    Nov 23, 2015 03:00 27.5%   33.1%
    Nov 17, 2015 04:00 33.1%   26.8%
    Nov 09, 2015 02:30 26.8%   24.3%
    Nov 02, 2015 01:30 24.3%   36.4%
    Oct 26, 2015 01:30 36.4%   28.6%
    Oct 19, 2015 01:30 28.6%   25.5%
    Oct 11, 2015 04:30 25.5%   29.6%
    Oct 06, 2015 01:00 29.6%   28.5%
    Sep 28, 2015 01:30 28.5%   29.1%
    Sep 21, 2015 01:30 29.1%   21.2%
    Sep 14, 2015 01:30 21.2%   29.8%
    Sep 07, 2015 01:30 29.8%   36.3%
    Aug 31, 2015 01:30 36.3%   35.6%
    Aug 24, 2015 01:30 35.6%   26.4%
    Aug 17, 2015 01:30 26.4%   24.8%
    Aug 10, 2015 01:30 24.8%   29.7%
    Aug 03, 2015 01:30 29.7%   24.8%
    Jul 27, 2015 01:30 24.8%   30.7%
    Jul 20, 2015 01:30 30.7%   27.9%
    Jul 13, 2015 01:30 27.9%   27.4%
    Jul 07, 2015 01:30 27.4%   26.8%
    Jun 29, 2015 01:30 26.8%   33.1%
    Jun 22, 2015 01:30 33.1%   33.6%
    Jun 15, 2015 03:30 33.6%   28.9%
    Jun 08, 2015 01:30 28.9%   23.0%
    Jun 01, 2015 01:30 23.0%   34.0%
    May 25, 2015 04:00 34.0%   28.9%
    May 18, 2015 01:30 28.9%   28.8%
    May 11, 2015 01:30 28.8%   28.3%
    May 04, 2015 02:00 28.3%   23.7%
    Apr 27, 2015 01:30 23.7%   27.2%
    Apr 20, 2015 01:30 27.2%   33.7%
    Apr 13, 2015 02:00 33.7%   23.2%
    Apr 06, 2015 02:00 23.2%   19.8%
    Mar 30, 2015 02:30 19.8%   24.1%
    Mar 23, 2015 02:30 24.1%   27.2%
    Mar 16, 2015 03:00 27.2%   35.6%
    Mar 09, 2015 02:30 35.6%   34.4%
    Mar 02, 2015 02:30 34.4%   30.2%
    Feb 23, 2015 02:30 30.2%   26.6%
    Feb 16, 2015 03:30 26.6%   23.8%
    Feb 09, 2015 02:30 23.8%   26.4%
    Feb 02, 2015 02:30 26.4%   23.9%
    Jan 26, 2015 02:30 23.9%   28.9%
    Jan 19, 2015 02:30 28.9%   35.5%
    Jan 12, 2015 02:30 35.5%   38.1%
    Jan 06, 2015 03:30 38.1%   40.6%
    Jan 01, 2015 02:30 40.6%   45.2%
    Dec 22, 2014 02:00 45.2%   39.8%
    Dec 15, 2014 02:00 39.8%   41.7%
    Dec 07, 2014 21:00 41.7%   33.8%
    Dec 02, 2014 03:00 33.8%   38.6%
    Nov 24, 2014 01:30 38.6%   39.2%
    Nov 17, 2014 01:00 39.2%   33.1%
    Nov 10, 2014 01:00 33.1%   35.4%
    Nov 04, 2014 03:00 35.4%   37.3%
    Oct 27, 2014 02:00 37.3%   33.7%
    Oct 19, 2014 22:00 33.7%   36.2%
    Oct 13, 2014 01:00 36.2%   44.5%
    Oct 06, 2014 01:00 44.5%   41.3%
    Sep 29, 2014 01:00 41.3%   50.3%
    Sep 21, 2014 22:35 50.3%   39.5%
    Sep 15, 2014 00:45 39.5%   39.9%
    Sep 08, 2014 01:00 39.9%   42.8%
    Sep 01, 2014 02:35 42.8%   41.9%
    Aug 25, 2014 01:00 41.9%   38.9%
    Aug 18, 2014 01:00 38.9%   34.0%
    Aug 11, 2014 01:00 34.0%   38.2%
    Aug 04, 2014 01:00 38.2%   38.4%
    Jul 28, 2014 01:00 38.4%   42.3%
    Jul 21, 2014 01:00 42.3%   37.2%
    Jul 14, 2014 01:00 37.2%   39.6%
    Jul 07, 2014 01:00 39.6%   39.8%
    Jun 30, 2014 01:00 39.8%   36.1%
    Jun 23, 2014 00:30 36.1%   37.6%
    Jun 16, 2014 00:30 37.6%   36.5%
    Jun 09, 2014 00:30 36.5%   44.1%
    Jun 01, 2014 22:00 44.1%   49.4%
    May 26, 2014 00:30 49.4%   41.0%
    May 19, 2014 00:00 41.0%   55.0%
    May 12, 2014 00:00 55.0%   41.1%
    May 04, 2014 06:00 41.1%   43.5%
    Apr 27, 2014 06:00 43.5%   40.3%
    Apr 06, 2014 06:00 40.3%
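
Since the question's goal is a pandas DataFrame, the printed rows above still need to be split into fields. The following is a minimal sketch using only the standard library. The field names `released`, `actual` and `previous` are assumptions inferred from the output (each row's second percentage equals the first percentage of the row below it); they are not confirmed by either answer.

```python
import re
from datetime import datetime

# Leading "Mon DD, YYYY HH:MM" timestamp, then whatever values follow it.
ROW_PATTERN = re.compile(r"^(\w{3} \d{2}, \d{4} \d{2}:\d{2})\s*(.*)$")
PERCENT_PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)%")


def parse_row(text):
    """Turn one printed row into a dict, or None if it has no timestamp."""
    match = ROW_PATTERN.match(text.strip())
    if match is None:
        return None
    released = datetime.strptime(match.group(1), "%b %d, %Y %H:%M")
    values = [float(v) for v in PERCENT_PATTERN.findall(match.group(2))][:2]
    values += [None] * (2 - len(values))  # pad rows that lack one or both values
    # Assumption: the first percentage is the reading, the second the prior one.
    return {"released": released, "actual": values[0], "previous": values[1]}


sample = [
    "Sep 24, 2018 01:30",
    "Sep 17, 2018 01:30 53.1%   55.3%",
    "Apr 06, 2014 06:00 40.3%",
]
records = [parse_row(line) for line in sample]
for record in records:
    print(record)
```

The resulting list of dicts could then be passed to `pandas.DataFrame(records)` to build the data frame. In the real script, `row.text` for each element of `table_rows` would take the place of the `sample` strings.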