Scraping HTML table data into an Excel sheet using Selenium and Python

Time: 2015-12-14 08:02:55

Tags: excel python-3.x selenium web-scraping

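I want to scrape HTML table data from here and store it in an Excel sheet using Python with the selenium, xlrd, xlwt, and urllib2 modules. The real problem is this: after I fill in some details on the page, it redirects to another page. When I pass that URL to the urllib.open() function, the table information comes back empty.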

import os,requests,time,xlrd,xlwt
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException
browser=webdriver.Firefox()
browser.get("http://www.moneycontrol.com/stocks/histstock.php")
browser.maximize_window()
browser.find_element_by_link_text("INDEX").click()

index_list=["--Index--","S&P BSE Sensex","CNX Nifty","S&P BSE Smallcap","S&P BSE Midcap","S&P BSE 100","S&P BSE 200","S&P BSE 200","S&P BSE BANKEX","S&P BSE Capital Goods","S&P BSE Capital Goods","S&P BSE Metals","S&P BSE IT","S&P BSE Auto","S&P BSE Healthcare","S&P BSE Healthcare","S&P BSE Realty","S&P BSE TECk","S&P BSE PSU","S&P BSE Consumer Durables","S&P BSE Consumer Durables","S&P BSE SHARIAH","S&P BSE IPO","CNX Midcap Index -NSE","CNX Nifty Junior","CNX DEFTY","Nifty Midcap 50","CNX 100","CNX 500","Bank Nifty","CNX IT","CNX REALTY","CNX INFRA","CNX INFRA","CNX FMCG","CNX FMCG","CNX PHARMA","CNX PSE","CNX PSU BANK","CNX SERVICE","CNX SERVICE","CNX SERVICE","CNX SERVICE","CNX SERVICE"]
frm_day=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]
frm_mnth=["Jan","Feb","Mar","Apr","May","June","July","Aug","Sep","Oct","Nov","Dec"]
frm_year=[2015,2014,2013,2012,2011,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000]
for i in frm_day:
    # Alternative locator: browser.find_element_by_xpath(".//*[@id='indian_indices']/option[3]")
    # Pick the index, then the "from" and "to" dates, by typing into each drop-down.
    browser.find_element_by_name("indian_indices").send_keys(index_list[i+1])
    browser.find_element_by_name("frm_dy").send_keys(frm_day[i+1])
    browser.find_element_by_name("frm_mth").send_keys(frm_mnth[i+1])
    browser.find_element_by_name("frm_yr").send_keys(frm_year[i+1])
    browser.find_element_by_name("to_dy").send_keys(frm_day[i])
    browser.find_element_by_name("to_mth").send_keys(frm_mnth[i])
    browser.find_element_by_name("to_yr").send_keys(frm_year[i])
    # Submit the form; the browser is then redirected to the results page.
    browser.find_element_by_xpath("html/body/center[2]/div/div/div[5]/div[4]/div[2]/div[6]/table/tbody/tr/td[1]/form/div[4]/input[1]").click()
    break  # stop after the first submission

After the code runs, the page redirects to another page. I need the table data from that page. Is this possible?

2 Answers:

Answer 0: (score: 0)

You can use browser.current_url, which holds the URL of the page that contains the table. Pass this URL to browser.get(browser.current_url) and iterate over the table.
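As a rough sketch of the "iterate over the table" step: after the form is submitted, the Selenium session already holds the results page, so its HTML (browser.page_source) can be parsed directly instead of being fetched again. The example below uses only the standard library's html.parser. The names TableRows and extract_rows are hypothetical, and the sample HTML is made up for illustration; it is not the markup of the moneycontrol page. It also assumes the table is present in the page HTML and that tables are not nested.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being read, None outside <tr>
        self._cell = None   # text fragments of the cell being read, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return all table rows found in an HTML string."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample standing in for browser.page_source.
sample_html = """
<table>
  <tr><th>Date</th><th>Open</th><th>Close</th></tr>
  <tr><td>01-01-2015</td><td>100.5</td><td>101.0</td></tr>
  <tr><td>02-01-2015</td><td>101.0</td><td>99.8</td></tr>
</table>
"""
rows = extract_rows(sample_html)
for row in rows:
    print(row)
```

In the Selenium session from the question, the call would presumably be extract_rows(browser.page_source) after the click, once the results page has finished loading.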

Answer 1: (score: 0)

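store = browser.find_element_by_xpath(".//*[@id='wmd-input']")
print(store.text)

Use the code above to get the table data from the redirected page. Then store all of the data in a text file and convert it to an Excel sheet / CSV. Thanks. http://stackoverflow.com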
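The "store and convert to CSV" step might look like the following minimal sketch, which uses the standard library's csv module. The helper name rows_to_csv and the sample rows are hypothetical; the rows stand in for whatever cell text was scraped from the redirected page.

```python
import csv
import io


def rows_to_csv(rows, out):
    """Write rows (each a list of cell strings) to a file-like object as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(rows)


# Made-up rows standing in for the scraped table.
rows = [
    ["Date", "Open", "Close"],
    ["01-01-2015", "100.5", "101.0"],
]

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
rows_to_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, the buffer could be replaced with open("table.csv", "w", newline=""), where the file name is only an example. Excel opens CSV files directly; if a native .xls file is required, the xlwt module mentioned in the question could be used instead.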