Selecting the next page with Selenium when the URL stays the same (scraping)

Asked: 2015-07-09 19:19:09

Tags: javascript python python-2.7 selenium web-scraping

I am trying to scrape this site: http://data.eastmoney.com/xg/xg/

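So far I have used Selenium to execute the page's JavaScript and fetch the table. However, my code currently gets only the first page. Is there a way to reach the other 17 pages? When I click "next page" the URL does not change, so I cannot simply iterate over a different URL each time.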

My code so far:

from selenium import webdriver
from bs4 import BeautifulSoup

def scrape():
    url = 'http://data.eastmoney.com/xg/xg/'
    d = {}
    driver = webdriver.PhantomJS()
    driver.get(url)
    # Column indices to keep: 0 through 24, skipping column 2
    cols = [i for i in range(25) if i != 2]
    htmlsource = driver.page_source
    driver.quit()
    # Name the parser explicitly (requires lxml to be installed)
    bs = BeautifulSoup(htmlsource, 'lxml')
    # Header cells come from the last <tr> of the first <thead>
    heading = bs.find_all('thead')[0]
    head = heading.find_all('tr')[-1].find_all('th')
    hlist = [head[i].get_text().strip() for i in cols]
    print('|'.join(hlist))
    # Index each body row by the text of its first cell
    table = bs.find_all('tbody')[0]
    for row in table.find_all('tr'):
        cells = row.find_all('td')
        d[cells[0].get_text()] = [y.get_text() for y in cells]
    # Print every row as a pipe-separated line
    for key in d:
        ret = [d[key][i] for i in cols]
        print('|'.join(ret))

if __name__ == "__main__":
    scrape()

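Alternatively, if I use webdriver.Chrome() instead of PhantomJS, would it be possible to click through the pages in the browser and have the Python code run on the new page after each click?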
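The click-based idea raised above can be sketched roughly as follows. This is a minimal sketch, not a tested solution for this site. The helper names `collect_pages` and `selenium_steps` are hypothetical, and the link text `下一页` and the `tbody tr` selector are assumptions about the page's markup. It also assumes the table rows are replaced after a click; if the site updates them in place, the wait would time out and a different wait condition would be needed. The loop itself is kept independent of Selenium so it can be tried without a browser.

```python
def collect_pages(read_rows, go_next, max_pages=18):
    """Read rows from the current page, then advance, for at most max_pages pages.

    read_rows() returns the rows of the page currently shown.
    go_next() moves to the next page and returns False when there is none.
    """
    all_rows = []
    for page in range(max_pages):
        all_rows.extend(read_rows())
        # Do not click past the last requested page.
        if page == max_pages - 1 or not go_next():
            break
    return all_rows


def selenium_steps(driver, next_text=u"下一页"):
    """Build read_rows/go_next for a Selenium driver (selectors are assumptions)."""
    # Imported here so collect_pages can be tried without Selenium installed.
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def read_rows():
        rows = driver.find_elements(By.CSS_SELECTOR, "tbody tr")
        return [[td.text for td in row.find_elements(By.TAG_NAME, "td")]
                for row in rows]

    def go_next():
        try:
            link = driver.find_element(By.LINK_TEXT, next_text)
        except NoSuchElementException:
            return False
        first_row = driver.find_element(By.CSS_SELECTOR, "tbody tr")
        link.click()
        # The URL does not change, so wait until the old row has been replaced.
        WebDriverWait(driver, 10).until(EC.staleness_of(first_row))
        return True

    return read_rows, go_next


# Possible usage (requires Selenium and a browser driver):
#   driver = webdriver.Chrome()
#   driver.get('http://data.eastmoney.com/xg/xg/')
#   rows = collect_pages(*selenium_steps(driver))
```

The default of 18 pages reflects the first page plus the 17 others mentioned in the question; the cap also keeps the loop from clicking a "next" link that may still be present on the last page.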

0 answers:

No answers yet