How do I scrape a web page with Python when the HTML doesn't change?

Date: 2014-07-13 23:46:46

Tags: python selenium web-scraping beautifulsoup

I'm currently using Selenium and BeautifulSoup to scrape financial statement data from Google Finance. For example:

http://www.google.com/finance?q=GOOG&fstype=ii

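This opens Google's income statement. When I have Selenium click the "Balance Sheet" and "Cash Flow" buttons at the top of the page, the charts and tables on the page change, but the URL does not, and when I pull the page source, it is still the original page with the income statement. My code is posted below: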

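from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

# ticker: list of symbols defined earlier (not shown), e.g. ["GOOG"]
driver = webdriver.Firefox()
driver.get("http://www.google.com/finance?q=" + ticker[0] + "&fstype=ii")

# Income Statement (the tab shown when the page loads)
url1 = driver.page_source
soup1 = BeautifulSoup(url1, "html.parser")

# click the "Balance Sheet" tab
element = driver.find_element(By.XPATH, '//*[@id=":1"]/a/b/b')
element.click()

driver.implicitly_wait(3.0)
url2 = driver.page_source
soup2 = BeautifulSoup(url2, "html.parser")

# click the "Cash Flow" tab
element = driver.find_element(By.XPATH, '//*[@id=":2"]/a/b/b')
element.click()

driver.implicitly_wait(3.0)
url3 = driver.page_source
soup3 = BeautifulSoup(url3, "html.parser")

driver.quit()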
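A note on the implicit-wait calls in this code (implicitly_wait(3.0)): they do not pause the script. An implicit wait only sets how long later find_element calls keep retrying before they raise, so page_source is read immediately after each click. If the intent is to wait until the newly selected tab has rendered, Selenium's WebDriverWait with an expected condition is the usual tool. The sketch below shows the same polling idea with no dependencies; wait_until is a hypothetical helper, not part of Selenium, and the default timeout and interval are arbitrary choices.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Hypothetical helper: call condition() until it returns a truthy value.

    Exceptions raised by condition() (for example, an element that is not
    in the DOM yet) are treated as "not ready yet" and retried. Raises
    TimeoutError if the deadline passes before the condition holds.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception as exc:  # e.g. element not found yet
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "condition not met within %.1f seconds" % timeout
            ) from last_error
        time.sleep(interval)
```

For example, right after clicking the Balance Sheet tab one might call wait_until(lambda: driver.find_element(By.ID, 'balinterimdiv').is_displayed()), using the div id from the answer.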

Any help is appreciated. Thanks.

1 answer:

Answer 0 (score: 3)

You don't need the BeautifulSoup HTML parser here; Selenium on its own is powerful enough.

The table data you need is inside div elements with different ids. Activate each tab and get the data from the corresponding div.

Here is an example that prints out the table headers inside all of the tabs:

from selenium import webdriver
from selenium.webdriver.common.by import By


def print_header(element):
    # find the statement table inside this tab's div and print its headers
    table = element.find_element(By.ID, 'fs-table')
    for row in table.find_elements(By.TAG_NAME, 'th'):
        print(row.text)


driver = webdriver.Firefox()
driver.get('http://www.google.com/finance?q=GOOG&fstype=ii')

# Income Statement is the tab shown when the page loads
print_header(driver.find_element(By.ID, 'incinterimdiv'))
print("----")

# activate Balance Sheet
element = driver.find_element(By.XPATH, '//*[@id=":1"]/a/b/b')
element.click()
print_header(driver.find_element(By.ID, 'balinterimdiv'))
print("----")

# activate Cash Flow
element = driver.find_element(By.XPATH, '//*[@id=":2"]/a/b/b')
element.click()
print_header(driver.find_element(By.ID, 'casinterimdiv'))

driver.quit()

Prints:

In Millions of USD (except for per share items)
3 months ending 2014-03-31
3 months ending 2013-12-31
3 months ending 2013-09-30
3 months ending 2013-06-30
3 months ending 2013-03-31
----
In Millions of USD (except for per share items)
As of 2014-03-31
As of 2013-12-31
As of 2013-09-30
As of 2013-06-30
As of 2013-03-31
----
In Millions of USD (except for per share items)
3 months ending 2014-03-31
12 months ending 2013-12-31
9 months ending 2013-09-30
6 months ending 2013-06-30
3 months ending 2013-03-31
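The answer reads the headers through Selenium's element lookups. If you would rather keep working from driver.page_source as the question does, the same idea (pick the div for the active tab by its id) can be applied to the HTML string. The following is a minimal standard-library sketch; DivHeaderParser and table_headers are hypothetical names, and it assumes the tab's div is present in the source at the moment you read it, for example right after clicking that tab.

```python
from html.parser import HTMLParser


class DivHeaderParser(HTMLParser):
    """Collect the text of every <th> inside the <div> whose id is div_id."""

    def __init__(self, div_id):
        super().__init__()
        self.div_id = div_id
        self.div_depth = 0  # > 0 while inside the target div
        self.in_th = False
        self.buffer = []
        self.headers = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1  # nested div inside the target div
            elif dict(attrs).get("id") == self.div_id:
                self.div_depth = 1  # entering the target div
        elif tag == "th" and self.div_depth:
            self.in_th = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "th" and self.in_th:
            self.in_th = False
            # collapse runs of whitespace inside the header cell
            self.headers.append(" ".join("".join(self.buffer).split()))

    def handle_data(self, data):
        if self.in_th:
            self.buffer.append(data)


def table_headers(page_source, div_id):
    """Return the <th> texts found inside the div with the given id."""
    parser = DivHeaderParser(div_id)
    parser.feed(page_source)
    parser.close()
    return parser.headers
```

For example, table_headers(driver.page_source, 'balinterimdiv') after clicking the Balance Sheet tab. Unlike the answer's print_header, this collects every th in the div rather than only those in the fs-table table.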