Question

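I am trying to use Python to get the dynamically generated table from this page (http://xbrl.cninfo.com.cn/XBRL/allinfo.jsp?stkid=000410&getyear=2012&nowpage=Info.jsp&reportType=GB0110). I have tried mechanize, and Selenium with the PhantomJS webdriver, but to no avail. Here is part of the code I used: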

from selenium import webdriver

url = 'http://xbrl.cninfo.com.cn/XBRL/allinfo.jsp?stkid=000410&getyear=2012&nowpage=Info.jsp&reportType=GB0110'
driver = webdriver.PhantomJS()
driver.set_window_size(1024, 768)
driver.get(url)
content = driver.page_source
# After this I used BeautifulSoup to get the table content inside the iframe tag, but the iframe's source is just some JSP page.

I am new to web scraping, so I don't know how to scrape dynamically created content. Please help. Thanks.

Answer 1

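This is because the data you want is inside an iframe. The page_source of the outer page only contains the iframe tag, not the document loaded in it, so you have to switch into the frame first. Try this: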

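driver.get(url)
# Switch into the first iframe so page_source returns the frame's own document.
# (On Selenium 4+, use driver.find_element(By.XPATH, "//iframe") instead.)
driver.switch_to.frame(driver.find_element_by_xpath("//iframe"))
content = driver.page_source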
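
Since the question mentions that the iframe's source is itself a JSP page, another option that may work is to read the iframe's src from the outer page and request that URL directly. Below is a minimal sketch using only the standard library. The helper name iframe_urls, the class IframeSrcParser, and the sample HTML are made up for illustration; the real page's iframe src may look different, and the frame page may still need JavaScript to render its table.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collects the src attribute of every <iframe> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(html, base_url):
    """Return absolute URLs for all iframes in html (hypothetical helper)."""
    parser = IframeSrcParser()
    parser.feed(html)
    # Relative src values are resolved against the URL of the outer page.
    return [urljoin(base_url, src) for src in parser.sources]


# Assumed stand-in for the outer page's HTML; not the real page content.
sample_html = '<html><body><iframe src="Info.jsp?stkid=000410"></iframe></body></html>'
page_url = ('http://xbrl.cninfo.com.cn/XBRL/allinfo.jsp'
            '?stkid=000410&getyear=2012&nowpage=Info.jsp&reportType=GB0110')

frame_urls = iframe_urls(sample_html, page_url)
print(frame_urls)
```

In practice you would pass driver.page_source (or the body of a plain HTTP response) as html, then load each returned URL with driver.get or an HTTP client and parse the table from there.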
