Headless browsing of a Vaadin-based website with Python

Time: 2015-10-12 09:56:25

Tags: python web-scraping phantomjs

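I'm new to web scraping and I've run into a problem.

I tried to use Python, Selenium and PhantomJS to extract the list of states from this site, "https://www.iso.org/obp/ui/#iso:code:3166:JP", but it failed. Instead of the list, I get the output shown below.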

<!DOCTYPE html><html><head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=11;chrome=1">
  <style type="text/css">html, body {height:100%;margin:0;}</style>
  <link rel="shortcut icon" type="image/vnd.microsoft.icon" href="./../VAADIN/themes/obp/favicon.ico">
  <link rel="icon" type="image/vnd.microsoft.icon" href="./../VAADIN/themes/obp/favicon.ico">
 <link rel="stylesheet" type="text/css" href="./../VAADIN/themes/obp/styles.css"><script type="text/javascript" src="./../VAADIN/widgetsets/org.iso.obp.ui.widgetset.applicationWidgetset/org.iso.obp.ui.widgetset.applicationWidgetset.nocache.js?1444641834593"></script><script src="https://www.iso.org/obp/VAADIN/widgetsets/org.iso.obp.ui.widgetset.applicationWidgetset/913365F3A38F531CF0D09D8744F3A155.cache.js"></script></head>
 <body scroll="auto" class=" v-generated-body">
  <div id="obpui-105541713" class=" v-app obp">
   <div class=" v-app-loading"></div>
   <noscript>
    You have to enable javascript in your browser to use an application built with Vaadin.
   </noscript>
  </div>
  <script type="text/javascript" src="./../VAADIN/vaadinBootstrap.js"></script>
  <script type="text/javascript">//<![CDATA[
if (!window.vaadin) alert("Failed to load the bootstrap javascript: ./../VAADIN/vaadinBootstrap.js");
vaadin.initApplication("obpui-105541713",{"heartbeatInterval":300,"versionInfo":{"vaadinVersion":"7.3.10"},"vaadinDir":"./../VAADIN/","authErrMsg":{"message":"Take note of any unsaved data, and <u>click here<\/u> or press ESC to continue.","caption":"Authentication problem"},"widgetset":"org.iso.obp.ui.widgetset.applicationWidgetset","theme":"obp","comErrMsg":{"message":"Take note of any unsaved data, and <u>click here<\/u> or press ESC to continue.","caption":"Communication problem"},"serviceUrl":".","standalone":true,"sessExpMsg":{"message":"Take note of any unsaved data, and <u>click here<\/u> or press ESC key to continue.","caption":"Session Expired"}});
//]]></script>

</body></html>

Here is my Python code.

from selenium import webdriver

target_url = 'https://www.iso.org/obp/ui/#iso:code:3166:JP'

driver = webdriver.PhantomJS()
driver.get( target_url)

print driver.page_source
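
The dump above still contains the `v-app-loading` placeholder div, which suggests that `page_source` may have been read before the Vaadin client code finished rendering. A minimal sketch of polling until that placeholder disappears is below. It rests on assumptions that I have not verified against this site: that Vaadin removes the `v-app-loading` element once the application has started, and that PhantomJS can run the widgetset at all. The helper names `is_vaadin_loading`, `wait_for_rendered_source` and `dump_rendered_page` are made up for illustration.

```python
import time


def is_vaadin_loading(html):
    """Return True while the Vaadin loading placeholder is still in the markup.

    Assumption: the 'v-app-loading' div is removed once the app has rendered.
    """
    return "v-app-loading" in html


def wait_for_rendered_source(get_source, timeout=30.0, poll=0.5):
    """Call get_source() repeatedly until the placeholder is gone.

    get_source is any zero-argument callable returning the current HTML,
    for example `lambda: driver.page_source`.
    Raises RuntimeError if the placeholder is still present after `timeout` seconds.
    """
    deadline = time.time() + timeout
    while True:
        html = get_source()
        if not is_vaadin_loading(html):
            return html
        if time.time() >= deadline:
            raise RuntimeError("page still loading after %.1f s" % timeout)
        time.sleep(poll)


def dump_rendered_page(url, timeout=30.0):
    """Open `url` in PhantomJS and print the source once rendering looks done.

    Selenium is imported here so the helpers above work without it installed.
    webdriver.PhantomJS is the driver used in the question; it requires a
    PhantomJS binary on the PATH.
    """
    from selenium import webdriver

    driver = webdriver.PhantomJS()
    try:
        driver.get(url)
        print(wait_for_rendered_source(lambda: driver.page_source, timeout))
    finally:
        driver.quit()
```

Calling `dump_rendered_page(target_url)` would replace the last three lines of my script. If the call ends in a `RuntimeError`, the page never left the loading state within the timeout, which would point at the JavaScript not running in PhantomJS rather than at a timing problem.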

Is there a solution?

0 answers:

No answers