How to copy all the code of a URL using Python

Time: 2017-12-06 18:27:43

Tags: python html python-3.x python-requests urllib

I want to copy all the code of a URL (http://modelseed.org/biochem/reactions/rxn00001) using Python 3.6, but I can only copy part of it, and I don't know why.

So far I have tried the requests module

import requests
page = requests.get("http://modelseed.org/biochem/reactions/rxn00001")
print(page.content)

and urllib

import urllib.request
site = urllib.request.urlopen("http://modelseed.org/biochem/reactions/rxn00001")
print(site.read())

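The part of the code that contains the "Reaction details" information, such as "Name", "ID" and "Abbreviation", is missing, but it is visible when I inspect the page in the Chrome developer tools.

The code I can download with either of the two snippets above is: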

<!DOCTYPE html>
<html lang="en" ng-app="ModelSEED">
 <head>
  <base href="/"/>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="initial-scale=1, maximum-scale=1, user-scalable=no" name="viewport">
   <meta content="The ModelSEED is a resource for the reconstruction, exploration, comparison, and analysis of metabolic models." name="description"/>
   <link href="/img/ModelSEED-favicon.png?v=2.0" rel="shortcut icon"/>
   <meta content="nconrad" name="author"/>
   <title>
    ModelSEED
   </title>
   <link href="components/angular-material/angular-material.css" rel="stylesheet"/>
   <link href="components/bootstrap/dist/css/bootstrap.min.css" rel="stylesheet"/>
   <!-- to be removed -->
   <link href="components/font-awesome/css/font-awesome.min.css" rel="stylesheet"/>
   <link href="icomoon/style.css" rel="stylesheet"/>
   <link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet"/>
   <link href="http://fonts.googleapis.com/css?family=Montserrat:400,700" rel="stylesheet" type="text/css"/>
   <link href="build/style.css" rel="stylesheet"/>
   <!--<script src="https://cdn.socket.io/socket.io-1.3.7.js"></script>-->
   <script src="build/site.js">
   </script>
   <!-- HTML5 Shim and Respond.js IE8 support of HTML5 elements and media queries -->
   <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
   <!--[if lt IE 9]>
        <script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>
        <script src="https://oss.maxcdn.com/libs/respond.js/1.4.2/respond.min.js"></script>
    <![endif]-->
  </meta>
 </head>
 <body>
  <div style="height: 100%;" ui-view="">
  </div>
  <script>
   (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
      })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

      ga('create', 'UA-67412611-1', 'auto');
      ga('send', 'pageview');
  </script>
 </body>
</html>

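Does anyone have a hint as to why the code inside <div style="height: 100%;" ui-view=""> (right after <body> and before <script>) is not downloaded?

Thanks.

2 answers:

Answer 0 (score: 1)

That content is inserted by JavaScript, so neither requests nor urllib will find it: both only download the HTML the server sends and never run the page's scripts. You need a browser to render the page. Try Selenium or PhantomJS.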
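To check this on a downloaded page, here is a minimal sketch using only the standard library. It looks for the element carrying the ui-view attribute and returns whatever text sits inside it. The helper names (ViewContentParser, view_text) and the "rendered" sample string are made up for illustration; only the empty div comes from the question.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ViewContentParser(HTMLParser):
    """Collects the text inside the first element that carries a given attribute."""

    def __init__(self, attribute):
        super().__init__()
        self.attribute = attribute
        self.depth = 0      # nesting depth inside the target element (0 = outside)
        self.found = False  # True once the target element has been seen
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and self.attribute in dict(attrs):
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def view_text(html, attribute="ui-view"):
    """Return the stripped text inside the element with `attribute`, or None if absent."""
    parser = ViewContentParser(attribute)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return "".join(parser.text).strip()


# The body as downloaded in the question: the ui-view div is empty.
static_shell = '<body><div style="height: 100%;" ui-view=""></div><script>ga("send");</script></body>'
# Hypothetical example of the same div after the scripts have filled it in.
rendered = '<body><div ui-view=""><h3>Reaction details</h3></div></body>'

print(repr(view_text(static_shell)))  # ''
print(repr(view_text(rendered)))      # 'Reaction details'
```

An empty result for the downloaded page means the details only exist after the scripts have run, which is why a real browser is needed.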

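Something like:

from selenium import webdriver

url = "http://modelseed.org/biochem/reactions/rxn00001"

# Path to the chromedriver executable; newer Selenium releases configure this differently
driver = webdriver.Chrome('./chromedriver')
driver.get(url)            # the browser loads the page and runs its JavaScript
print(driver.page_source)  # HTML of the DOM as the browser currently sees it
driver.quit()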
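One caveat: page_source read right after get() may still be the empty shell if the page fills in its data asynchronously. Here is a rough sketch of a polling helper that waits until some expected text shows up. The function name wait_for_source, the marker text and the FakeDriver stand-in are all assumptions for illustration. The helper only relies on the object having a page_source attribute, so it can be tried without a browser.

```python
import time


def wait_for_source(driver, marker, timeout=10.0, poll=0.5):
    """Poll driver.page_source until `marker` appears; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        source = driver.page_source
        if marker in source:
            return source
        if time.monotonic() >= deadline:
            raise TimeoutError("marker %r not found within %s s" % (marker, timeout))
        time.sleep(poll)


class FakeDriver:
    """Stand-in for a browser driver: returns the empty shell twice, then rendered HTML."""

    def __init__(self):
        self.reads = 0

    @property
    def page_source(self):
        self.reads += 1
        if self.reads < 3:
            return '<div ui-view=""></div>'
        return '<div ui-view=""><h3>Reaction details</h3></div>'


html = wait_for_source(FakeDriver(), "Reaction details", timeout=2.0, poll=0.01)
print(html)
```

With Selenium, the real driver object can be passed in place of FakeDriver. Selenium also ships its own explicit-wait helper (WebDriverWait), which is probably the better choice in real code.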

Answer 1 (score: 0)