Question

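When I make a GET request to http://www.waterwaysguide.org.au/waterwaysguide/access-point/4980/partial from a browser, the full HTML page is returned. However, when I make the same GET request with the Python requests module, only part of the HTML is returned and the core content is missing.

How can I change my code so that I get the missing data?

This is the code I am using: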

import requests
def get_data(point_num):
    base_url = 'http://www.waterwaysguide.org.au/waterwaysguide/access-point/{}/partial'
    r = requests.get(base_url)
    html_content = r.text
    print(html_content)
get_data(4980)

Running the code produces the output below. The content inside div class="view view-waterway-access-point-page ... is missing.

<div>
  <div class="modal-header">
    <button type="button" class="close" data-dismiss="modal" aria-label="Close">
      <span aria-hidden="true">&times;</span>
    </button>
    <h4 class="modal-title">
        Point of Interest detail    </h4>

  </div>
  <div class="modal-body">
    <div class="view view-waterway-access-point-page view-id-waterway_access_point_page view-display-id-page view-dom-id-c855bf9afdfe945979f96b2301d55784">
        
  
  
  
  
  
  
  
  
</div>  </div>
  <div class="modal-footer">
    
    <button type="button" id="closeRemoteModal" class="btn btn-action" data-dismiss="modal">Close</button>
  </div>
</div>

Answer 1

The elements are probably rendered with JavaScript after the page loads, so you get only the page itself and not the JavaScript-rendered part. You may want to look at:
    http://stanford.edu/~mgorkove/cgi-bin/rpython_tutorials/Scraping_a_Webpage_Rendered_by_Javascript_Using_Python.php
    https://medium.com/@hoppy/how-to-test-or-scrape-javascript-rendered-websites-with-python-selenium-a-beginner-step-by-c137892216aa
    Web-scraping JavaScript page with Python
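
Before reaching for a browser-driving tool, it can help to check whether the container is actually empty in the raw response. The following is a rough sketch using only the standard library; the class DivTextCollector and the function div_has_text are hypothetical names, not part of any library. Note that an empty container is consistent with JavaScript rendering, but it can also mean the request simply went to the wrong URL.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect text inside any <div> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0    # <div> nesting depth inside the target; 0 means outside
        self.chunks = []  # non-blank text fragments found inside the target

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        classes = (dict(attrs).get('class') or '').split()
        if self.depth > 0:
            self.depth += 1          # a nested <div> inside the target
        elif self.target_class in classes:
            self.depth = 1           # entering the target <div>

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def div_has_text(html, target_class):
    """Return True if the target <div> contains any non-blank text."""
    parser = DivTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return bool(parser.chunks)
```

Passing the response text from the question together with 'view-waterway-access-point-page' would report whether that div carries any text. The check is deliberately crude: it counts any text node, including script content, as text.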

Answer 2

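I found the mistake I made. I never used the 'point_num' argument passed to the function, so my request was not going to the correct URL.

The code is working now. I changed the line to: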

r = requests.get(base_url.format(point_num))
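
To see why the original request missed the right page, here is a minimal sketch of the URL construction on its own. The helper name build_url is hypothetical and not part of the original code. Without the call to format, the literal {} placeholder stays in the requested path.

```python
# The URL template from the question; {} is a placeholder for the access point number.
BASE_URL = 'http://www.waterwaysguide.org.au/waterwaysguide/access-point/{}/partial'


def build_url(point_num):
    """Hypothetical helper: fill the placeholder with the access point number."""
    return BASE_URL.format(point_num)


if __name__ == '__main__':
    print(BASE_URL)         # what the original code requested: '{}' is still in the path
    print(build_url(4980))  # what the corrected code requests
```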

Answer 3

The following approach shows the content that is missing from div class="view view-waterway-access-point-page ...:

>>> from urllib.request import Request, urlopen
>>> from bs4 import BeautifulSoup
>>> url = 'http://www.waterwaysguide.org.au/waterwaysguide/access-point/4980/partial'
>>> req = Request(url,headers={'User-Agent': 'Mozilla/5.0'})
>>> webpage = urlopen(req).read()
>>> print(webpage)
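
The same steps can be written as a script, sketched here under a few assumptions. urlopen(...).read() returns bytes, so printing it directly shows a b'...' literal; decoding it gives readable HTML. The helper names build_request and fetch_html are hypothetical, and UTF-8 is assumed as a fallback when the server does not declare a charset.

```python
from urllib.request import Request, urlopen

URL = 'http://www.waterwaysguide.org.au/waterwaysguide/access-point/4980/partial'


def build_request(url):
    """Hypothetical helper: attach the browser-like User-Agent used in the session above."""
    return Request(url, headers={'User-Agent': 'Mozilla/5.0'})


def fetch_html(url, timeout=10):
    """Hypothetical helper: fetch the page and return it as text rather than bytes."""
    with urlopen(build_request(url), timeout=timeout) as response:
        # Assumption: fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')


# Example (needs network access):
# print(fetch_html(URL))
```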
