Question

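I am trying to scrape some data from a specific website using the requests and Beautiful Soup libraries. Unfortunately, instead of the HTML for that page I get the HTML of the parent page, https://salesweb.civilview.com. Thanks for your help!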

import requests
from bs4 import BeautifulSoup

example="https://salesweb.civilview.com/Sales/SaleDetails?PropertyId=473016965"
exampleGet=requests.get(example)
exampleGetText=exampleGet.text
soup = BeautifulSoup(exampleGetText,"lxml")
soup

Answer 1

You need to add a cookie to the request:

import requests
from bs4 import BeautifulSoup

cookie = {'ASP.NET_SessionId': 'rk2b0dxast1eyu5jvxezltgh'}

example="https://salesweb.civilview.com/Sales/SaleDetails?PropertyId=473016964"
exampleGet=requests.get(example, cookies=cookie)
exampleGetText=exampleGet.text
soup = BeautifulSoup(exampleGetText,"lxml")
soup.title

<title>Sales Listing Detail</title>

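That particular cookie will probably not work for you, so you need to open the page manually in a browser once, then open the browser's developer tools (network inspector) and look for the Cookie entry under "Headers" in the Network tab. My cookie looked like "ASP.NET_SessionId=rk2b0dxast1eyu5jvxezltgh".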
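If the Cookie header copied from the developer tools contains several name=value pairs, it may help to turn it into the dict that the cookies= argument expects. This is only a minimal sketch: the helper name cookie_header_to_dict is made up for illustration, and the second cookie in the sample string is a placeholder, not something the site is known to send.

```python
def cookie_header_to_dict(header_value):
    """Split a raw Cookie header value ("a=1; b=2") into a dict."""
    cookies = {}
    for pair in header_value.split(";"):
        # partition keeps any "=" inside the value intact
        name, sep, value = pair.strip().partition("=")
        if sep and name.strip():
            cookies[name.strip()] = value.strip()
    return cookies


if __name__ == "__main__":
    # "other=1" is a hypothetical extra cookie, included only to show the split
    raw = "ASP.NET_SessionId=rk2b0dxast1eyu5jvxezltgh; other=1"
    print(cookie_header_to_dict(raw))
```

The resulting dict can then be passed as requests.get(example, cookies=...), as in the code above.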

The cookie should also work for the other property pages.
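As a possible alternative to copying the cookie by hand, a requests.Session can collect it for you. The sketch below assumes that the site sets ASP.NET_SessionId when the parent page is visited and that this cookie alone is enough to unlock the detail page. Neither assumption has been verified here, and the function name fetch_detail_html is hypothetical.

```python
import requests
from bs4 import BeautifulSoup

PARENT_URL = "https://salesweb.civilview.com"
DETAIL_URL = "https://salesweb.civilview.com/Sales/SaleDetails?PropertyId=473016965"


def fetch_detail_html(session, parent_url, detail_url):
    """Visit the parent page first, then request the detail page with the same session."""
    # Assumption: this first request makes the server set the session cookie,
    # which the Session object stores and resends automatically.
    session.get(parent_url, timeout=10)
    response = session.get(detail_url, timeout=10)
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    with requests.Session() as session:
        html = fetch_detail_html(session, PARENT_URL, DETAIL_URL)
        soup = BeautifulSoup(html, "lxml")
        # If the title is still that of the parent page, the assumption above
        # does not hold and the cookie has to be supplied manually.
        print(soup.title)
```

Reusing one session for several PropertyId values also avoids sending a hard-coded cookie that may expire.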
