I am trying to scrape a website with this code:
import requests
from bs4 import BeautifulSoup

url = 'https://www.seloger.com/list.htm?types=1%2C2&projects=2%2C5&natures=1%2C2%2C4&places=%5B%7Bdiv%3A2238%7D%5D&qsVersion=1.0&engine-version=new'
headers = {
    'User-Agent': '*',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
}

s = requests.Session()
s.headers.update(headers)
r = s.get(url, allow_redirects=False)
soup = BeautifulSoup(r.text, 'html.parser')
My soup is empty. All I got was:
<Response [307]>
For r.headers, I got:
{'Cache-Control': 'private', 'Content-Type': 'text/html; charset=utf-8',
'Location': '/erreur-temporaire/', 'Vary': 'User-Agent', 'Server': 'Microsoft-IIS/8.5', 'Set-Cookie': 'ASP.NET_SessionId=noczxhxrmkqrmqso3opalin1; path=/; HttpOnly, Compte=; domain=.seloger.com; expires=Sat, 02-Jun-2018 09:34:11 GMT; path=/, SearchAnnDep=67; domain=.seloger.com; expires=Tue, 03-Jul-2018 09:34:11 GMT; path=/, __uzma=ma7e669aa2-ef5b-4b04-85b3-69085e096d7a2594; expires=Wed, 31-May-2028 09:34:11 GMT; path=/, __uzmb=1528025651; expires=Wed, 31-May-2028 09:34:11 GMT; path=/, __uzmc=871151037553; expires=Wed, 31-May-2028 09:34:11 GMT; path=/, __uzmd=1528025651; expires=Wed, 31-May-2028 09:34:11 GMT; path=/', 'X-S': 'X17, X17', 'Cache': 'max-age=10, max-age=10', 'X-Powered-By': 'ASP.NET', 'Date': 'Sun, 03 Jun 2018 09:34:11 GMT', 'Content-Length': '0'}
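The 307 response has an empty body (Content-Length: 0), so there is nothing for BeautifulSoup to parse; the useful information is in the Location header. A minimal sketch, using only the values shown above (in a live script they would come from r.status_code and r.headers['Location']), that resolves the relative header against the request URL. It shows the server is sending the request to an error page, /erreur-temporaire/ ("temporary error" in French), rather than to the listing.

```python
from urllib.parse import urljoin

# Values copied from the question: the requested URL and the redirect it returned
request_url = ('https://www.seloger.com/list.htm?types=1%2C2&projects=2%2C5'
               '&natures=1%2C2%2C4&places=%5B%7Bdiv%3A2238%7D%5D&qsVersion=1.0&engine-version=new')
status_code = 307
location = '/erreur-temporaire/'

# Any 3xx status with a Location header is a redirect, not the page itself
is_redirect = 300 <= status_code < 400

# Location is relative, so resolve it against the URL that was requested
redirect_target = urljoin(request_url, location)
print(is_redirect, redirect_target)
# True https://www.seloger.com/erreur-temporaire/
```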
I know it might be related to places, but I'm not sure what...
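Decoding the query string makes it easier to see what each parameter, including places, actually sends. A minimal sketch using only the standard library; decode_query is a hypothetical helper name, and which values the site accepts for places is not known from the question.

```python
from urllib.parse import urlsplit, parse_qs

url = ('https://www.seloger.com/list.htm?types=1%2C2&projects=2%2C5'
       '&natures=1%2C2%2C4&places=%5B%7Bdiv%3A2238%7D%5D&qsVersion=1.0&engine-version=new')

def decode_query(u):
    """Hypothetical helper: return each query parameter of u, percent-decoded (first value only)."""
    return {key: values[0] for key, values in parse_qs(urlsplit(u).query).items()}

for key, value in decode_query(url).items():
    print(f'{key} = {value}')
# types = 1,2
# projects = 2,5
# natures = 1,2,4
# places = [{div:2238}]
# qsVersion = 1.0
# engine-version = new
```

Going the other way, urllib.parse.urlencode(decode_query(url)) rebuilds the same encoded query string, which is convenient for trying other places values without hand-encoding %5B%7B...%7D%5D.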
I also added allow_redirects=False to s.get because I was getting a "too many redirects" error...
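Since allow_redirects=False stops after the first hop, one way to find where the "too many redirects" error comes from is to follow the chain manually and stop as soon as a URL repeats. A minimal sketch under stated assumptions: trace_redirects and the fetch callable are hypothetical names, and the demo uses a made-up two-URL loop on example.com, not real seloger.com responses.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical callable: fetch(url) -> (status_code, Location header or None)
Fetch = Callable[[str], Tuple[int, Optional[str]]]

def trace_redirects(start_url: str, fetch: Fetch,
                    max_hops: int = 10) -> Tuple[List[Tuple[str, int]], str]:
    """Follow redirects one hop at a time.

    Returns the (url, status) chain and an outcome:
    'final' (non-redirect reached), 'loop' (a URL repeated) or 'limit' (max_hops used up).
    """
    chain: List[Tuple[str, int]] = []
    seen = set()
    url = start_url
    for _ in range(max_hops):
        if url in seen:
            return chain, 'loop'  # redirected back to a URL already visited
        seen.add(url)
        status, location = fetch(url)
        chain.append((url, status))
        if not (300 <= status < 400 and location):
            return chain, 'final'  # not a redirect: this is the page itself
        url = urljoin(url, location)  # Location may be relative
    return chain, 'limit'

# Demo with fake responses (hypothetical URLs): /a -> /b -> /a
fake_responses = {
    'https://example.com/a': (307, '/b'),
    'https://example.com/b': (302, '/a'),
}
chain, outcome = trace_redirects('https://example.com/a', lambda u: fake_responses[u])
print(outcome, chain)
```

With the session from the question, fetch would call s.get(u, allow_redirects=False) and return the response's status_code and headers.get('Location'). Reusing the same Session means cookies set along the way (such as those in the Set-Cookie header above) are sent on the next hop, as they would be when requests follows redirects itself.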