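I'm currently learning web scraping, for testing purposes only! I don't know why I'm getting this error. Could you look at the code, tell me what I did wrong, and help me fix it?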
from urllib.request import urlopen
from bs4 import BeautifulSoup as bs
from urllib.request import HTTPError
import sys
html = urlopen("https://www.expedia.co.kr/Hotel-Search?destination=서울&startDate=2019.06.06&endDate=2019.06.07&rooms=1&adults=2")
soup = bs(html,"html.parser")
section = soup.find_all(class_="cf flex-1up flex-listing flex-theme-light cols-nested")
card = soup.find_all(class_="flex-card")
infoprice = soup.find_all(class_="flex-content info-and-price MULTICITYVICINITY avgPerNight")
rows = soup.find_all(class_="flex-area-primary")
hotelinfo = soup.find_all('ul',class_="hotel-info")
hotelTitles = soup.find_all('li',class_="hotelTitle")
for hotelTitle in hotelTitles:
    hotellist = hotelTitle.find('h4',class_="hotelName fakeLink")
    h = hotellist.get.text().strip()
    print(h)
Answer 0 (score: 0)
Why not use requests instead:
import requests
from bs4 import BeautifulSoup
html = requests.get("https://www.expedia.co.kr/Hotel-Search?destination=서울&startDate=2019.06.06&endDate=2019.06.07&rooms=1&adults=2")
soup = BeautifulSoup(html.content,'html.parser')
I find it avoids possible encoding problems. In your case, the rest of the code stays the same.
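As a rough illustration of the encoding point: the query string in the question contains the non-ASCII value 서울, which has to be percent-encoded before it goes on the wire. requests does this for you; if you stay with urlopen, a minimal sketch using only the standard library might look like the following. The helper name build_search_url is hypothetical, and the parameter names are simply copied from the URL in the question.

```python
from urllib.parse import urlencode

# Base address copied from the question.
BASE_URL = "https://www.expedia.co.kr/Hotel-Search"


def build_search_url(destination, start_date, end_date, rooms=1, adults=2):
    """Return the search URL with every query value percent-encoded as UTF-8.

    Hypothetical helper for illustration; parameter names mirror the
    query string used in the question.
    """
    params = {
        "destination": destination,
        "startDate": start_date,
        "endDate": end_date,
        "rooms": rooms,
        "adults": adults,
    }
    # urlencode() converts non-ASCII characters to %XX escapes.
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("서울", "2019.06.06", "2019.06.07")
print(url)
```

The resulting URL is plain ASCII and can be passed to urlopen or requests.get alike. Separately, note that hotellist.get.text() in the question looks like a typo for hotellist.get_text(), which is the Beautiful Soup method for extracting text, so that line probably needs changing whichever HTTP client you use.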
Answer 1 (score: 0)
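You can mimic the POST request that the page itself makes and send it with requests. You get back a JSON response containing all the hotel data. See the sample JSON response here.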
import requests
headers = {'User-Agent' : 'Mozilla/5.0', 'Referer' : 'https://www.expedia.co.kr/Hotel-Search?destination=%EC%84%9C%E'}
r = requests.post("https://www.expedia.co.kr/Hotel-Search-Data?responsive=true&destination=%EC%84%9C%EC%9A%B8&startDate=2019.06.06&endDate=2019.06.07&rooms=1&adults=2&timezoneOffset=3600000&langid=1042&hsrIdentifier=HSR&?1555393986866", headers = headers, data = '').json()
for hotel in r['searchResults']['retailHotelModels']:
    print(hotel['retailHotelInfoModel']['hotelName'])
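If the endpoint ever returns a response without some of these keys, the direct indexing above raises KeyError. A slightly more defensive sketch of the same loop is shown below. The key names are the ones used in this answer; the function name hotel_names and the sample payload (including its hotel names) are made up purely for illustration and are not real response data.

```python
def hotel_names(payload):
    """Collect hotel names from a parsed JSON response, skipping incomplete entries.

    Assumes the searchResults -> retailHotelModels -> retailHotelInfoModel
    nesting used in the answer above.
    """
    results = payload.get("searchResults", {})
    names = []
    for hotel in results.get("retailHotelModels", []):
        name = hotel.get("retailHotelInfoModel", {}).get("hotelName")
        if name:
            names.append(name)
    return names


# Made-up payload with the same shape, for illustration only.
sample = {
    "searchResults": {
        "retailHotelModels": [
            {"retailHotelInfoModel": {"hotelName": "Example Hotel A"}},
            {"retailHotelInfoModel": {}},  # entry with no name is skipped
            {"retailHotelInfoModel": {"hotelName": "Example Hotel B"}},
        ]
    }
}

print(hotel_names(sample))
```

With the real response you would call hotel_names(r), where r is the parsed JSON from the request above.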