I'm trying to get a page's source with a GET request using the requests library.
The page is:
Page
I can open it in a browser, but I can't get the source in my code.
Here is the relevant code so far:
import requests
from bs4 import BeautifulSoup

session = requests.Session()
headers = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Encoding": "gzip, deflate, sdch",
"Accept-Language": "en-US,en;q=0.5",
"Connection": "keep-alive",
"Host": "www.costacruise.com",
"Upgrade-Insecure-Requests": '1',
"Cookie": '__utma=123683993.896583307.1482317122.1482317122.1482317122.1; __utmz=123683993.1482317122.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); cccountry=http%3A//www.costacruises.com/; ASP.NET_SessionId=ppxhd4xvkniqf5p1ynzazixf; DataNavigation={"occupancy":2,"Cabin":"Interna","Paxtype":"BASIC","Transfer":false,"DeparturePort":false,"Destination":"WEST MEDITERRANEAN","Ship":false,"MonthFrom":"201610","MonthTo":"201610","StartingPrice":false,"FullPrice":false,"Days":"7-9","SpecialDiscount":false,"CostaClub":false,"Kids":false,"Cru":false,"Sended":false,"SendedThanks":false,"UserDest":false,"UserMonth":false}; _gat_UA-3224745-1=1; _gat_UA-22424382-1=1; mbox=session#1482332788075-506442#1482335391|check#true#1482333591; CruiseListSearchParam=%3FPeriod%3D201612_201612%26Page%3D1; _ga=GA1.2.896583307.1482317122; WSS_FullScreenMode=false',
"Referer": "http://www.costacruise.com/usa/cruises_list/201612_201804.html"
}
session.headers.update(headers)
link = "http://www.costacruise.com/usa/cruise_details/201711-USD_FS_3_BCN_S_B0H0_BCN_MRS_SVN_BCN-FS03171117.html"
price_page = requests.get(link)
soup = BeautifulSoup(price_page, 'lxml')
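A side note on the snippet above: headers stored with `session.headers.update(...)` are attached only to requests made through that same `Session` object, for example `session.get(link)`. A module-level `requests.get(link)` call opens its own temporary session internally and does not see them. The sketch below checks this offline with `Session.prepare_request`, which merges a session's headers into a request without sending it; the shortened header dict is just a subset of the question's headers, used for illustration.

```python
import requests

link = "http://www.costacruise.com/usa/cruise_details/201711-USD_FS_3_BCN_S_B0H0_BCN_MRS_SVN_BCN-FS03171117.html"

# Session carrying a subset of the question's custom headers.
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0",
    "Referer": "http://www.costacruise.com/usa/cruises_list/201612_201804.html",
})

# Build (but do not send) the request that session.get(link) would issue.
prepared = session.prepare_request(requests.Request("GET", link))
print(prepared.headers["User-Agent"])
print(prepared.headers["Referer"])

# A fresh Session, like the one requests.get() creates internally, has only the defaults.
default_prepared = requests.Session().prepare_request(requests.Request("GET", link))
print(default_prepared.headers["User-Agent"])  # starts with "python-requests/"
print("Referer" in default_prepared.headers)   # False
```

So `session.get(link)` would be needed for these headers to be sent at all; this is separate from the `TypeError`, which comes from passing the `Response` object itself to BeautifulSoup.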
I get this error (every time):
http://www.costacruise.com/usa/cruise_details/201711-USD_FS_3_BCN_S_B0H0_BCN_MRS_SVN_BCN-FS03171117.html
Traceback (most recent call last):
File "/home/fixxxer/PycharmProjects/Costa Cruises/main.py", line 97, in <module>
prices = get_prices(cruise_url)
File "/home/fixxxer/PycharmProjects/Costa Cruises/main.py", line 76, in get_prices
soup = BeautifulSoup(price_page, 'lxml')
File "/usr/lib/python3.5/site-packages/bs4/__init__.py", line 192, in __init__
elif len(markup) <= 256 and (
TypeError: object of type 'Response' has no len()
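The last line of the traceback can be reproduced without BeautifulSoup or a network connection: a `requests.Response` object does not define `__len__`, so the `len(markup)` call shown in the `bs4/__init__.py` frame fails. A minimal sketch, using an empty `Response()` created only to inspect its interface:

```python
import requests

# An empty Response, built only for inspection; nothing is downloaded.
resp = requests.Response()

# BeautifulSoup's constructor calls len(markup), as in the traceback's last frame.
try:
    len(resp)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # object of type 'Response' has no len()

# It is not file-like either: the Response itself has no read() method.
print(hasattr(resp, "read"))  # False
```

The page body is exposed instead through the `Response.text` (str) and `Response.content` (bytes) attributes, which are what BeautifulSoup can parse.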
Am I missing something?
Answer 0 (score: 1)
As the error message says, you are passing the Response object (price_page) itself. Pass that object's text attribute instead:
price_page = requests.get(link).text
soup = BeautifulSoup(price_page, 'lxml')
or
price_page = requests.get(link)
soup = BeautifulSoup(price_page.text, 'lxml')
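A slightly more defensive variant of the same fix, as a sketch rather than a definitive implementation: `soup_from_response` and `fetch_soup` are hypothetical helper names, and the 10-second timeout is an assumed value. The helper calls `raise_for_status()` so a 4xx/5xx error page is not silently parsed, and it passes `response.content` (bytes), which lets BeautifulSoup detect the encoding itself; `response.text` instead relies on the encoding requests picks from the response headers. Either attribute works; what matters is passing the markup, not the `Response` object. `'html.parser'` is used as the default so the sketch runs without lxml installed.

```python
import requests
from bs4 import BeautifulSoup


def soup_from_response(response, parser="html.parser"):
    """Parse the body of an already-fetched requests.Response."""
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    # Pass the raw bytes, not the Response object itself.
    return BeautifulSoup(response.content, parser)


def fetch_soup(url, session=None, parser="html.parser", timeout=10):
    """Hypothetical helper: GET url (through session if given) and parse it."""
    # Going through the session means its stored headers are actually sent.
    getter = session.get if session is not None else requests.get
    return soup_from_response(getter(url, timeout=timeout), parser)
```

With the question's setup, this would be called as `fetch_soup(link, session)`, or `fetch_soup(link, session, parser="lxml")` to keep the lxml parser.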