Can't get page source

Time: 2016-12-21 15:39:40

Tags: python python-3.x beautifulsoup python-requests python-3.5

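I'm trying to get a page's source with a GET request using the requests library. The page is: Page. I can open it in a browser, but I can't get the source in my code. Here is the relevant code so far: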

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, sdch",
    "Accept-Language": "en-US,en;q=0.5",
    "Connection": "keep-alive",
    "Host": "www.costacruise.com",
    "Upgrade-Insecure-Requests": '1',
    "Cookie": '__utma=123683993.896583307.1482317122.1482317122.1482317122.1; __utmz=123683993.1482317122.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); cccountry=http%3A//www.costacruises.com/; ASP.NET_SessionId=ppxhd4xvkniqf5p1ynzazixf; DataNavigation={"occupancy":2,"Cabin":"Interna","Paxtype":"BASIC","Transfer":false,"DeparturePort":false,"Destination":"WEST MEDITERRANEAN","Ship":false,"MonthFrom":"201610","MonthTo":"201610","StartingPrice":false,"FullPrice":false,"Days":"7-9","SpecialDiscount":false,"CostaClub":false,"Kids":false,"Cru":false,"Sended":false,"SendedThanks":false,"UserDest":false,"UserMonth":false}; _gat_UA-3224745-1=1; _gat_UA-22424382-1=1; mbox=session#1482332788075-506442#1482335391|check#true#1482333591; CruiseListSearchParam=%3FPeriod%3D201612_201612%26Page%3D1; _ga=GA1.2.896583307.1482317122; WSS_FullScreenMode=false',
    "Referer": "http://www.costacruise.com/usa/cruises_list/201612_201804.html"  # the HTTP header name is "Referer"
}
session = requests.Session()
session.headers.update(headers)
link = "http://www.costacruise.com/usa/cruise_details/201711-USD_FS_3_BCN_S_B0H0_BCN_MRS_SVN_BCN-FS03171117.html"
price_page = session.get(link)  # session.get sends the headers set above; plain requests.get would not
soup = BeautifulSoup(price_page, 'lxml')

I get this error (every time):

http://www.costacruise.com/usa/cruise_details/201711-USD_FS_3_BCN_S_B0H0_BCN_MRS_SVN_BCN-FS03171117.html
Traceback (most recent call last):
  File "/home/fixxxer/PycharmProjects/Costa Cruises/main.py", line 97, in <module>
    prices = get_prices(cruise_url)
  File "/home/fixxxer/PycharmProjects/Costa Cruises/main.py", line 76, in get_prices
    soup = BeautifulSoup(price_page, 'lxml')
  File "/usr/lib/python3.5/site-packages/bs4/__init__.py", line 192, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'Response' has no len()

Am I missing something?

1 answer:

Answer 0 (score: 1)

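As the error says, you are passing a Response object (price_page), which is not markup, so BeautifulSoup cannot call len() on it. Instead, pass the object's text attribute (the decoded response body):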

price_page = requests.get(link).text
soup = BeautifulSoup(price_page, 'lxml')

Or, equivalently:

price_page = requests.get(link)
soup = BeautifulSoup(price_page.text, 'lxml')
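To tie the pieces together, here is a minimal sketch (not part of the original answer) that keeps fetching and parsing separate so the parse step can be tried offline. The helper names soup_from_response and fetch_soup are hypothetical, timeout=30 is an arbitrary choice, and it uses Python's built-in "html.parser" so lxml is not required ('lxml' works the same way if installed). It passes response.content (raw bytes, letting BeautifulSoup detect the encoding); response.text works just as well. The demo builds a Response by hand by setting the private _content attribute, which is a testing shortcut, not public requests API.

```python
import requests
from bs4 import BeautifulSoup


def soup_from_response(response: requests.Response, parser: str = "html.parser") -> BeautifulSoup:
    """Parse the body of a Response (hypothetical helper), never the Response itself."""
    # Fail early on 4xx/5xx instead of silently parsing an error page.
    response.raise_for_status()
    # .content is the body as bytes; BeautifulSoup detects the encoding itself.
    # response.text (a str already decoded by requests) would also work here.
    return BeautifulSoup(response.content, parser)


def fetch_soup(session: requests.Session, url: str) -> BeautifulSoup:
    """Fetch url through the session (so its headers are sent) and parse it."""
    response = session.get(url, timeout=30)  # timeout value is an arbitrary choice
    return soup_from_response(response)


if __name__ == "__main__":
    # Build a Response by hand so the demo runs without network access.
    # Setting the private _content attribute is a testing shortcut only.
    fake = requests.Response()
    fake.status_code = 200
    fake._content = b"<html><head><title>Demo</title></head><body></body></html>"

    try:
        BeautifulSoup(fake, "html.parser")  # the mistake from the question
    except TypeError as exc:
        print("TypeError:", exc)

    print(soup_from_response(fake).title.string)  # prints "Demo"
```

In real use you would call fetch_soup(session, link) with the session configured in the question.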