Question

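I need to get all the cruise information from this page: http://www.pocruises.com/find-and-book/cruise-search-results/. The problem is that the button that loads the results has no href I could send a request to, and when I request the link above directly I only get <Response [200]>. How can I get all the cruise information as JSON or XML? Or do I have to scrape the page entry by entry? This is what I have so far: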

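import requests
session = requests.Session()
page = session.get("http://www.pocruises.com/find-and-book/cruise-search-results/")
print(page)  # prints only the response object, e.g. <Response [200]>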

Answer 1

You can use BeautifulSoup to get all the links on the page.

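# -*- coding: utf-8 -*-

# Python 3: urllib2 was merged into urllib.request
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'http://www.pocruises.com/find-and-book/cruise-search-results/'

# Download the raw HTML of the search results page
response = urlopen(url).read()

# Parse the HTML and print the href of every <a> tag
soup = BeautifulSoup(response, 'html.parser')
links = soup.find_all('a')
for link in links:
    print(link.get('href'))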
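
Note that the loop above prints relative paths as they appear in the markup, and None for any anchor that has no href (such as a JavaScript-driven button). Here is a minimal sketch of one way to skip those and resolve the rest to absolute URLs. It deliberately swaps BeautifulSoup for the standard library (html.parser and urllib.parse) so it runs without extra packages. The names LinkCollector and extract_links and the sample markup are made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href:  # skip anchors that carry no href at all
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all absolute link targets found in the given HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical sample markup, only to show the behaviour.
sample_html = '''
<a href="/find-and-book/cruise-search-results/">Search</a>
<a>Button without href</a>
<a href="https://scrapy.org/">Scrapy</a>
'''

for link in extract_links(sample_html, 'http://www.pocruises.com/'):
    print(link)
```

To use it on the live page, pass the downloaded HTML (decoded to a string) and the page URL to extract_links instead of the sample values.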

But if you need to extract more information than that, I suggest using Scrapy: https://scrapy.org/
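
As for the <Response [200]> in the question: that is only the printed form of the requests response object. The body is available as page.text, the headers as page.headers, and page.json() parses a JSON body. Whether this site returns JSON anywhere is not known here, so the sketch below only shows how you might branch on the Content-Type header. parse_body is a hypothetical helper, and the sample bodies are made up.

```python
import json


def parse_body(content_type, text):
    """Return parsed JSON if the header announces JSON, else the raw text."""
    if 'json' in content_type.lower():
        return json.loads(text)
    return text


# With requests, this would be called roughly as:
#   page = session.get(url)
#   data = parse_body(page.headers.get('Content-Type', ''), page.text)

# Made-up bodies, only to show both branches.
json_result = parse_body('application/json; charset=utf-8', '{"cruises": []}')
html_result = parse_body('text/html', '<html></html>')
print(type(json_result).__name__, type(html_result).__name__)
```

If the result is plain HTML, as it is for a normal page request, you are back to parsing the markup entry by entry with BeautifulSoup or Scrapy.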
