Question

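I am scraping a flight-ticket website. My problem: I use the Chrome developer tools to identify the class of the HTML element I want to scrape, but my code cannot find it. It looks like I am not downloading the same HTML that I see in the Chrome developer tools (Inspect element...).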

import requests 
from BeautifulSoup import BeautifulSoup

url = 'http://www.momondo.de/flightsearch/?Search=true&TripType=2&SegNo=2&SO0=BOS&SD0=LON&SDP0=07-09-2016&SO1=LON&SD1=BOS&SDP1=12-09-2016&AD=1&TK=ECO&DO=false&NA=false'
req = requests.get(url)
soup = BeautifulSoup(req.content)
x = soup.findAll("span", {"class": "value"})

Answer 1

Try the following:

from bs4 import BeautifulSoup
import urllib.request

source = urllib.request.urlopen('http://www.momon...e&NA=false').read()
soup = BeautifulSoup(source,'html5lib')
for item in soup.find_all("span", class_="value"):
    print(item.text)

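With this you can scrape every span with the class "value" from the page. If you want to see the whole HTML element with its attributes rather than just its content, remove .text from print(item.text).

You may need to install html5lib with pip. If you run into problems doing that, try running CMD as administrator (assuming you are on Windows).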
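
If the loop prints nothing, it can help to check whether the downloaded HTML contains any matching spans at all, since the Chrome inspector shows the page after its scripts have run, which may differ from what the server sends. Below is a minimal sketch of such a check using only the standard library's html.parser instead of BeautifulSoup. The names SpanClassCollector, span_values and sample_html are made up for illustration; in practice you would pass in the text you downloaded.

```python
from html.parser import HTMLParser


class SpanClassCollector(HTMLParser):
    """Collect the text of every <span> whose class list contains class_name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0    # > 0 while we are inside a matching <span>
        self.values = []  # text collected for each matching span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1  # nested span inside a match; keep collecting
        elif self.class_name in classes:
            self.depth = 1
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.values[-1] += data


def span_values(html, class_name="value"):
    """Return the stripped text of all spans carrying class_name."""
    parser = SpanClassCollector(class_name)
    parser.feed(html)
    parser.close()
    return [v.strip() for v in parser.values]


# Hypothetical sample input standing in for the downloaded page source.
sample_html = '<div><span class="value price">123</span><span class="other">x</span></div>'
print(span_values(sample_html))
```

An empty list for the real page source would suggest that the spans are not in the HTML the server returned, so changing the parser alone would not make them appear.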

Answer 2

You can also try this:

for values_in_x in x:
    print(values_in_x.text)

Python Webscrapping

2 answers: