Question

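I'm new to web programming with Python. I'm currently working on "scraping" a small piece of information from a website. Website: http://www.airport-data.com/airport/HJO/#location. Information to extract/scrape: "Elevation" (see under Location and QuickFacts).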

My code so far:

import urllib2
from BeautifulSoup import BeautifulSoup

url2 = urllib2.urlopen('http://www.airport-data.com/airport/HJO/#location').read()
soup = BeautifulSoup(url2)
print soup  # I did this just to see the content.

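I've tried reading online and looked at some previous posts, but couldn't wrap my head around it. Any suggestions on how to extract/scrape "Elevation" from the web link? Thanks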

Answer 1

First, according to the BeautifulSoup project documentation:

Beautiful Soup 3 has been replaced by Beautiful Soup 4.

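Beautiful Soup 3 only works on Python 2.x, but Beautiful Soup 4 also works on Python 3.x. Beautiful Soup 4 is faster, has more features, and works with third-party parsers like lxml and html5lib. You should use Beautiful Soup 4 for all new projects.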

Install version 4 of BeautifulSoup:

pip install beautifulsoup4
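The documentation quoted above mentions third-party parsers such as lxml. Here is a minimal sketch, assuming Python 3 with bs4 installed, that passes the parser name explicitly and falls back to Python's built-in html.parser when lxml is not installed. `make_soup` is a hypothetical helper name, not part of bs4.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Parse html with lxml if it is installed, else with html.parser."""
    try:
        # lxml is a separate package (pip install lxml)
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # html.parser ships with Python, so it is always available
        return BeautifulSoup(html, "html.parser")


soup = make_soup("<p>Hello</p>")
print(soup.p.text)  # Hello
```

Naming the parser explicitly also silences the warning bs4 emits when no parser is given, and keeps results consistent across machines that have different parsers installed.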

Then, the idea is to find the tag containing the Elevation: text and get its next sibling:

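import urllib2
from bs4 import BeautifulSoup

url2 = urllib2.urlopen('http://www.airport-data.com/airport/HJO/#location')
soup = BeautifulSoup(url2)  # BeautifulSoup also accepts a file-like object

# Find the <td class="tc1"> label cell whose text is "Elevation:",
# then print the text of the cell that follows it.
print soup.find('td', class_='tc1', text='Elevation:').next_sibling.text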

Prints:

240 ft / 73.15 m (Estimated)
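For Python 3, a minimal sketch of the same approach, assuming bs4 is installed. It swaps `urllib2` for `urllib.request` and uses `string=` (the newer name for the `text=` argument). It also uses `find_next_sibling("td")` rather than `next_sibling`, because `next_sibling` returns the whitespace text node when the cells are separated by line breaks. The `SAMPLE` markup and its `tc2` class are hypothetical stand-ins for the real page, and `extract_elevation`/`fetch_elevation` are illustrative names.

```python
from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen

from bs4 import BeautifulSoup


def extract_elevation(html):
    """Return the text of the cell after the 'Elevation:' label, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # string= matches tags whose .string equals the given value
    label = soup.find("td", class_="tc1", string="Elevation:")
    if label is None:
        return None
    # find_next_sibling("td") skips whitespace strings between the cells
    value = label.find_next_sibling("td")
    return value.get_text(strip=True) if value is not None else None


def fetch_elevation(url):
    """Download a page and extract the elevation (needs network access)."""
    with urlopen(url) as response:
        return extract_elevation(response.read())


# Hypothetical markup mimicking the row the answer's selector targets
SAMPLE = """
<table>
  <tr>
    <td class="tc1">Elevation:</td>
    <td class="tc2">240 ft / 73.15 m (Estimated)</td>
  </tr>
</table>
"""

print(extract_elevation(SAMPLE))  # 240 ft / 73.15 m (Estimated)
```

Calling `fetch_elevation('http://www.airport-data.com/airport/HJO/')` would apply the same lookup to the live page, provided the page still uses this markup.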
