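How do I navigate HTML with bs4?

Time: 2017-07-01 02:47:30

Tags: python html python-3.x

I only started learning Python this afternoon. As an exercise I'm trying to scrape the RSS feed of kubuntu.org (simple HTML), but I can't figure out how to navigate the HTML and print only the feedTitle: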

#!/usr/bin/python3.5
import bs4 as bs
import urllib.request

site = urllib.request.urlopen('https://kubuntu.org/feed').read()
soup = bs.BeautifulSoup(site, 'lxml')

for title in soup.find_all('item'):
    print(title.text)

Edit:

Adding title to the find_all line sort of gives me what I want, but there is still a lot of other data that also uses the title tag.

#!/usr/bin/python3.5
import bs4 as bs
import urllib.request

site = urllib.request.urlopen('https://kubuntu.org/feed').read()
soup = bs.BeautifulSoup(site, 'lxml')

for title in soup.find_all(['item', 'title']):
    print(title.text)

1 answer:

Answer 0: (score: 0)

Just access the title tag as a child of each item:

...
for item in soup.find_all('item'):
    print(item.title.text)

Output:

Kubuntu Artful Aardvark (17.10) Alpha 1
Latest round of backports PPA updates include Plasma 5.10.2 for Zesty 17.04
Plasma 5.10.1 now in Zesty backports
17.10 Wallpaper Contest deadline for submissions soon
Plasma bugfix releases, Frameworks, & selected app updates now available in backports PPA for Zesty and Xenial
17.10 Wallpaper Contest! Call for artists
KDE PIM update now available for Zesty Zapus 17.04
KDE PIM update for Zesty available for testers
Kubuntu 17.04 Released!
Kubuntu 17.04 Release Candidate – call for testers
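
To see why find_all(['item', 'title']) returned extra text while the item-scoped lookup did not, here is a small offline sketch. It is only an illustration under some assumptions: it swaps in the standard library's xml.etree.ElementTree instead of bs4 so that it runs without third-party packages, and SAMPLE_FEED is a made-up feed, not the real kubuntu.org data. The idea is the same as item.title in the answer: find each item first, then read the title inside it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed for illustration only (not the real kubuntu.org feed).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <image>
      <title>Example Feed Logo</title>
    </image>
    <item>
      <title>First post</title>
    </item>
    <item>
      <title>Second post</title>
    </item>
  </channel>
</rss>
"""

root = ET.fromstring(SAMPLE_FEED)

# Every <title> anywhere in the document, including channel and image titles.
all_titles = [t.text for t in root.iter("title")]

# Only the <title> that is a direct child of each <item>.
item_titles = [item.findtext("title") for item in root.iter("item")]

print(all_titles)   # four entries: feed, logo, and the two posts
print(item_titles)  # only the two post titles
```

Searching for title across the whole document picks up every title element in the sample, whereas looking it up inside each item keeps only the entries you are after.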