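Given a URL like http://savannah.gnu.org/bugs/?23435, what is the best way to extract the following information:
1: Project name: Gnash - The GNU Flash player
2: Issue title: Flash content rendered above menus in Firefox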
Answer:
You can use BeautifulSoup:
from bs4 import BeautifulSoup
import urllib.request
import re

# Download the bug page (raw bytes; BeautifulSoup detects the encoding)
with urllib.request.urlopen('http://savannah.gnu.org/bugs/?23435') as response:
    html = response.read()
# Name the parser explicitly to avoid bs4's "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')

# Project name: the part of the top title before " - Bugs:"
p_title = soup.select('.toptitle')[0].text
# 'Gnash - The GNU Flash player - Bugs: bug #23435, Flash content rendered above menus...'
p_title = p_title.split(' - Bugs:')[0]

# Issue title: whatever follows "bug #<number>: " in the heading
i_title = soup.select('.priore')[0].text
# 'bug #23435: Flash content rendered above menus in Firefox'
i_title = re.findall(r'bug #\d+: (.+)', i_title)[0]

print(p_title)
# Gnash - The GNU Flash player
print(i_title)
# Flash content rendered above menus in Firefox
(Python 3)
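If BeautifulSoup is not available, the same idea can be sketched with only the standard library's html.parser.HTMLParser. This is a swapped-in technique, not what the answer above uses: a small parser collects the text of the first element carrying a given CSS class, and the split/regex post-processing is moved into a helper so it can be checked without network access. The class names toptitle and priore come from the answer; the helper names (ClassTextCollector, text_of_class, parse_titles) and the SAMPLE markup are hypothetical and are not Savannah's actual page source.

```python
import re
from html.parser import HTMLParser
from typing import Tuple

# Elements that never have a closing tag in HTML; they must not change the depth count.
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'}


class ClassTextCollector(HTMLParser):
    """Collect the text of the first element whose class list contains target_class."""

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # nesting depth inside the matched element (0 = outside)
        self.done = False   # becomes True once the first match has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a child element inside the match
        elif self.target_class in (dict(attrs).get('class') or '').split():
            self.depth = 1   # entering the matched element

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_of_class(html: str, target_class: str) -> str:
    """Return the stripped text of the first element with target_class ('' if none)."""
    collector = ClassTextCollector(target_class)
    collector.feed(html)
    collector.close()
    return ''.join(collector.parts).strip()


def parse_titles(top_title: str, issue_heading: str) -> Tuple[str, str]:
    """Turn the two scraped texts into (project name, issue title)."""
    project = top_title.split(' - Bugs:')[0].strip()
    match = re.search(r'bug #\d+: (.+)', issue_heading)
    if match is None:
        raise ValueError(f'unexpected issue heading: {issue_heading!r}')
    return project, match.group(1).strip()


# Hypothetical markup that mimics the two elements the answer selects.
SAMPLE = '''
<html><body>
<h2 class="toptitle">Gnash - The GNU Flash player - Bugs: bug #23435, Flash content rendered above menus...</h2>
<h2 class="priore">bug <b>#23435</b>: Flash content rendered above menus in Firefox</h2>
</body></html>
'''

project, issue = parse_titles(text_of_class(SAMPLE, 'toptitle'),
                              text_of_class(SAMPLE, 'priore'))
print(project)
print(issue)
```

For the live page, you would pass the decoded response body instead of SAMPLE, e.g. response.read().decode('utf-8', 'replace'), assuming the page is UTF-8. The depth counter only tracks explicit end tags, so elements with implicitly closed children (such as a <p> without </p>) inside the target could confuse it; BeautifulSoup handles such markup more robustly.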