Question

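I want to get the title of a web page that I open with urllib2. What is the best way to do this: parse the HTML and find the things I need (for now only the <title> tag, but I may need more in the future)?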

Is there a good parsing library for this purpose?

Answer 1

Yes, I would recommend BeautifulSoup.

If you just want the title, it is simply:

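from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 import, as in Answer 4
soup = BeautifulSoup(html)  # html holds the page source
myTitle = soup.html.head.title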

Or:

# Calling the soup object is shorthand for findAll, so this is a list of matching tags
myTitle = soup('title')

Taken from the documentation.

It is very robust and will parse HTML no matter how messy it is.

Answer 2

Try Beautiful Soup:

import urllib2
from BeautifulSoup import BeautifulSoup

url = 'http://www.example.com'
response = urllib2.urlopen(url)
html = response.read()

soup = BeautifulSoup(html)
title = soup.html.head.title
print title.contents  # .contents is a list of the tag's children

Answer 3

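Why are you all importing an extra library for a single task? No regular expressions? urllib is in the standard library, unlike third-party packages such as requests, bs4, or mech. Use the standard library to parse the HTML, match the string, and then split on '>' and '<' with re or whatever.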

# Scan the page source line by line for the line holding the opening <title> tag
for line in html.splitlines():
    if '<title>' in line:
        title = line
        break

That is Python 2, I think. From there you can strip the tags off the matched line.
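The regex and standard-library route this answer argues for could look roughly like the sketch below. It is written for Python 3 (not the Python 2 used elsewhere in the thread), the helper names `title_with_regex`, `TitleParser`, and `title_with_parser` are hypothetical, and it assumes `html` is already a decoded string rather than raw bytes.

```python
import re
from html.parser import HTMLParser

# Hypothetical helper names; none of them appear in the original answer.
# Matches the first <title ...>...</title> pair, ignoring case and newlines.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def title_with_regex(html):
    """Return the text between <title> and </title>, or None if absent."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def title_with_parser(html):
    """Return the title text using the standard-library parser, or None."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts).strip()
    return text or None
```

The regex version is the shortest but only looks for a literal `<title>` pair. The parser version is the easier one to extend if more tags are needed later, by adding further branches to `handle_starttag`. In Python 3, `urlopen(...).read()` returns bytes, so the page has to be decoded before being passed to either function.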

Answer 4

Use Beautiful Soup.

import urllib2
from BeautifulSoup import BeautifulSoup

html = urllib2.urlopen("...").read()
soup = BeautifulSoup(html)
print soup.title.string

Python: getting <title> </title>
