Python HTML parsing: getting tag names and their values

Date: 2014-02-25 07:59:00

Tags: python python-2.7 beautifulsoup

I am using BeautifulSoup in Python. Is there a way to get each tag's name together with its value, like this:

name=title value=This is title

name=link value=.../style.css

soup.html.head =

<meta content="all" name="audience"/>
<meta content="2006-2013 webrazzi.com." name="copyright"/>
<title> This is title</title>
<link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>

1 Answer:

Answer 0 (score: 3)

Use the .text or .string attribute to get the text content of an element.

Use .get('attrname') or ['attrname'] to get the value of an attribute.

html = '''
<head>
    <meta content="all" name="audience"/>
    <meta content="2006-2013 webrazzi.com." name="copyright"/>
    <title> This is title</title>
    <link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>
</head>
'''

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly so bs4 does not have to guess one
print('name={} value={}'.format('title', soup.title.text))  # <----
print('name={} value={}'.format('link', soup.link['href'])) # <----

Output:

name=title value= This is title
name=link value=.../style.css
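
The two accessors in each pair above are not fully interchangeable, and a small sketch may make the difference concrete. It assumes the built-in 'html.parser' backend and a trimmed copy of the question's markup; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (assumption: html.parser backend).
html = '<head><title> This is title</title><link href=".../style.css"/></head>'
soup = BeautifulSoup(html, 'html.parser')
link = soup.link

# Subscript access returns the value, but raises KeyError if the attribute is absent.
href = link['href']
try:
    link['media']
    missing_raises = False
except KeyError:
    missing_raises = True

# .get() returns None (or a supplied default) instead of raising.
media = link.get('media')
media_default = link.get('media', 'all')

# .string is the tag's single child string; .text joins all descendant strings.
title_string = soup.title.string
title_text = soup.title.text

# When a tag has more than one child node, .string is None while .text still works.
para = BeautifulSoup('<p>a<b>b</b></p>', 'html.parser').p
print(repr(para.string), repr(para.text))
```

In short, .get() is the safer choice when an attribute may be missing, and .text is the safer choice when a tag may contain nested markup.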

Update, based on the OP's comment:

def get_text(el): return el.text
def get_href(el): return el['href']

# map tag names to functions (what to retrieve from the tag)
what_todo = {
    'title': get_text,
    'link': get_href,
}
for el in soup.select('head *'): # To retrieve all children inside `head`
    f = what_todo.get(el.name)
    if not f: # skip non-title, non-link tags.
        continue
    print('name={} value={}'.format(el.name, f(el)))

Output: same as above
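
If the name/value pairs are needed as data rather than printed lines, the same dispatch-table idea could feed a list instead. The following is only a sketch under stated assumptions: extract_pairs and handlers are hypothetical names, not part of BeautifulSoup, and the 'html.parser' backend is assumed.

```python
from bs4 import BeautifulSoup

def extract_pairs(markup, handlers):
    """Return (tag name, value) tuples for tags inside <head> that have a handler.

    Hypothetical helper for illustration; not part of the bs4 API.
    """
    soup = BeautifulSoup(markup, 'html.parser')
    head = soup.head
    if head is None:  # no <head> element in the markup
        return []
    pairs = []
    for el in head.find_all(True):  # True matches every tag
        handler = handlers.get(el.name)
        if handler is not None:
            pairs.append((el.name, handler(el)))
    return pairs

# Map tag names to what should be retrieved from each tag.
handlers = {
    'title': lambda el: el.get_text(strip=True),  # strips surrounding whitespace
    'link': lambda el: el.get('href'),            # None if href is missing
}

sample = '''
<head>
    <meta content="all" name="audience"/>
    <title> This is title</title>
    <link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>
</head>
'''
pairs = extract_pairs(sample, handlers)
print(pairs)
```

Unlike the printed output above, get_text(strip=True) drops the leading space in the title; use el.text instead if the whitespace matters.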