I'm using BeautifulSoup in Python. Is there a way to get each tag's name together with its value, like this:
name = title value = This is title
name = link value = .../style.css
soup.html.head =
<meta content="all" name="audience"/>
<meta content="2006-2013 webrazzi.com." name="copyright"/>
<title> This is title</title>
<link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>
Answer 0 (score: 3)
Use the .text or .string attribute to get an element's text content.
Use .get('attrname') or ['attrname'] to get an attribute value.
html = '''
<head>
<meta content="all" name="audience"/>
<meta content="2006-2013 webrazzi.com." name="copyright"/>
<title> This is title</title>
<link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>
</head>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning
print('name={} value={}'.format('title', soup.title.text)) # <----
print('name={} value={}'.format('link', soup.link['href'])) # <----
Output:
name=title value= This is title
name=link value=.../style.css
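The two ways of reading an attribute, and the two ways of reading text, behave differently when something is missing or nested. The following is a minimal sketch of those differences; the shortened markup and the variable names (href, media, raised, para) are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Shortened, illustrative markup (not the full page from the question).
soup = BeautifulSoup(
    '<head><title> This is title</title><link href=".../style.css"/></head>',
    'html.parser',
)
link = soup.link

# Both forms return the value when the attribute exists.
href = link['href']
print(href, link.get('href'))

# .get() returns None for a missing attribute; indexing raises KeyError.
media = link.get('media')
try:
    link['media']
    raised = False
except KeyError:
    raised = True
print(media, raised)

# .string is None when a tag has more than one child,
# while .text concatenates all descendant text.
para = BeautifulSoup('<p>one <b>two</b></p>', 'html.parser').p
print(para.string, repr(para.text))
```

So .get() is the safer choice when an attribute may be absent, and .text is the safer choice when a tag may contain nested markup.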
Update, following the OP's comment:
def get_text(el): return el.text
def get_href(el): return el['href']
# map tag names to functions (what to retrieve from the tag)
what_todo = {
    'title': get_text,
    'link': get_href,
}
for el in soup.select('head *'):  # To retrieve all children inside `head`
    f = what_todo.get(el.name)
    if not f:  # skip non-title, non-link tags.
        continue
    print('name={} value={}'.format(el.name, f(el)))
Output: same as above.
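If the name/value pairs are needed as data rather than printed, the same dispatch-table idea can be wrapped in a function. This is only a sketch: extract_pairs and EXTRACTORS are hypothetical names, not part of BeautifulSoup, and it swaps soup.select('head *') for find_all() with a list of tag names, so only the tags of interest are visited.

```python
from bs4 import BeautifulSoup

# Hypothetical mapping: tag name -> function that pulls the value of interest.
EXTRACTORS = {
    'title': lambda el: el.text,
    'link': lambda el: el.get('href'),  # None if href is missing
}

def extract_pairs(markup):
    """Return (tag name, value) tuples for the tags listed in EXTRACTORS."""
    soup = BeautifulSoup(markup, 'html.parser')
    pairs = []
    # find_all() accepts a list of tag names and yields matches in document order.
    for el in soup.find_all(list(EXTRACTORS)):
        pairs.append((el.name, EXTRACTORS[el.name](el)))
    return pairs

sample = '''
<head>
<meta content="all" name="audience"/>
<title> This is title</title>
<link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>
</head>
'''
pairs = extract_pairs(sample)
for name, value in pairs:
    print('name={} value={}'.format(name, value))
```

Supporting another tag then only requires a new entry in the mapping.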