The href attribute in lxml.html

Time: 2014-12-07 16:54:05

Tags: python-3.4 lxml.html

According to this answer:

>>> from lxml.html import fromstring
>>> s = """<input type="hidden" name="question" value="1234">"""
>>> doc = fromstring(s)
>>> doc.value
'1234'
>>> doc.name
'question'

I am trying to get the link and the text from this code:

from lxml.html import fromstring
s = '<a href="http://a.com" rel="bookmark">bla bla bla</a>'
doc = fromstring(s)
print (doc.href)
print (doc.text_content())

It raises AttributeError: 'HtmlElement' object has no attribute 'href'

I am new to lxml. What is the actual problem?

How can I get the link (a.com) and the text (bla bla bla) as strings in this code?

1 answer:

Answer 0: (score: 5)

This code works for me:

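from lxml.html import document_fromstring

# document_fromstring returns the root <html> element, so find the <a> first
doc = document_fromstring('<a href="http://a.com" rel="bookmark">bla bla bla</a>')
print(doc.xpath("//a")[0].get("href"))  # attributes are read with .get()
print(doc.text_content())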

Output:

http://a.com
bla bla bla
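
As for why doc.value works in the linked answer while doc.href fails: lxml.html gives form controls such as <input> a dedicated element class that exposes name and value as properties, whereas an <a> element is a plain HtmlElement with no per-attribute properties. Below is a minimal sketch of reading the attribute directly from the element that fromstring returns, assuming the same one-link string as in the question (the variable names href, rel and text are only illustrative):

```python
from lxml.html import fromstring

s = '<a href="http://a.com" rel="bookmark">bla bla bla</a>'
doc = fromstring(s)  # for a single-element fragment this is the <a> itself

# HTML attributes are not Python attributes on a plain HtmlElement;
# read them with .get() or through the .attrib mapping instead.
href = doc.get("href")       # returns None if the attribute is missing
rel = doc.attrib["rel"]      # raises KeyError if the attribute is missing
text = doc.text_content()

print(href)
print(text)
```

With fromstring there is no need for the xpath lookup, since the parsed object is already the link element. Use .get() when the attribute may be absent and .attrib[...] when a missing attribute should be treated as an error.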