Question

I have the following HTML:

<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>

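I want to get the value "2.35 : 1" from the content. However, when I try it with lxml, I get back an empty string (I can get the 'Aspect Ratio:' value, probably because it sits neatly between the tags).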

item.find('div').text

How can I get the "2.35 : 1" value? Using etree.tostring does give me the full output.

Answer 1

This is called the element's .tail:

from lxml.html import fromstring

data = """
<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>
"""

root = fromstring(data)
print(root.xpath('//h4[@class="inline"]')[0].tail)

This prints 2.35 : 1 (the raw tail also carries the surrounding whitespace; call .strip() to remove it).
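To make the difference between .text and .tail concrete, here is a minimal sketch using the same HTML. The variable names (html, div, h4, aspect_ratio) are only illustrative and not part of the original post. In lxml's model, an element's .text is the text before its first child, and .tail is the text that follows the element's closing tag.

```python
from lxml.html import fromstring

html = """
<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>
"""

# A fragment with a single top-level element parses to that element (the div).
div = fromstring(html)
h4 = div.find("h4")

# div.text is whatever precedes the first child (<h4>): whitespace at most,
# which is why item.find('div').text looked empty in the question.
print(repr(div.text))

# h4.text is the text inside the <h4> tags.
print(h4.text)

# h4.tail is the text between </h4> and the next tag, i.e. the wanted value.
aspect_ratio = h4.tail.strip()
print(aspect_ratio)
```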

As an alternative, you can get the text node that follows the h4 element as a sibling:

root.xpath('//h4[@class="inline"]/following-sibling::text()')[0]

Also, make sure you use lxml.html, since you are dealing with HTML data.
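The XPath alternative can be sketched as a complete script. The names value and value_xpath are assumptions made for illustration. The second query uses the standard XPath 1.0 function normalize-space() so that the trimming happens inside the expression rather than in Python.

```python
from lxml.html import fromstring

html = """
<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>
"""

root = fromstring(html)

# All text nodes that are following siblings of the h4; the first one
# holds the value, with leading and trailing whitespace still attached.
texts = root.xpath('//h4[@class="inline"]/following-sibling::text()')
value = texts[0].strip()
print(value)

# Same lookup, but let XPath trim the whitespace via normalize-space().
value_xpath = root.xpath(
    'normalize-space(//h4[@class="inline"]/following-sibling::text()[1])'
)
print(value_xpath)
```

Note that normalize-space() also collapses internal runs of whitespace into single spaces, which is harmless here but may matter for other fields.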

Answer 2

You can also use .text_content() instead of .text; it gives you the entire text content of the element, including the text of its children (http://lxml.de/lxmlhtml.html):

>>> item.find('div').text_content()
Aspect Ratio: 2.35 : 1

The full statement would be:

>>> title_detail.text_content().split('Aspect Ratio: ')[1].strip()
2.35 : 1
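For reference, a self-contained sketch of the .text_content() approach. It assumes that title_detail is the div element from the question (the post does not show how it was obtained), and it uses str.partition instead of split so that a missing label yields None rather than an IndexError.

```python
from lxml.html import fromstring

html = """
<div class="txt-block">
<h4 class="inline">Aspect Ratio:</h4> 2.35 : 1
</div>
"""

# Assumption: title_detail stands for the div element from the question.
title_detail = fromstring(html)

# text_content() concatenates the text of the element and all descendants.
full_text = title_detail.text_content()

# Split on the label; sep is empty if the label is not present.
_, sep, rest = full_text.partition("Aspect Ratio:")
aspect_ratio = rest.strip() if sep else None
print(aspect_ratio)
```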

Getting a value with lxml
