Question

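I need to retrieve text from an HTML table. Within a cell, the text is sometimes inside a <div> and sometimes not.

How can I make the div optional in XPath?

My current code: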

stuff = tree.xpath("/html/body/table/tbody/tr/td[5]/div/text()")

The pseudocode I want:

stuff = tree.xpath("/html/body/table/tbody/tr/td[5]/div or nothing/text()")

Answer 1

You need the string value of the td[5] element. Use string():

stuff = tree.xpath("string(/html/body/table/tbody/tr/td[5])")

This returns the text beneath td[5] with the markup stripped, whether or not a <div> wraps it.
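
One caveat: in XPath 1.0, string() applied to a node-set uses only the first node in document order, so the expression above yields the fifth cell of the first row only. A minimal sketch of collecting every row, assuming lxml and a made-up sample table (the markup and cell contents are hypothetical, not from the question):

```python
from lxml import html

# Hypothetical sample: the fifth cell holds its text either inside a <div> or directly.
doc = """
<html><body><table><tbody>
<tr><td>a</td><td>b</td><td>c</td><td>d</td><td><div>in div</div></td></tr>
<tr><td>a</td><td>b</td><td>c</td><td>d</td><td>bare text</td></tr>
</tbody></table></body></html>
"""

tree = html.fromstring(doc)

# string() on a node-set: only the first matching td[5] contributes.
first_only = tree.xpath("string(/html/body/table/tbody/tr/td[5])")

# Select the cells first, then take the string value of each one.
cells = tree.xpath("/html/body/table/tbody/tr/td[5]")
stuff = [cell.xpath("string(.)") for cell in cells]

print(first_only)
print(stuff)
```

Evaluating string(.) relative to each cell keeps the "div or nothing" behaviour and still returns one value per row.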

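If you also want leading and trailing whitespace trimmed and internal whitespace collapsed, you can get the element's string value indirectly through normalize-space(), as suggested by splash58 in the comments.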
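
A small sketch of the difference between the two functions, again assuming lxml and a hypothetical cell whose text is padded with spaces and line breaks:

```python
from lxml import html

# Hypothetical cell: the text inside the <div> carries extra spaces and newlines.
doc = """
<html><body><table><tbody>
<tr><td>1</td><td>2</td><td>3</td><td>4</td><td>
  <div>  some
     text </div>
</td></tr>
</tbody></table></body></html>
"""

tree = html.fromstring(doc)

# string() keeps the whitespace exactly as it appears in the markup.
raw = tree.xpath("string(/html/body/table/tbody/tr/td[5])")

# normalize-space() strips both ends and collapses inner runs to single spaces.
clean = tree.xpath("normalize-space(/html/body/table/tbody/tr/td[5])")

print(repr(raw))
print(repr(clean))
```

Like string(), normalize-space() converts a node-set argument using its first node only, so the per-cell pattern (`cell.xpath("normalize-space(.)")`) applies here as well.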