Question

I am using XPath in Python to parse a table in an HTML file. This is the XPath I am using:

//td//text()

This gives me two strings as output:

['australia', '$3333.99']

I want the output to be:

['australia', '3333.99']

In other words, I want to remove the $ sign, ideally using XPath alone. How do I do that? I tried substring-after, but it does not work.

This is what I tried:

//td//text()[substring-after(.,'$')]

But I got this output:

['$3333.99']

australia is missing from the result.

Answer 1

In addition to using translate() (posted in the other answer), you can also use the substring() function and determine the start of the slice dynamically:

In [4]: [item.xpath("substring(., starts-with(., '$') + 1)") for item in root.xpath("//td")]
Out[4]: ['australia', '3333.99']

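By the way, this approach is a bit safer than using translate(): here we remove a single $ character, only at the beginning of the string and only if it is there, whereas translate() replaces every occurrence of $ in each td text you extract. You might get some unwanted side effects.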
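A minimal sketch of the difference, assuming lxml is the library in use (the thread does not name it). The second table row is invented purely to show where the two approaches diverge. In XPath 1.0, starts-with() returns a boolean that `+ 1` coerces to a number, so the start position becomes 2 when the text begins with $ and 1 otherwise.

```python
from lxml import html

# Hypothetical sample table; the second row is made up for illustration.
doc = html.fromstring(
    "<table>"
    "<tr><td>australia</td><td>$3333.99</td></tr>"
    "<tr><td>a$b</td><td>$10$</td></tr>"
    "</table>"
)

cells = doc.xpath("//td")

# substring(): drops only a single leading "$", if present.
by_substring = [
    td.xpath("substring(., starts-with(., '$') + 1)") for td in cells
]

# translate(): deletes every "$" anywhere in the text.
by_translate = [td.xpath('translate(., "$", "")') for td in cells]

print(by_substring)  # ['australia', '3333.99', 'a$b', '10$']
print(by_translate)  # ['australia', '3333.99', 'ab', '10']
```

Which behaviour is the right one depends on the data: if $ can only ever appear as a leading currency sign, both give the same result.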

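Note that in any case you have to do it in two steps: select the nodes first, then apply the function to each one. translate() or substring() will not be applied to every node if used like this, because in XPath 1.0 a string function given a node-set only uses its first node:

translate(//td//text(), "$", "")

Alternatively, you can trim it in Python with .lstrip().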
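The two-step point and the .lstrip() alternative can be sketched as follows. This assumes lxml and a one-row table rebuilt from the question's output; the variable names are illustrative only.

```python
from lxml import html

# Sample markup reconstructed from the question's output (an assumption).
doc = html.fromstring(
    "<table><tr><td>australia</td><td>$3333.99</td></tr></table>"
)

# One step: the node-set is converted to a string using only its first node,
# so the second cell is never seen by translate().
one_step = doc.xpath('translate(//td//text(), "$", "")')

# Two steps: select the text nodes with XPath, then trim each one in Python.
two_step = [text.lstrip("$") for text in doc.xpath("//td//text()")]

print(one_step)  # australia
print(two_step)  # ['australia', '3333.99']
```

Note that .lstrip("$") removes all leading $ characters, not just one, which differs slightly from the substring() approach.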

Answer 2

//td//text()[substring-after(.,'$')]

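The predicate is evaluated for each text() node in ['australia', '$3333.99']. australia does not contain $, so substring-after returns an empty string, which counts as false, and that node is dropped from the result. $3333.99 passes the test, but a predicate only filters nodes and never modifies them, so it comes back unchanged.

Apply translate() to each td instead:

[td.xpath('translate(., "$", "")') for td in tree.xpath("//td")]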
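To see the filtering behaviour for yourself, here is a small sketch, again assuming lxml and markup reconstructed from the question. It shows what substring-after() returns for each cell and what the predicate keeps as a result.

```python
from lxml import html

# Sample markup reconstructed from the question's output (an assumption).
doc = html.fromstring(
    "<table><tr><td>australia</td><td>$3333.99</td></tr></table>"
)

# What substring-after() evaluates to for each cell on its own.
after = [td.xpath("substring-after(., '$')") for td in doc.xpath("//td")]

# Used as a predicate, an empty string is false, so that node is filtered out;
# the surviving node is returned as-is, with its "$" intact.
filtered = doc.xpath("//td//text()[substring-after(., '$')]")

print(after)     # ['', '3333.99']
print(filtered)  # ['$3333.99']
```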


How do I remove something from text() using XPath?
