Given the following XML:
<div id="Main">
<div class="quote">
This is a quote and I don't want this text
</div>
<p>
This is content.
</p>
<p>
This is also content and I want both of them
</p>
</div>
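Is there an XPath expression that selects the inner text of div#Main as a single node, while excluding any div.quote?
I only want the text: "This is content.This is also content and I want both of them"
Thanks in advance.
Here is the code I use to test the XPath. I am using .NET and HtmlAgilityPack, but I believe the XPath should work in any language.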
[Test]
public void TestSelectNode()
{
    // Arrange
    var html = "<div id=\"Main\"><div class=\"quote\">This is a quote and I don't want this text</div><p>This is content.</p><p>This is also content and I want both of them</p></div>";
    var xPath = "//div/*[not(self::div and @class=\"quote\")]/text()";
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // Action
    var node = doc.DocumentNode.SelectSingleNode(xPath);

    // Assert
    Assert.AreEqual("This is content.This is also content and I want both of them", node.InnerText);
}
The test fails, evidently because the XPath is still not correct.
Test 'XPathExperiments/TestSelectNode' failed:
Expected values to be equal.
Expected Value : "This is content.This is also content and I want both of them"
Actual Value : "This is content."
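Answer 0: (score: 2)
I don't think any XPath can return this as a single node, because the value you are trying to get is not a single node. Is there a reason you can't do this instead?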
StringBuilder sb = new StringBuilder();

// Action
var nodes = doc.DocumentNode.SelectNodes(xPath);
foreach (var node in nodes)
{
    sb.Append(node.InnerText);
}

// Assert
Assert.AreEqual("This is content.This is also content and I want both of them",
    sb.ToString());
Answer 1: (score: 0)
You want the text of every child of the div, except children that are a div with the class quote:
div/*[not(self::div and @class="quote")]/text()
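As a rough sketch of why the test in the question only sees "This is content.": an expression like the one above matches two separate text nodes, and SelectSingleNode returns only the first match. The snippet below assumes the HtmlAgilityPack NuGet package is referenced. The class name XPathMatchDemo and the helper MatchedTexts are made up for illustration, and anchoring the path on @id='Main' is my own assumption so that the result does not depend on the context node.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class XPathMatchDemo
{
    // Hypothetical helper: returns the text of every node matched by the XPath.
    public static List<string> MatchedTexts(string html, string xPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes(xPath);
        return nodes == null
            ? new List<string>()
            : nodes.Select(n => n.InnerText).ToList();
    }

    public static void Main()
    {
        var html = "<div id=\"Main\"><div class=\"quote\">This is a quote and I don't want this text</div>"
                 + "<p>This is content.</p><p>This is also content and I want both of them</p></div>";
        // Assumption: anchored on the id instead of starting from a bare "div".
        var xPath = "//div[@id='Main']/*[not(self::div and @class='quote')]/text()";

        // Prints one line per matched text node.
        foreach (var text in MatchedTexts(html, xPath))
        {
            Console.WriteLine(text);
        }
    }
}
```

Each paragraph comes back as its own text node, so the caller still has to combine them if a single string is needed.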
Answer 2: (score: 0)
No XPath node selection will give you the combined string value, because a location path selects only node objects, even when those nodes are text nodes.
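Since the <div> in question contains <p> nodes, I would use
div[@id='Main']/p/text()
which yields a list of the text nodes in the <p> elements inside <div id="Main">. Iterating over those and concatenating their text content should be simple.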
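The iterate-and-concatenate step might look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package, as in the question. ParagraphTextJoiner and JoinParagraphText are hypothetical names, and the leading // in the path is my addition so the expression works regardless of where div#Main sits in the document.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class ParagraphTextJoiner
{
    // Hypothetical helper: concatenates the text nodes of every <p> directly under div#Main.
    public static string JoinParagraphText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: "//" prefix added so the path matches from anywhere in the document.
        var textNodes = doc.DocumentNode.SelectNodes("//div[@id='Main']/p/text()");

        // SelectNodes returns null when there are no matches.
        if (textNodes == null)
        {
            return string.Empty;
        }

        // Join the text nodes in document order, with no separator.
        return string.Concat(textNodes.Select(n => n.InnerText));
    }

    public static void Main()
    {
        var html = "<div id=\"Main\"><div class=\"quote\">This is a quote and I don't want this text</div>"
                 + "<p>This is content.</p><p>This is also content and I want both of them</p></div>";

        Console.WriteLine(JoinParagraphText(html));
    }
}
```

Because the path only descends into <p> children, the div.quote text is never selected and no not(...) predicate is needed. This relies on the wanted text always being in <p> elements, which holds for the markup in the question but may not hold for other pages.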