Question

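I have a problem in my project. I download HTML from a website and then want to pick a node out of it with SelectSingleNode and an XPath expression. This is the HTML content: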

<html>
<body>
<div>
<h3 class="bp">Groups you are in</h3>
</div> </body> </html>

This is my code:

var xpath = string.Format("//html/body/div/h3[.= '{0}']", "groups you are in");
var header = BuildDom("{this is link website i get html}").SelectSingleNode(xpath);

This is my BuildDom method:

HtmlNode BuildDom(string url)
{
    string htmlContent = _http.DownloadContent(url);
    return HtmlHelper.BuildDom(htmlContent);
}

Look at this line:

var header = BuildDom("{this is link website i get html}").SelectSingleNode(xpath);

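header will be null, because the h3 tag in the HTML contains "Groups you are in" while my XPath uses "groups you are in".

How can I keep "groups you are in" in my XPath and ignore case when matching the content of the h3 tag? I can't simply change my XPath to "Groups you are in", because the capitalization of the h3 content is not consistent: different words are capitalized at different times.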

Answer 1

You can try using matches(). The "i" flag makes the match case-insensitive:

//html/body/div/h3[matches(., "groups you are in", "i")]
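
One caveat: matches() is an XPath 2.0 function, and the XPath engine built into .NET (System.Xml.XPath) implements XPath 1.0, so the expression above may be rejected there. A rough equivalent, sketched below, is to select the candidate h3 nodes with plain XPath and do the case-insensitive regex test in C#. The sketch uses XmlDocument on the sample markup only so that it is self-contained; I'm assuming the same pattern carries over to the HtmlNode returned by BuildDom.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;

// Sample markup from the question, loaded as XML purely for illustration.
var doc = new XmlDocument();
doc.LoadXml("<html><body><div><h3 class=\"bp\">Groups you are in</h3></div></body></html>");

// RegexOptions.IgnoreCase plays the role of the "i" flag in matches().
// Like matches(), this is a substring match; add ^ and $ for an exact match.
var pattern = new Regex("groups you are in", RegexOptions.IgnoreCase);

// Select every candidate h3 with XPath 1.0, then filter on the text in C#.
var matchedHeader = doc.SelectNodes("//html/body/div/h3")
    .Cast<XmlNode>()
    .FirstOrDefault(n => pattern.IsMatch(n.InnerText));

Console.WriteLine(matchedHeader == null ? "not found" : matchedHeader.InnerText);
```

This trades a single XPath expression for a small amount of C# filtering, which keeps the XPath itself simple.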

Answer 2

Another solution is to convert the text to lower case (or upper case) before comparing:

"//html/body/div/h3[lower-case(.) = 'groups you are in']"

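Note that lower-case() is an XPath 2.0 function. If the XPath engine only supports XPath 1.0, which is the case for .NET's System.Xml.XPath, the usual workaround is translate(), which maps each upper-case letter to its lower-case counterpart. Here is a minimal sketch on the sample markup; it uses XmlDocument only to stay self-contained, the names Upper and Lower are mine, and the mapping only covers ASCII letters.

```csharp
using System;
using System.Xml;

// Sample markup from the question, loaded as XML purely for illustration.
var doc = new XmlDocument();
doc.LoadXml("<html><body><div><h3 class=\"bp\">Groups you are in</h3></div></body></html>");

// Hypothetical helper constants: the character mapping used by translate().
const string Upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const string Lower = "abcdefghijklmnopqrstuvwxyz";

// translate(., Upper, Lower) yields a lower-cased copy of the h3 text,
// which is then compared against the lower-case search string.
string xpath = string.Format(
    "//html/body/div/h3[translate(., '{0}', '{1}') = '{2}']",
    Upper, Lower, "groups you are in");

var header = doc.SelectSingleNode(xpath);
Console.WriteLine(header == null ? "not found" : header.InnerText);
```

The search string passed as the last argument has to be lower case already, since only the node text is converted.
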
C# SelectNodes with XPath: ignoring case in HTML content
