So I have a snippet of HTML that I'd like to modify using C#.
<div>
This is a specialSearchWord that I want to link to
<img src="anImage.jpg" />
<a href="foo.htm">A hyperlink</a>
Some more text and that specialSearchWord again.
</div>
I'd like to transform it into this:
<div>
This is a <a class="special" href="http://mysite.com/search/specialSearchWord">specialSearchWord</a> that I want to link to
<img src="anImage.jpg" />
<a href="foo.htm">A hyperlink</a>
Some more text and that <a class="special" href="http://mysite.com/search/specialSearchWord">specialSearchWord</a> again.
</div>
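I'm going to use HTML Agility Pack based on the many recommendations here, but I don't know where to start. In particular, I'm not sure how to edit the document or how to get the edited result back as a string.
Answer 0 (score: 20)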
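To edit, you can either set the InnerHtml property directly (or Text on text nodes), or modify the DOM tree using e.g. AppendChild, PrependChild, etc.
To get the result, you can read the HtmlDocument.DocumentNode.OuterHtml property or use the HtmlDocument.Save method (personally I prefer the second option).
As for parsing, I select the text nodes inside your div that contain the search term, and then use the string.Replace method to replace it: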
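// Requires: using HtmlAgilityPack;
var doc = new HtmlDocument();
doc.LoadHtml(html); // html holds the snippet as a string
// Select the text nodes directly under the root div that contain the search word.
var textNodes = doc.DocumentNode.SelectNodes("/div/text()[contains(.,'specialSearchWord')]");
if (textNodes != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlTextNode node in textNodes)
    {
        node.Text = node.Text.Replace("specialSearchWord", "<a class='special' href='http://mysite.com/search/specialSearchWord'>specialSearchWord</a>");
    }
}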
Then save the result as a string:
// Requires: using System.IO;
string result = null;
using (StringWriter writer = new StringWriter())
{
    doc.Save(writer); // writes the modified document into the writer
    result = writer.ToString();
}
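The other option mentioned above, modifying the DOM tree instead of writing markup into Text, could look roughly like the sketch below. It is only an illustration, not part of the original answer: the class name LinkifyExample and the helper Linkify are hypothetical, and the sketch assumes the snippet has a single root div, as in the question. Each matching text node is split around the search word, and real a elements are inserted in between.

```csharp
using System;
using HtmlAgilityPack;

static class LinkifyExample
{
    // Hypothetical helper: wraps every occurrence of `word` found in the
    // text nodes directly under the root div in a real <a> element.
    public static string Linkify(string html, string word)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Text nodes directly under the root div; null when there are none.
        var textNodes = doc.DocumentNode.SelectNodes("/div/text()");
        if (textNodes == null) return html;

        foreach (HtmlNode node in textNodes)
        {
            var textNode = node as HtmlTextNode;
            if (textNode == null || !textNode.Text.Contains(word)) continue;

            HtmlNode parent = textNode.ParentNode;
            string[] parts = textNode.Text.Split(new[] { word }, StringSplitOptions.None);

            for (int i = 0; i < parts.Length; i++)
            {
                if (i > 0)
                {
                    // One link per occurrence of the search word.
                    HtmlNode link = doc.CreateElement("a");
                    link.SetAttributeValue("class", "special");
                    link.SetAttributeValue("href",
                        "http://mysite.com/search/" + Uri.EscapeDataString(word));
                    link.AppendChild(doc.CreateTextNode(word));
                    parent.InsertBefore(link, textNode);
                }
                if (parts[i].Length > 0)
                {
                    // Keep the surrounding text as plain text nodes.
                    parent.InsertBefore(doc.CreateTextNode(parts[i]), textNode);
                }
            }
            // The original text node has been replaced by the pieces above.
            parent.RemoveChild(textNode);
        }
        return doc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        string html = "<div>This is a specialSearchWord that I want to link to</div>";
        Console.WriteLine(Linkify(html, "specialSearchWord"));
    }
}
```

Compared with string.Replace on Text, this keeps the links as actual nodes in the tree, so later XPath queries can find them.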
Answer 1 (score: 1)
Note that your XPath expression may need to be more complex to find the div you want.
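// Requires: using System; using System.Text.RegularExpressions; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(yourHtmlFile);
// "//div[2]" is only a placeholder; adjust the XPath to select the div you need.
HtmlNode divNode = doc.DocumentNode.SelectSingleNode("//div[2]");
if (divNode != null) // SelectSingleNode returns null when nothing matches
{
    // Note: this replaces every match in the div's markup, including inside attributes.
    string newDiv = Regex.Replace(divNode.InnerHtml, @"specialSearchWord",
        "<a class='special' href='http://etc'>specialSearchWord</a>");
    divNode.InnerHtml = newDiv;
}
Console.WriteLine(doc.DocumentNode.OuterHtml);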
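If the search word comes from user input rather than a fixed literal, the regular expression probably needs a little more care. The sketch below is an assumption-laden variant, not the answerer's code: the class RegexLinkSketch and the helper Linkify are hypothetical names, and the mysite.com URL is taken from the question. It escapes the word with Regex.Escape, matches whole words only, and builds the replacement in a callback so that characters such as $ in the word are not treated as substitution patterns.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexLinkSketch
{
    // Hypothetical helper: wraps whole-word occurrences of `word` in a link.
    public static string Linkify(string text, string word)
    {
        // Escape the word so regex metacharacters in it are matched literally;
        // \b assumes the word starts and ends with a word character.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        string href = "http://mysite.com/search/" + Uri.EscapeDataString(word);

        // A MatchEvaluator avoids interpreting "$" in the replacement text.
        return Regex.Replace(text, pattern,
            m => "<a class='special' href='" + href + "'>" + m.Value + "</a>");
    }

    static void Main()
    {
        Console.WriteLine(Linkify("Some more text and that specialSearchWord again.",
                                  "specialSearchWord"));
    }
}
```

Like the code above, this still operates on raw markup when applied to InnerHtml, so a match inside an attribute value would also be rewritten.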