I have this HTML snippet:
<p>Rendered on a website,
this will all be on one line.</p>
<p>This would be on another line.</p>
and this C# code:
HtmlDocument doc = new HtmlDocument();
doc.Load(path);
string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
Now "text" spans three lines:
Rendered on a website,
this will all be on one line.
This would be on another line.
But I want:
Rendered on a website, this will all be on one line.
This would be on another line.
Is this possible with HtmlAgilityPack?
Answer 0 (score: 0)
You can do something like this:
string html = @"<p>Rendered on a website,
this will all be on one line.</p>
<p>This would be on another line.</p>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
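// Collapses runs of whitespace (tabs, repeated spaces) into a single space.
Regex r = new Regex(@"\s+");
// Rejoin a line break that follows a comma, then split the rest into one entry per line.
// Note: this assumes CRLF line endings and that every wrapped line ends with a comma.
var sentences = text.Replace(",\r\n", ", ").Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
var finalText = string.Join("\r\n", sentences.Select(s => r.Replace(s, " ").Trim()));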
Console.WriteLine(text + "\n");
Console.WriteLine(finalText + "\n");
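You don't really need the regex; I only used it to get rid of the tab/spacing characters
that were introduced by hardcoding the HTML in the html variable.
Output: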
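The replacement above only works when a wrapped line happens to end with a comma followed by CRLF. As a rough alternative sketch (not part of the original answer), you could walk the <p> elements individually and collapse the whitespace inside each one, so that every paragraph ends up on its own line regardless of how the source is wrapped. This assumes each output line should correspond to one <p> element; the Program class and the inline html string are only there to make the example self-contained.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Same markup as in the question, with explicit CRLF line breaks.
        string html = "<p>Rendered on a website,\r\nthis will all be on one line.</p>\r\n"
                    + "<p>This would be on another line.</p>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null)
        {
            return;
        }

        var lines = paragraphs
            .Select(p => HtmlEntity.DeEntitize(p.InnerText))    // decode entities such as &amp;
            .Select(t => Regex.Replace(t, @"\s+", " ").Trim())  // collapse line breaks, tabs, spaces
            .Where(t => t.Length > 0);                          // skip empty paragraphs

        Console.WriteLine(string.Join(Environment.NewLine, lines));
    }
}
```

Because the whitespace is collapsed per paragraph, this does not depend on the line-ending style of the input. If other block-level elements (for example <div> or <li>) should also start a new line, the XPath expression would need to be extended accordingly.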