Question

In my WinForms application, htmlEditor1.Html returns the following:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<META content="text/html; charset=unicode" http-equiv=Content-Type>.......

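I'm a beginner and I don't know what the format above is.

What I need is output in the following form (plain text or HTML) so that I can save it to a table in the database: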

"some text checking\r\n"

Any suggestions on how to get it?

Answer 1

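Apparently your third-party control doesn't support retrieving anything other than the raw HTML.

If you need to parse this output to retrieve the values of particular elements, I'd suggest using the HTML Agility Pack. You can add it to your solution with the NuGet package manager (right-click your solution in Solution Explorer, select 'Manage NuGet Packages...', then search for and add the HtmlAgilityPack package).

Once that is done, you can work with the HTML in code. For example, to retrieve the text of each paragraph you could do the following: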

// Needs these namespaces: System, System.Collections.Generic, System.Linq
// Create an HTML Document to parse
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
// Load in the third party control's HTML output
doc.LoadHtml(htmlEditor1.Html);
// Retrieve the paragraph (p) nodes of the document
List<HtmlAgilityPack.HtmlNode> paragraphNodes = doc.DocumentNode.DescendantNodes()
    .Where(node => node.Name == "p")
    .ToList();

// Process each of the paragraph nodes in turn
foreach (var node in paragraphNodes)
{
    // Output the paragraph text
    // TODO: save the text in the database...
    Console.WriteLine(node.InnerText);
}

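Note: if the HTML does in fact represent a Word document, the nodes may have different names from those above, possibly with a namespace prefix and a colon. To handle these, change the node.Name == "p" check in the example above to node.Name == "<prefix>:<nodename>", for example node.Name == "w:p".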
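
Since the question asks for a single plain-text string rather than console output, the same idea can be wrapped in a small helper. This is only a sketch: the class name HtmlTextExtractor and the method ToPlainText are made up for illustration, and it assumes the editor wraps the text in <p> elements and that the HtmlAgilityPack package is installed.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of the third-party control or of HtmlAgilityPack.
public static class HtmlTextExtractor
{
    // Returns the text of every <p> element, joined with "\r\n".
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var lines = doc.DocumentNode.Descendants("p")
            // Decode entities such as &amp; and &nbsp; into plain characters
            .Select(p => HtmlEntity.DeEntitize(p.InnerText).Trim())
            // Skip paragraphs that contain no text
            .Where(text => text.Length > 0);

        return string.Join("\r\n", lines);
    }
}
```

You could then call string text = HtmlTextExtractor.ToPlainText(htmlEditor1.Html); and store text in the database column. If the control does not emit <p> elements, this returns an empty string; in that case reading doc.DocumentNode.InnerText may be a better starting point.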

Get plain text from a Control in WinForms