Question

I want to get the simple text inside the body tag, that is, the text that is not wrapped in any child element.

Markup:

**simple text 1**
<div>------</div>
<font>-------</font>
**simple text 2**

Code:

foreach (HtmlElement elm in webBrowser1.Document.Body.All)
{
    //get simple text
}

Answer 1

Simply:

string plainText = webBrowser1.Document.Body.InnerText;

Answer 2

I found this simple way:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(webBrowser1.Document.Body.InnerHtml);
// Descendants() walks every node in the tree, including nodes nested inside elements
foreach (var elm in htmlDoc.DocumentNode.Descendants())
{
    if (elm.NodeType == HtmlAgilityPack.HtmlNodeType.Text)
    {
        // simple text is a #text node
        var innerText = elm.InnerText;
    }
}

Have fun.
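
If only the text sitting directly in the body is wanted (the two "simple text" runs in the question's markup, without the contents of the div and font elements), a variation on the loop above might look like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced; the class name BodyText and the method name DirectText are made up for illustration. Instead of Descendants(), it looks only at ChildNodes of the document node, so nested text is skipped.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
static class BodyText
{
    // Returns the non-empty text nodes that are direct children of the
    // parsed fragment, trimmed and with HTML entities decoded.
    public static List<string> DirectText(string bodyInnerHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(bodyInnerHtml);

        return doc.DocumentNode.ChildNodes
            .Where(n => n.NodeType == HtmlNodeType.Text)                // keep #text nodes only
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())     // decode entities, trim whitespace
            .Where(t => t.Length > 0)                                   // drop whitespace-only nodes
            .ToList();
    }
}
```

From the form it could be called as BodyText.DirectText(webBrowser1.Document.Body.InnerHtml). Because only the top-level children are inspected, text inside the div and font elements is left out.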

Answer 3

Try it this way. The following technique gets all the text that is shown in the browser preview.

string plainText = StripHTML(webBrowser1); // call it like this

public string StripHTML(WebBrowser webp)
{
    try
    {
        // Note: this overwrites whatever was on the user's clipboard
        Clipboard.Clear();
        webp.Document.ExecCommand("SelectAll", true, null);
        webp.Document.ExecCommand("Copy", false, null);
        return Clipboard.GetText();
    }
    catch (Exception ep)
    {
        MessageBox.Show(ep.Message);
        // Do not return stale clipboard content if the copy failed
        return string.Empty;
    }
}

Get simple text in the body
