HTML to text, like Python's BeautifulSoup

Asked: 2019-06-06 10:33:44

Tags: python go beautifulsoup html-to-text

I have a Python program that produces the following output:

from bs4 import BeautifulSoup

html = "<h1>This is heading</h1> <p>this is parah <strong>strong</strong> that's how it works</p>"

parsed_html = BeautifulSoup(html, 'html.parser')
all_lines = parsed_html.findAll(text=True)
print(all_lines)

# ['This is heading', ' ', 'this is parah ', 'strong', " that's how it works"]

I am trying to implement the same functionality in Go, but I can't get the desired output. This is what I have tried so far:

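package main

import (
    "fmt"
    "strings"

    "github.com/PuerkitoBio/goquery"
)

// parseHTML prints and returns all text in body as one concatenated string.
func parseHTML(body string) string {
    p := strings.NewReader(body)
    doc, err := goquery.NewDocumentFromReader(p)
    if err != nil {
        return ""
    }

    text := doc.Text()
    fmt.Println(text)

    // output: This is heading this is parah strong thats how it works

    return text
}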

1 Answer:

Answer 0: (score: 0)

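If you are able to implement the functionality yourself, it looks fairly simple.

Just strip out every tag ("...") and keep appending the text that follows each tag ("...") to a list.

This gives you the same result as the Python output.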
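
The approach above could be sketched roughly as follows, using only the Go standard library. The helper name textNodes is hypothetical, and the sketch assumes simple markup: no comments, no <script> or <style> bodies, and no '>' characters inside attribute values.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// textNodes is a hypothetical helper: it returns the text found between tags,
// in document order, similar in spirit to BeautifulSoup's findAll(text=True).
// Assumption: simple markup only (no comments, no <script>/<style> bodies,
// no '>' inside attribute values).
func textNodes(markup string) []string {
	var nodes []string
	for len(markup) > 0 {
		open := strings.IndexByte(markup, '<')
		if open < 0 {
			// No more tags: whatever is left is text.
			nodes = append(nodes, html.UnescapeString(markup))
			break
		}
		if open > 0 {
			// Text that sits before the next tag.
			nodes = append(nodes, html.UnescapeString(markup[:open]))
		}
		end := strings.IndexByte(markup[open:], '>')
		if end < 0 {
			// Unterminated tag: drop the remainder.
			break
		}
		// Skip past the tag and continue scanning.
		markup = markup[open+end+1:]
	}
	return nodes
}

func main() {
	input := `<h1>This is heading</h1> <p>this is parah <strong>strong</strong> that's how it works</p>`
	// prints ["This is heading" " " "this is parah " "strong" " that's how it works"]
	fmt.Printf("%q\n", textNodes(input))
}
```

For real-world pages a proper parser is likely the safer choice, for example walking the tokens produced by the golang.org/x/net/html tokenizer and collecting only the text tokens; the manual scan here is just meant to show the idea.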