The text stored in the database also includes CSS styles:
<p>ABC | Min. XYZ
<style type="text/css"><!--td {border: 1px solid #ccc;}br {mso-data-placement:same-cell;}-->
</style>
<span data-sheets-userformat="{"2":3011,"3":{"1":0},"4":[null,2,16777215],"9":1,"10":1,"11":4,"12":0,"14":[null,2,0]}" data-sheets-value="{"1":2,"2":"PQR"}" style="font-size: 10pt; font-family: Arial; color: rgb(0, 0, 0); text-align: center;">PQR</span></p>
To get rid of &nbsp;, I used html.UnescapeString(), which works well.
After fetching it from the database, I want to display it in this format: ABC | Min. XYZ PQR
But the actual result after calling html.UnescapeString() is:
ABC | Min. XYZ
<style type="text/css">
<!--td {border: 1px solid #ccc;}br {mso-data-placement:same-cell;}-->
</style>
<span data-sheets-userformat="{"2":3011,"3":{"1":0},"4":[null,2,16777215],"9":1,"10":1,"11":4,"12":0,"14":[null,2,0]}" data-sheets-value="{"1":2,"2":"PQR"}" style="font-size: 10pt; font-family: Arial; color: rgb(0, 0, 0); text-align: center;">PQR</span></p>
Answer (score: 0)
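This seems straightforward, but it requires three things:
1. Strip HTML tags such as <p> and <style type="text/css">
2. Unescape HTML entities such as &nbsp;
3. Condense whitespace, including non-breaking spaces (U+00A0)
You can do this with github.com/microcosm-cc/bluemonday, html, and strings as follows: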
package main

import (
	"fmt"
	"html"
	"strings"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// Your input text
	input := `<p>ABC | Min. XYZ
<style type="text/css"><!--td {border: 1px solid #ccc;}br {mso-data-placement:same-cell;}-->
</style>
<span data-sheets-userformat="{"2":3011,"3":{"1":0},"4":[null,2,16777215],"9":1,"10":1,"11":4,"12":0,"14":[null,2,0]}" data-sheets-value="{"1":2,"2":"PQR"}" style="font-size: 10pt; font-family: Arial; color: rgb(0, 0, 0); text-align: center;">PQR</span></p>`
	// Strip all HTML tags
	p := bluemonday.StrictPolicy()
	output := p.Sanitize(input)
	// Unescape HTML entities such as &nbsp;
	output = html.UnescapeString(output)
	// Condense whitespace: strings.Fields splits on unicode.IsSpace, which includes U+00A0
	output = strings.Join(strings.Fields(strings.TrimSpace(output)), " ")
	fmt.Println(output)
}
The output is now: ABC | Min. XYZ PQR
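To see step 3 in isolation, here is a standard-library-only sketch of the whitespace condensing. The `condense` helper and the sample string are hypothetical; the sample only imitates the kind of leftover whitespace tag stripping can produce, and the exact whitespace bluemonday leaves behind may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// condense collapses every run of Unicode whitespace (as defined by
// unicode.IsSpace, which includes U+00A0) into a single ASCII space.
// strings.Fields also drops leading and trailing whitespace.
func condense(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	// Hypothetical intermediate text: blank lines, a non-breaking space, trailing newline.
	raw := "ABC | Min. XYZ\n\n\u00a0 PQR\n"
	fmt.Println(condense(raw)) // ABC | Min. XYZ PQR
}
```

Because strings.Fields already ignores leading and trailing whitespace, the sketch leaves out strings.TrimSpace; keeping it, as the answer does, is harmless.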
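For the last step, strings.Fields is cleaner than a regular expression, because in Go's regexp syntax \s does not match the non-breaking space (U+00A0), so you would need something like this: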
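// Trim leading and trailing whitespace, including Unicode space separators (\p{Zs}) such as U+00A0; requires import "regexp"
output = regexp.MustCompile(`^[\s\p{Zs}]+|[\s\p{Zs}]+$`).ReplaceAllString(output, "")
// Collapse each interior run of whitespace, even a single tab or newline, into one space
output = regexp.MustCompile(`[\s\p{Zs}]+`).ReplaceAllString(output, " ")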
For more on matching whitespace, see: How to remove redundant spaces/whitespace from a string in Golang?
Finally, you can use the following function, available in github.com/grokify/gotilla/html/htmlutil:
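import (
	"html"
	"strings"

	"github.com/microcosm-cc/bluemonday"
)

var bluemondayStrictPolicy = bluemonday.StrictPolicy()

// HTMLToTextCondensed strips HTML tags, unescapes HTML entities, and
// collapses every whitespace run (including U+00A0) into a single space.
func HTMLToTextCondensed(s string) string {
	return strings.Join(
		strings.Fields(
			strings.TrimSpace(
				html.UnescapeString(
					bluemondayStrictPolicy.Sanitize(s),
				),
			)),
		" ",
	)
}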