Remove HTML tags together with their content from a String

Time: 2015-07-22 11:40:33

Tags: java regex

I have string = "195121<span class="up">+432</span>". I need a regex that removes the tag together with its content, so that the result is string = "195121".

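2 answers:

Answer 0 (score: 2)

You can try the following regex, which relies on a capturing group to match each opening tag with a closing tag of the same name.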

string = string.replaceAll("(?s)<(\\w+)\\b[^<>]*>.*?</\\1>", "");
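To see the expression applied to the string from the question, here is a minimal Java sketch. The class name `PairedTagStripper` and the method `strip` are made up for illustration and are not part of the answer. Java strings are immutable, so the value returned by the replacement has to be kept.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
final class PairedTagStripper {
    // Same expression as in the answer: (?s) lets '.' span line breaks,
    // group 1 captures the tag name, and \1 requires the closing tag
    // to carry that same name.
    private static final Pattern PAIRED_TAG =
            Pattern.compile("(?s)<(\\w+)\\b[^<>]*>.*?</\\1>");

    // Returns a new string with every matched <tag ...>...</tag> pair removed.
    static String strip(String input) {
        return PAIRED_TAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String string = "195121<span class=\"up\">+432</span>";
        System.out.println(strip(string)); // prints 195121
    }
}
```

Because `.*?` is lazy, the match ends at the first closing tag with the captured name, so nested tags of the same name are not removed cleanly. For such input an HTML parser is likely a safer choice than a regex.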

Answer 1 (score: 1)

The main regex that worked for me is shown below; it removes the tag with the given name along with everything inside it.

"(?is)<your_tag_name[^>]+>.*?<\\/your_tag_name>"
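As a rough Java sketch of how this expression can be built for an arbitrary tag name: the class `NamedTagStripper` and its methods are hypothetical, and wrapping the name in `Pattern.quote` is an addition that is not in the answer (it only keeps the name from being read as regex syntax).

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
final class NamedTagStripper {
    // Builds the expression from the answer for one tag name.
    // (?i) ignores case and (?s) lets '.' match line breaks.
    static Pattern patternFor(String tagName) {
        String name = Pattern.quote(tagName); // assumption: treat the name as literal text
        return Pattern.compile("(?is)<" + name + "[^>]+>.*?<\\/" + name + ">");
    }

    // Returns a new string with every matched element of that name removed.
    static String strip(String input, String tagName) {
        return patternFor(tagName).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>keep</p><FlightNode to=\"CXB\">drop</FlightNode>";
        System.out.println(strip(html, "flightnode")); // prints <p>keep</p>
    }
}
```

Note that `[^>]+` needs at least one character after the tag name, so an opening tag without attributes, such as `<your_tag_name>`, is left untouched by the expression as written.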

This is how I handled it (in Kotlin). I hope it helps someone else.

var data = "<p>Dhaka is the capital city of Bangladesh " +
    "and many palaces and mosques remain. This is" +
    " fast-growing modern metropolis.</p>\\r\\n<p>&lt;flightnode to=\"CXB\"&gt;&lt;/flightnode&gt;</p>"

First, replace &lt; and &gt; with < and >.

// Not needed if the input already contains literal < and >
data = data.replace("&lt;", "<").replace("&gt;", ">")

Then print the result to check it.

println("\n\n $data")

> // output // -> <p>Dhaka is the capital city of Bangladesh and many
> palaces and mosques remain. This is fast-growing modern
> metropolis.</p>\r\n<p><flightnode to="CXB"></flightnode></p>

Define an array with the names of the tags to remove.

val tag = arrayOf("flightnode", "hotelnode", "packagenode")

Then loop over the tag names and strip each one from the string.

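// Needs: import java.util.regex.Pattern
for (value in tag) {
    // (?i) ignores case, (?s) lets '.' match line breaks
    val patternString = "(?is)<$value[^>]+>.*?<\\/$value>"
    val pattern = Pattern.compile(patternString)
    val matcher = pattern.matcher(data)
    println("\n\n" + matcher.find()) // true if this tag occurs in data
    data = matcher.replaceAll("")
}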

Print the result to check it.

println("\n\n" + data)

> // output // -> <p>Dhaka is the capital city of Bangladesh and many
> palaces and mosques remain. This is fast-growing modern
> metropolis.</p>\r\n<p></p>

Thanks to my former colleague @masud-bappy for creating the regex.
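Since the question is tagged java, the Kotlin steps above can be approximated in Java roughly as follows. This is only a sketch: the class name `NodeTagCleaner` and the method `clean` are invented for illustration, and the sample input in `main` is a shortened stand-in for the text used above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
final class NodeTagCleaner {
    // Tag names taken from the walkthrough above.
    private static final String[] TAGS = {"flightnode", "hotelnode", "packagenode"};

    static String clean(String data) {
        // Unescape the angle brackets first; harmless if the input
        // already contains literal < and >.
        String result = data.replace("&lt;", "<").replace("&gt;", ">");
        for (String tag : TAGS) {
            // Same expression as in the answer, built once per tag name.
            Pattern pattern = Pattern.compile("(?is)<" + tag + "[^>]+>.*?<\\/" + tag + ">");
            result = pattern.matcher(result).replaceAll("");
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "<p>Dhaka</p><p>&lt;flightnode to=\"CXB\"&gt;&lt;/flightnode&gt;</p>";
        System.out.println(clean(data)); // prints <p>Dhaka</p><p></p>
    }
}
```

The empty `<p></p>` wrapper is left behind, as in the Kotlin output above; removing it would need a separate step.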