Question

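The allowed tags are stored in a list. I need a simple validation method that returns false if the test string contains any tag that is not allowed.

I need something better than this: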

import java.util.regex.Matcher
import java.util.regex.Pattern

// Captures the tag name that follows "<" or "</"
static final Pattern TAG_PATTERN = Pattern.compile("(?<=</?)([^ >/]+)")
static final ArrayList<String> allowedTags = ["p", "div", "b", "strong", "ul", "li", "span", "style", "a", "table", "tr", "th", "td"]

// Returns false as soon as a tag outside allowedTags is found
static Boolean parseTag(String str) {
    Matcher m = TAG_PATTERN.matcher(str)
    while (m.find()) {
        String tag = m.group(1)
        if (!allowedTags.contains(tag)) {
            return false
        }
    }
    return true
}

Answer 1

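Assuming you are dealing with possibly malformed HTML, the tagsoup parser might be an option, since it can parse malformed HTML/XML sources ("real" HTML, in the sense that a lot of HTML found online is not well formed).

The following code uses tagsoup and the Groovy XmlSlurper to parse the input and validate it against a list of valid tag names: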

@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser

def html = '''
<html>
  <body>
    <span>some content</span>
  </body>
  <not-allowed>some content</not-allowed>
</html>
'''

def document = new XmlSlurper(new Parser()).parseText(html)
def validTags = ['html', 'body', 'span']

def isValid = document.'**'.every { tag ->
  println "${tag.name()} is ${tag.name()?.toLowerCase() in validTags ? '' : 'not '} allowed"
  tag.name()?.toLowerCase() in validTags
}

println "\nVALID: $isValid"

This produces:

html is  allowed
body is  allowed
span is  allowed
not-allowed is not  allowed

VALID: false

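The code uses XmlSlurper from the Groovy library. The ** operator performs a depth-first search over all tags in the XML structure, and the every call returns false as soon as the expression tag.name()?.toLowerCase() in validTags returns false for any tag.

Edit: assuming your HTML is well formed, you can still use the above; just replace the parsing line with: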
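
For readers more at home in Java, the short-circuiting behaviour of every corresponds roughly to Stream.allMatch. The following is only a minimal sketch on a flat list of tag names; the class name EveryDemo and the method allAllowed are made up for illustration and are not part of the answer's script.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

class EveryDemo {
    // Hypothetical helper: true only if every name is in validTags.
    // allMatch is a short-circuiting operation, so evaluation stops
    // at the first name that fails the check, much like Groovy's every.
    static boolean allAllowed(List<String> tagNames, Set<String> validTags) {
        return tagNames.stream()
                .allMatch(name -> validTags.contains(name.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        Set<String> validTags = Set.of("html", "body", "span");
        // Same tag names, in the same order, as in the sample output above.
        List<String> names = List.of("html", "body", "span", "not-allowed");
        System.out.println("VALID: " + allAllowed(names, validTags));
    }
}
```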

def document = new XmlSlurper().parseText(html)

and omit the @Grab and import directives at the top of the script.
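
If the input really is well formed and you would rather stay in plain Java, the same check can be sketched with the JDK's built-in DOM parser. This is an illustration under assumptions, not the code from the answer: the class name TagValidator, the method hasOnlyAllowedTags, and the choice to treat unparseable input as invalid are all hypothetical.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TagValidator {
    // Hypothetical allow-list, mirroring validTags in the Groovy script.
    static final Set<String> VALID_TAGS = Set.of("html", "body", "span");

    // Returns true only if the input parses as XML and every element
    // name is allow-listed. Assumption: input that fails to parse
    // counts as invalid.
    static boolean hasOnlyAllowedTags(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations, a common precaution for untrusted input.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // "*" matches every element, returned in document order.
            NodeList elements = doc.getElementsByTagName("*");
            for (int i = 0; i < elements.getLength(); i++) {
                String name = elements.item(i).getNodeName().toLowerCase(Locale.ROOT);
                if (!VALID_TAGS.contains(name)) {
                    return false; // stop at the first disallowed tag
                }
            }
            return true;
        } catch (Exception e) {
            return false; // malformed input or parser configuration problem
        }
    }

    public static void main(String[] args) {
        System.out.println(hasOnlyAllowedTags("<html><body><span>ok</span></body></html>"));
        System.out.println(hasOnlyAllowedTags("<html><body><script>no</script></body></html>"));
    }
}
```

Unlike tagsoup, this parser refuses malformed markup, so it only fits the well-formed case described in the edit above.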

How can I check whether a string contains only allowed HTML tags?
