Question

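Match pattern foo, but not if it occurs after pattern bar. Basically, given a string, I am "trying" to match an opening tag <any string>, and no match should occur if it comes after a closing tag </any string>.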

Note: I am "trying" this approach to reach a solution, and it may not be the right path. I would be glad for any help with the underlying problem.

So it should match:
<h1> in <h1>
<h1> in <h1> abc </h1>
<abc> in <abc>something</cde><efg>
<abc> in something<abc>something

It should not match anything in:
</h1>
</abc> one two three <abc> five six <abc>
one two three </abc> five six <abc>

Answer 1

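The simplest solution is to outsource part of the work to the Java regex API. With the regex itself we only match <[^>]*>, i.e. any HTML tag. Then we use Matcher.region() to restrict matching to the part of the string before the first </.

Here is the code: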

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegionExample {
        public static void main(String[] args) {
            // example data
            String[] inputLines = {
                    "<h1>",
                    "<h1> abc </h1>",
                    "<abc>something</cde><efg>",
                    "something<abc>something",
                    "",
                    "</h1>",
                    "</abc> one two three <abc> five six <abc>",
                    "one two three </abc> five six <abc>"
            };

            // the pattern for any HTML tag
            Pattern pattern = Pattern.compile("<[^>]*>");

            for (String line : inputLines) {
                Matcher matcher = pattern.matcher(line);
                // index of the first closing tag; nothing at or after it may match
                int undesiredStart = line.indexOf("</");

                // if there is no "</", the region must extend to the end of the string
                int regionEnd = undesiredStart == -1 ? line.length() : undesiredStart;
                matcher.region(0, regionEnd);

                // standard idiom for iterating over all matches
                while (matcher.find()) {
                    System.out.println(matcher.group());
                }
            }
        }
    }
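
To check the approach against the cases in the question, the same logic can be wrapped in a method that returns the matches instead of printing them. This is only a sketch: the class name `TagFinder` and the method name `openingTagsBeforeFirstClose` are made up for illustration, and it assumes each input is a single line, as in the example data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagFinder {
    // Same pattern as in the answer: any tag-like token.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Hypothetical helper: collects every tag found before the first "</".
    static List<String> openingTagsBeforeFirstClose(String line) {
        int undesiredStart = line.indexOf("</");
        // No closing tag at all: search the whole string.
        int regionEnd = undesiredStart == -1 ? line.length() : undesiredStart;

        Matcher matcher = TAG.matcher(line);
        matcher.region(0, regionEnd);

        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [<abc>]
        System.out.println(openingTagsBeforeFirstClose("<abc>something</cde><efg>"));
        // prints []
        System.out.println(openingTagsBeforeFirstClose("one two three </abc> five six <abc>"));
    }
}
```

Because the region ends at the first `</`, a closing tag can never be matched by `<[^>]*>`, so the pattern does not need to exclude `/` itself.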
