Question

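I'm trying to capture the text inside XML tags such as ..., as well as the contents of strings like "[[A]]", which will appear inside those XML tags. So far, my patterns are: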

    Pattern titleText = Pattern.compile("<title>([A-Z])</title>");
    Pattern extractLink = Pattern.compile("(\[\[([A-Z])\]\])");

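I get an error on the second pattern because of the backslashes. However, I'm not sure how to tell the regex that I want to escape the [ and ] characters so that it captures the text between them.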
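For reference, here is a minimal sketch of what the escaped pattern could look like. The regex engine needs `\[` for a literal `[`, and inside a Java string literal that backslash must itself be doubled, so it is written `"\\["`. `Pattern.quote` is shown as an alternative that treats its whole argument literally. The class name `EscapedBrackets` and the method `findAll` are hypothetical, and `[A-Z]+` widens the question's `[A-Z]` (an assumption) so multi-letter names also match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedBrackets {
    // "\\[" in Java source is the two characters \[ , which the regex
    // engine reads as a literal '['. The same applies to "\\]".
    static final Pattern LINK = Pattern.compile("\\[\\[([A-Z]+)\\]\\]");

    // Equivalent pattern: Pattern.quote wraps the text in \Q...\E,
    // so every character inside it is matched literally.
    static final Pattern LINK_QUOTED = Pattern.compile(
            Pattern.quote("[[") + "([A-Z]+)" + Pattern.quote("]]"));

    // Collects group 1 (the text between the brackets) of every match.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "<title>random text [[A]] more random text [[B]] ...</title>";
        System.out.println(findAll(LINK, input));        // [A, B]
        System.out.println(findAll(LINK_QUOTED, input)); // [A, B]
    }
}
```

Both patterns compile to the same matching behaviour; `Pattern.quote` mainly helps when the delimiters come from a variable rather than being hard-coded.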

An example of the input I want to capture from is:

    <title>random text [[A]] more random text [[B]] ...</title>

[[A]] and [[B]] may occur multiple times, and I'm trying to find all of them.

Any help or suggestions would be greatly appreciated.

Answer 1

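In Java, a single regex match cannot capture an arbitrary number of groups unless each group is written out in the pattern. However, here is an alternative solution that splits the String on the bracketed items you want to match: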

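    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    Pattern titleText = Pattern.compile("<title>(.*?)</title>");
    String input = "<title>random text [[A]] more random text [[B]] ...</title>";
    String text = "";

    // Pull out whatever sits between <title> and </title>
    Matcher m = titleText.matcher(input);
    if (m.find()) {
        text = m.group(1);
    }

    // Every piece after the first one starts right after a "[["
    String[] parts = text.split("\\[\\[");

    for (int i = 1; i < parts.length; ++i) {
        int index = parts[i].indexOf("]]");
        if (index < 0) {
            continue; // "[[" without a closing "]]": skip instead of throwing
        }
        String match = parts[i].substring(0, index);
        System.out.println("Found a match: " + match);
    }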

Output:

    Found a match: A
    Found a match: B
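To illustrate the point about groups above, here is a small sketch (class and method names are hypothetical). When a capture group sits inside a repeated construct such as `(...)+`, Java keeps only the text from the last repetition, so one whole-string match cannot hand back both A and B. Calling `find()` in a loop with a single-link pattern does return every occurrence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // One capture group inside a repeated (+) non-capturing group.
    static final Pattern REPEATED =
            Pattern.compile("(?:.*?\\[\\[([A-Z]+)\\]\\])+.*");
    // A pattern for a single link, to be applied repeatedly with find().
    static final Pattern SINGLE = Pattern.compile("\\[\\[([A-Z]+)\\]\\]");

    // Returns what group 1 holds after one whole-string match
    // (only the last repetition survives), or null if nothing matches.
    static String lastCapture(String input) {
        Matcher m = REPEATED.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Collects group 1 from every match that find() reports.
    static List<String> allCaptures(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = SINGLE.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "random text [[A]] more random text [[B]] ...";
        System.out.println(lastCapture(text)); // B
        System.out.println(allCaptures(text)); // [A, B]
    }
}
```

The `find()` loop is the usual workaround: the pattern describes one item, and the matcher is asked for the next item until there are none left.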

Answer 2

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TestTag {

        public static void main(String[] args) {
            String INPUT = "<title>random text [[ABBA]] more random text [[B]] ...</title>";
            // "[[", then as few characters as possible, then "]]".
            // The reluctant .*? keeps "[[A]][[B]]" from being merged into
            // one match and also allows spaces inside the brackets.
            String REGEX = "\\[\\[(.*?)]]";

            Pattern p = Pattern.compile(REGEX);
            Matcher m = p.matcher(INPUT);

            while (m.find()) {
                // group(1) is the text between the brackets
                System.out.println(" data: " + m.group(1));
            }
        }
    }
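Neither answer restricts the search to the `<title>` element itself. If links elsewhere in the document should be ignored, one option is to combine both ideas: find each title first, then run the link `find()` loop on its contents only. This is a sketch under the assumption that titles are not nested and contain no `</title>` inside them; `TitleLinks` and `linksInTitles` are hypothetical names. For anything beyond simple, well-known input, an XML parser is the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleLinks {
    // Reluctant (.*?) stops at the first </title>; DOTALL lets a title span lines.
    static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>", Pattern.DOTALL);
    // Reluctant (.*?) stops at the first "]]", so "[[B]][[C D]]" gives two matches.
    static final Pattern LINK = Pattern.compile("\\[\\[(.*?)\\]\\]");

    // Returns the text of every [[...]] link found inside any <title> element.
    static List<String> linksInTitles(String xml) {
        List<String> links = new ArrayList<>();
        Matcher title = TITLE.matcher(xml);
        while (title.find()) {
            // Search only the contents of this title element.
            Matcher link = LINK.matcher(title.group(1));
            while (link.find()) {
                links.add(link.group(1));
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String xml = "<title>random text [[ABBA]] more [[B]][[C D]]</title>"
                + "<body>[[ignored]]</body>";
        System.out.println(linksInTitles(xml)); // [ABBA, B, C D]
    }
}
```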

Java Regex, capturing items inside "[...]"
