Matching multiple groups in a regex

Time: 2016-11-09 07:29:02

Tags: java regex

I have an input string:

invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend

I want to get only the subdata parts of it. I tried:

Pattern p = Pattern.compile("(?<=sufixpart).*?(subdata.)+.*?(?=end)", Pattern.DOTALL);
Matcher m = p.matcher(inputString);
while (m.find()) {
    System.out.println(m.group(1));
}

But I only get the first match. How can I get all the subdata parts, e.g. [subdata1, subdata2, subdata3]?
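To see why only one result comes back, here is a minimal sketch that runs the pattern from the question on the sample input (only the quotes are changed to Java string quotes). The class and method names are hypothetical. The lookbehind (?<=sufixpart) fixes where the match starts and the lazy .*? before (?=end) stretches it all the way to end, so a single match covers the whole block and find() succeeds only once. Inside that match, (subdata.)+ can only repeat over back-to-back subdata occurrences, so Group 1 holds just subdata1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AskerPatternDemo {
    // The pattern from the question, with Java double-quoted string syntax.
    static final Pattern ASKER = Pattern.compile(
            "(?<=sufixpart).*?(subdata.)+.*?(?=end)", Pattern.DOTALL);

    // Collects Group 1 of every match that find() reports.
    static List<String> groupOnes(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ASKER.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend";
        // One match spans from just after "sufixpart" up to "end",
        // so the loop runs once and Group 1 is the first subdata line.
        System.out.println(groupOnes(s)); // prints [subdata1]
    }
}
```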

1 answer:

Answer (score: 1)

I'd take a simpler approach: first grab each block with a regex like start(.*?)end, and then extract all subdata\S*-like matches from Group 1 with a second regex.

See the Java demo:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String rx = "(?sm)^sufixpart$(.*?)^end$";
        String s = "invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend\ninvalidsufix\nsubadatax\nsufixpart\nsubdata001\nsomerandomn\nsubdata002\nsubdata00n\nend";
        Pattern pattern_outer = Pattern.compile(rx);                   // one match per sufixpart ... end block
        Pattern pattern_token = Pattern.compile("(?m)^subdata\\S*$");  // a whole line starting with subdata
        Matcher matcher = pattern_outer.matcher(s);
        List<List<String>> res = new ArrayList<>();
        while (matcher.find()) {
            List<String> lst = new ArrayList<>();
            if (!matcher.group(1).isEmpty()) {                        // If Group 1 is not empty
                Matcher m = pattern_token.matcher(matcher.group(1));  //    run the token regex on Group 1
                while (m.find()) {                                    //    for every token found
                    lst.add(m.group(0));                              //       add it to the list
                }
            }
            res.add(lst);                                             // add this block's list to the result
        }
        System.out.println(res); // => [[subdata1, subdata2, subdatan], [subdata001, subdata002, subdata00n]]
    }
}
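If a single flat list is enough, as in the question's expected output, the same two-step idea can collect every token into one list instead of one list per block. A minimal sketch, assuming tokens from all blocks may be merged; FlatSubdata and extract are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlatSubdata {
    // Step 1: each block between a "sufixpart" line and the next "end" line (Group 1).
    static final Pattern BLOCK = Pattern.compile("(?sm)^sufixpart$(.*?)^end$");
    // Step 2: whole lines starting with "subdata" inside a block.
    static final Pattern TOKEN = Pattern.compile("(?m)^subdata\\S*$");

    static List<String> extract(String s) {
        List<String> out = new ArrayList<>();
        Matcher block = BLOCK.matcher(s);
        while (block.find()) {
            // An empty Group 1 simply yields no tokens, so no extra check is needed.
            Matcher token = TOKEN.matcher(block.group(1));
            while (token.find()) {
                out.add(token.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend";
        System.out.println(extract(s)); // prints [subdata1, subdata2, subdatan]
    }
}
```

Lines such as subadatax that sit outside a sufixpart ... end block are never seen by the second regex.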

Another approach is to use a \G-based regex:

(?sm)(?:\G(?!\A)|^sufixpart$)(?:(?!^(?:sufixpart|end)$).)*?(subdata\S*)(?=.*?^end$)

See the regex demo.

Details

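  • (?sm) - enables the DOTALL and MULTILINE modes
  • (?:\G(?!\A)|^sufixpart$) - matches either the end of the previous successful match (\G(?!\A), where (?!\A) keeps \G from matching at the very start of the input) or a whole line that is exactly sufixpart (^sufixpart$)
  • (?:(?!^(?:sufixpart|end)$).)*? - matches any single char, 0 or more times but as few as possible, that is not the start of a whole line equal to sufixpart or end
  • (subdata\S*) - Group 1: matches subdata followed by 0+ non-whitespace chars
  • (?=.*?^end$) - requires an end line somewhere after any 0+ chars.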
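The \G(?!\A) alternation is the least obvious piece, so here is a sketch that isolates it on a hypothetical toy input: # plays the role of the sufixpart line and each digit plays a subdata token, while the tempered dot and the end lookahead are left out. It also runs the same pattern without (?!\A) to show why the guard is there. GAnchorDemo, GUARDED, UNGUARDED and digits are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GAnchorDemo {
    // Start at "#", or continue exactly where the previous match ended;
    // (?!\A) keeps \G from matching at index 0 on the very first find().
    static final Pattern GUARDED = Pattern.compile("(?:\\G(?!\\A)|#)(\\d)");
    // Same pattern without the guard: \G also matches at the start of the input.
    static final Pattern UNGUARDED = Pattern.compile("(?:\\G|#)(\\d)");

    // Collects Group 1 (a single digit) from every match.
    static List<String> digits(Pattern p, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Each chain starts at "#" and breaks at "x", the first non-digit after it.
        System.out.println(digits(GUARDED, "12#34x56#7")); // prints [3, 4, 7]
        // With no "#" in the input, the guarded pattern finds nothing...
        System.out.println(digits(GUARDED, "123"));        // prints []
        // ...while the unguarded one chains from index 0.
        System.out.println(digits(UNGUARDED, "123"));      // prints [1, 2, 3]
    }
}
```

In the full regex the tempered dot plays the role of the "x" here: it cannot step onto a sufixpart or end line, so each \G chain stops at the end of its block.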

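Java demo (here the first group is capturing, so the code can tell whether a match starts a new block; the subdata token is therefore in Group 2):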

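import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String rx = "(?sm)(\\G(?!\\A)|^sufixpart$)(?:(?!^(?:sufixpart|end)$).)*?(subdata\\S*)(?=.*?^end$)";
        String s = "invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend\ninvalidsufix\nsubadatax\nsufixpart\nsubdata001\nsomerandomn\nsubdata002\nsubdata00n\nend";
        Pattern pattern = Pattern.compile(rx);
        Matcher matcher = pattern.matcher(s);
        List<List<String>> res = new ArrayList<>();
        List<String> lst = null;
        while (matcher.find()) {
            if (!matcher.group(1).isEmpty()) {   // Group 1 is "sufixpart": a new block starts
                if (lst != null) res.add(lst);   //    store the previous block's list
                lst = new ArrayList<>();
            }                                    // otherwise \G continued the current block
            lst.add(matcher.group(2));           // Group 2 is the subdata token
        }
        if (lst != null) res.add(lst);           // store the last block's list
        System.out.println(res); // => [[subdata1, subdata2, subdatan], [subdata001, subdata002, subdata00n]]
    }
}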