Question

I am trying to capture a group of lines that follow a specific term, out of a large number of lines (up to 100 to 130).

Here is my code.

String inp = "Welcome!\n"
                +" Welcome to the Apache ActiveMQ Console of localhost (ID:InternetXXX022-45298-5447895412354475-2:9) \n"
                +"  You can find more information about Apache ActiveMQ on the Apache ActiveMQ Site \n"
                +" Broker\n"
                +" Name localhost\n"
                +" Version  5.13.3\n"
                +" ID   ID:InternetXXX022-45298-5447895412354475-2:9\n"
                +" Uptime   14 days 14 hours\n"
                +" Store percent used   19\n"
                +" Memory percent used  0\n"
                +" Temp percent used    0\n"
                + "Queue Views\n"
                + "Graph\n"
                + "Topic Views\n"
                + "  \n"
                + "Subscribers Views\n";
        Pattern rgx = Pattern.compile("(?<=Broker)\n((?:.*\n){1,7})", Pattern.DOTALL);
        Matcher mtch = rgx.matcher(inp);
        if (mtch.find()) {
            String result = mtch.group();
            System.out.println(result);
        }

From all the lines in inp above, I want to capture the lines below.

Name    localhost\n
Version 5.13.3\n
ID  ID:InternetXXX022-45298-5447895412354475-2:9\n
Uptime  14 days 14 hours\n
Store percent used  19\n
Memory percent used 0\n
Temp percent used   0\n

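But my code gives me all the lines after "Broker". What is going wrong here?

Second, I understand that ?: denotes a non-capturing group, so why is my regex ((?:.*\n)) able to capture the lines after Broker?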

Answer 1

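You have to remove Pattern.DOTALL, because it makes . match line break characters as well. With that flag, .* grabs the whole remaining text in one go, so the limiting quantifier {1,7} has no effect.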
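To make the effect of Pattern.DOTALL concrete, here is a minimal sketch that runs the same kind of pattern with and without the flag. The class name DotallDemo, the helper firstMatch, and the shortened input (with the quantifier reduced to {1,2}) are assumptions made for illustration and are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotallDemo {
    // Hypothetical helper: returns the whole match (group 0), or null if nothing matches.
    static String firstMatch(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the console text in the question (LF line endings).
        String inp = "Header\nBroker\nName localhost\nVersion 5.13.3\nQueue Views\nGraph\n";
        String regex = "(?<=Broker)\n((?:.*\n){1,2})";

        // With DOTALL, a single ".*\n" iteration runs up to the last line break,
        // so every line after "Broker" ends up in the match.
        System.out.println("[" + firstMatch(regex, Pattern.DOTALL, inp) + "]");

        // Without DOTALL, "." stops at each line break, so {1,2} limits the
        // match to at most two lines after "Broker".
        System.out.println("[" + firstMatch(regex, 0, inp) + "]");
    }
}
```

In this sketch the first call returns all four lines after Broker, while the second returns only the Name and Version lines.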

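Besides, your actual data seems to contain CRLF line endings, so it is more convenient to match line breaks with \R rather than \n. Otherwise, you can use the Pattern.UNIX_LINES modifier (or its embedded flag equivalent, (?d)) inside the pattern and keep the pattern as is: only \n (LF) is then treated as a line break, so . will match the carriage return (CR).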
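As a rough illustration of the line-ending point, the sketch below runs both variants against LF and CRLF versions of a shortened input. The class name LineBreakDemo, the helper extract, the sample text, and the {1,2} quantifier are assumptions for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineBreakDemo {
    // Hypothetical helper: returns the trimmed whole match, or null if nothing matches.
    static String extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group().trim() : null;
    }

    public static void main(String[] args) {
        String lf = "Broker\nName localhost\nVersion 5.13.3\nQueue Views\n";
        String crlf = lf.replace("\n", "\r\n");

        // \R matches any line break sequence, including \r\n as a unit (Java 8+).
        String withR = "(?<=Broker)\\R((?:.*\\R){1,2})";
        // (?d) enables UNIX_LINES: only \n counts as a line terminator for ".".
        String unixLines = "(?d)(?<=Broker)\n((?:.*\n){1,2})";

        System.out.println(extract(withR, lf));      // two lines, LF-separated
        System.out.println(extract(withR, crlf));    // two lines, CRLF-separated
        System.out.println(extract(unixLines, lf));  // two lines, LF-separated

        // Caveat in this sketch: when "Broker" is immediately followed by \r\n,
        // the literal \n right after the lookbehind cannot match the \r,
        // so the (?d) variant finds nothing here.
        System.out.println(extract(unixLines, crlf)); // null
    }
}
```

Under these assumptions, the \R variant is the one that handles both kinds of input unchanged. The (?d) variant would need its leading \n adjusted (for example to \r?\n) before it could match text where Broker is directly followed by CRLF.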

Also, I suggest trimming the result.

Use

Pattern rgx = Pattern.compile("(?<=Broker)\\R((?:.*\\R){1,7})");
// Or, 
// Pattern rgx = Pattern.compile("(?d)(?<=Broker)\n((?:.*\n){1,7})");
Matcher mtch = rgx.matcher(inp);
if (mtch.find()) {
    String result = mtch.group();
    System.out.println(result.trim());
}

See the Java demo online.
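On the second part of the question: the inner (?:...) is indeed non-capturing, but it sits inside an outer pair of plain parentheses, which is capturing group 1, and mtch.group() with no argument returns the entire match (group 0) regardless of any groups. The following sketch shows the difference. The class name GroupDemo, the helper groups, and the shortened input are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Hypothetical helper: returns { group 0, group 1, group count }, or null without a match.
    static String[] groups(String input) {
        Matcher m = Pattern.compile("(?<=Broker)\\R((?:.*\\R){1,2})").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1), String.valueOf(m.groupCount()) };
    }

    public static void main(String[] args) {
        String inp = "Broker\nName localhost\nVersion 5.13.3\nQueue Views\n";
        String[] g = groups(inp);

        // Group 0 is the whole match, including the line break right after "Broker".
        System.out.println("[" + g[0] + "]");
        // Group 1 is the outer parenthesized part only: the captured lines.
        System.out.println("[" + g[1] + "]");
        // Only one capturing group exists; (?:...) does not get a number.
        System.out.println(g[2]);
    }
}
```

Using group(1) instead of group() drops the leading line break from the result, though a trailing one remains, so trimming is still useful.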
