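How to match multiple address lines using a regular expression?

Asked: 2016-03-08 13:14:21

Tags: java regex

Given the following sample text, can I use a regular expression to match each line of each address, while also adding a marker so that I know when one address ends and the next begins? Currently, I know how to match each address as a whole. I could then run a second regular expression to pick out the individual lines, but is it possible to do both steps in one pass?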

Address:
Address 1 line 1,
Address 1 line 2,
Address 1 line 3

Address:
Address 2 line 1,
Address 2 line 2,
Address 2 line 3,
Address 2 line 4

Address:
Address 3 line 1,
Address 3 line 2
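
For reference, the two-step approach described above might look like the following minimal sketch. The class name `TwoStepAddresses`, the method `parse`, and both patterns are assumptions made for illustration and are not taken from the question; the sketch also assumes that every block starts with an `Address:` header on its own line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoStepAddresses {
    // Step 1: capture everything between one "Address:" header and the
    // next header (or the end of the input).
    private static final Pattern BLOCK =
            Pattern.compile("Address:\\R(.+?)(?=\\R+Address:|\\s*\\z)", Pattern.DOTALL);

    // Step 2: capture a single line of a block, without its trailing comma.
    private static final Pattern LINE = Pattern.compile("(?m)^(.+?),?$");

    static List<List<String>> parse(String text) {
        List<List<String>> addresses = new ArrayList<>();
        Matcher block = BLOCK.matcher(text);
        while (block.find()) {
            // Each new block found marks the start of a new address.
            List<String> lines = new ArrayList<>();
            Matcher line = LINE.matcher(block.group(1));
            while (line.find()) {
                lines.add(line.group(1));
            }
            addresses.add(lines);
        }
        return addresses;
    }

    public static void main(String[] args) {
        String text = "Address:\nAddress 1 line 1,\nAddress 1 line 2,\nAddress 1 line 3\n\n"
                + "Address:\nAddress 2 line 1,\nAddress 2 line 2\n";
        System.out.println(parse(text));
    }
}
```

The nested list keeps the address boundaries explicit: each inner list is one address, and each element is one line of it.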

2 answers:

Answer 0 (score: 1)

Here is a Pattern with the DOTALL flag that uses the "Address:" string as a delimiter and searches across multiple lines:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// test input
String addresses = "Address:" + System.getProperty("line.separator")
        + "Address 1 line 1," + System.getProperty("line.separator")
        + "Address 1 line 2," + System.getProperty("line.separator")
        + "Address 1 line 3"
        + "Address:" + System.getProperty("line.separator")
        + "Address 2 line 1," + System.getProperty("line.separator")
        + "Address 2 line 2," + System.getProperty("line.separator")
        + "Address 2 line 3";
//                           | look behind for "Address:"
//                           |            | any 1+ character, 
//                           |            | reluctantly quantified
//                           |            |  | lookahead for "Address:"
//                           |            |  | or end of input
//                           |            |  |            | dot can mean
//                           |            |  |            | line separator
Pattern p = Pattern.compile("(?<=Address:).+?(?=Address:|$)", Pattern.DOTALL);
Matcher m = p.matcher(addresses);
// iterating matches within given string, and printing
while (m.find()) {
    System.out.printf("Found: %s%n%n", m.group());
}

Output

Found: 
Address 1 line 1,
Address 1 line 2,
Address 1 line 3

Found: 
Address 2 line 1,
Address 2 line 2,
Address 2 line 3

Note

To exclude the line separator that follows the "Address:" token from the match, you can use this refined pattern:

Pattern p = Pattern.compile("(?<=Address:"
    + System.getProperty("line.separator")+").+?(?=Address:"
    + System.getProperty("line.separator")+"|$)", 
    Pattern.DOTALL
);
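
To see the effect of the refinement, here is a small self-contained sketch. It is a variation on the pattern above rather than the answer's exact code: it spells the line break as `\r?\n` instead of reading `line.separator`, and its lookahead also skips blank lines before the next header. The class name `RefinedAddressPattern` and the method `blocks` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RefinedAddressPattern {
    // The look-behind includes the line break after "Address:" (LF or CRLF),
    // so each match starts at the first address line. The lookahead stops
    // before any whitespace that precedes the next header or the end of input.
    private static final Pattern BLOCK = Pattern.compile(
            "(?<=Address:\\r?\\n).+?(?=\\s*Address:\\r?\\n|\\s*\\z)", Pattern.DOTALL);

    static List<String> blocks(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Address:\nAddress 1 line 1,\nAddress 1 line 2\n\n"
                + "Address:\nAddress 2 line 1\n";
        for (String block : blocks(text)) {
            System.out.printf("Found:%n%s%n%n", block);
        }
    }
}
```

Each returned string is one address block with no leading or trailing line break, which makes a later split into lines simpler.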

Answer 1 (score: 0)

If a regular expression is what you want...

If the number of lines in an address is bounded (4 in your example), you can capture them with:

Address:\s*?(?:\n(.*),)?(?:\n(.*),)?(?:\n(.*),)?(?:\n(.*),)?(?:\n(.*))

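Here the text Address: marks the start of a block, and up to five lines are captured: the first four, each ending in a comma, are optional, and the last one is required.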

(You need the global flag. Java has no such flag; calling Matcher.find() in a loop has the same effect.)

regex101 example
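
In Java, the pattern above could be applied roughly as follows. This is a sketch under a few assumptions: the input uses `\n` line endings, and the class name `BoundedAddressLines` and the method `parse` are made up for illustration. Optional groups that did not take part in a match come back as `null` and are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedAddressLines {
    // The pattern from this answer, escaped for a Java string literal.
    private static final Pattern ADDRESS = Pattern.compile(
            "Address:\\s*?(?:\\n(.*),)?(?:\\n(.*),)?(?:\\n(.*),)?(?:\\n(.*),)?(?:\\n(.*))");

    static List<List<String>> parse(String text) {
        List<List<String>> addresses = new ArrayList<>();
        Matcher m = ADDRESS.matcher(text);
        // Repeated find() calls stand in for the global flag.
        while (m.find()) {
            List<String> lines = new ArrayList<>();
            for (int g = 1; g <= m.groupCount(); g++) {
                String line = m.group(g);
                if (line != null) { // unused optional groups are null
                    lines.add(line);
                }
            }
            addresses.add(lines);
        }
        return addresses;
    }

    public static void main(String[] args) {
        String text = "Address:\nAddress 1 line 1,\nAddress 1 line 2,\nAddress 1 line 3\n\n"
                + "Address:\nAddress 2 line 1,\nAddress 2 line 2,\nAddress 2 line 3,\nAddress 2 line 4\n\n"
                + "Address:\nAddress 3 line 1,\nAddress 3 line 2";
        for (List<String> address : parse(text)) {
            System.out.println(address);
        }
    }
}
```

Here one match corresponds to one address, so the match boundary is the address boundary and the capture groups are its lines.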