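Given the sample text below, can I use a single regular expression to match every line of every address, with some marker that tells me where one address ends and the next begins? At the moment I know how to match each address as a whole, and I could then run a second regex to pick out the individual lines, but can both steps be done in one pass?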
Address:
Address 1 line 1,
Address 1 line 2,
Address 1 line 3
Address:
Address 2 line 1,
Address 2 line 2,
Address 2 line 3,
Address 2 line 4
Address:
Address 3 line 1,
Address 3 line 2
Answer 0 (score: 1)
Here is a Pattern with the DOTALL flag that uses the "Address:" string as a delimiter and matches across multiple lines:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// test input
String addresses = "Address:" + System.getProperty("line.separator")
+ "Address 1 line 1," + System.getProperty("line.separator")
+ "Address 1 line 2," + System.getProperty("line.separator")
+ "Address 1 line 3" + System.getProperty("line.separator")
+ "Address:" + System.getProperty("line.separator")
+ "Address 2 line 1," + System.getProperty("line.separator")
+ "Address 2 line 2," + System.getProperty("line.separator")
+ "Address 2 line 3";
// (?<=Address:)    lookbehind for "Address:"
// .+?              one or more characters, reluctantly quantified
// (?=Address:|$)   lookahead for "Address:" or end of input
// Pattern.DOTALL   lets the dot also match line separators
Pattern p = Pattern.compile("(?<=Address:).+?(?=Address:|$)", Pattern.DOTALL);
Matcher m = p.matcher(addresses);
// iterate over the matches in the given string and print each one
while (m.find()) {
    System.out.printf("Found: %s%n%n", m.group());
}
Output
Found:
Address 1 line 1,
Address 1 line 2,
Address 1 line 3
Found:
Address 2 line 1,
Address 2 line 2,
Address 2 line 3
Note
To exclude the line separator that follows the "Address:" token from the match, you can use this refined pattern:
Pattern p = Pattern.compile("(?<=Address:"
+ System.getProperty("line.separator")+").+?(?=Address:"
+ System.getProperty("line.separator")+"|$)",
Pattern.DOTALL
);
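For the original question (finding each address and its individual lines in one go), a minimal sketch along the same lines might look like the following. It is not part of the answer above: the class name AddressBlocks, the parse helper, and the choice of a capturing group instead of a lookbehind are assumptions made for illustration. It also assumes line breaks are "\n" or "\r\n" and that no address line itself starts with "Address:".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressBlocks {
    // "Address:" plus one line break, then a reluctant capture that stops
    // before the next "Address:" line or at the end of the input.
    // DOTALL lets '.' cross line breaks inside one block.
    private static final Pattern BLOCK = Pattern.compile(
            "Address:\\r?\\n(.+?)(?=\\r?\\nAddress:|\\s*\\z)", Pattern.DOTALL);

    // Returns one list of lines per address; the outer list marks the
    // boundaries between addresses.
    public static List<List<String>> parse(String text) {
        List<List<String>> addresses = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            // Split the captured block into its individual lines.
            addresses.add(Arrays.asList(m.group(1).split("\\r?\\n")));
        }
        return addresses;
    }

    public static void main(String[] args) {
        String text = "Address:\nAddress 1 line 1,\nAddress 1 line 2,\nAddress 1 line 3\n"
                + "Address:\nAddress 2 line 1,\nAddress 2 line 2,\nAddress 2 line 3,\nAddress 2 line 4\n"
                + "Address:\nAddress 3 line 1,\nAddress 3 line 2";
        List<List<String>> addresses = parse(text);
        for (int i = 0; i < addresses.size(); i++) {
            System.out.println("--- address " + (i + 1) + " ---");
            for (String line : addresses.get(i)) {
                System.out.println(line);
            }
        }
    }
}
```

The regex still only finds whole blocks; the per-line split happens inside the same loop, so the caller sees a single pass and the nesting of the lists tells you where one address stops and the next starts.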
Answer 1 (score: 0)
If a regex is what you want...
If the number of lines in an address is bounded (four in your example), you can capture them with:
Address:\s*?(?:\n(.*),)?(?:\n(.*),)?(?:\n(.*),)?(?:\n(.*),)?(?:\n(.*))
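Here the text "Address:" marks the start of a block, and each address line is captured in its own group. As written, the pattern has four optional groups followed by one mandatory group for the last line.
(You need the global flag.)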
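In Java there is no global flag as such; calling Matcher.find() in a loop plays that role. The sketch below applies the pattern above unchanged, only escaped as a Java string literal. The class name BoundedAddressLines and the parse helper are hypothetical, and the input is assumed to use plain "\n" line breaks, since the pattern matches "\n" literally and would not line up with "\r\n".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedAddressLines {
    // The pattern from the answer, escaped for a Java string literal.
    private static final Pattern ADDRESS = Pattern.compile(
            "Address:\\s*?(?:\\n(.*),)?(?:\\n(.*),)?(?:\\n(.*),)?(?:\\n(.*),)?(?:\\n(.*))");

    public static List<List<String>> parse(String text) {
        List<List<String>> addresses = new ArrayList<>();
        Matcher m = ADDRESS.matcher(text);
        // Each successful find() is one address block.
        while (m.find()) {
            List<String> lines = new ArrayList<>();
            for (int g = 1; g <= m.groupCount(); g++) {
                // Optional groups that did not take part in the match are null.
                if (m.group(g) != null) {
                    lines.add(m.group(g));
                }
            }
            addresses.add(lines);
        }
        return addresses;
    }

    public static void main(String[] args) {
        String text = "Address:\nAddress 1 line 1,\nAddress 1 line 2,\nAddress 1 line 3\n"
                + "Address:\nAddress 2 line 1,\nAddress 2 line 2,\nAddress 2 line 3,\nAddress 2 line 4\n"
                + "Address:\nAddress 3 line 1,\nAddress 3 line 2";
        for (List<String> address : parse(text)) {
            System.out.println(address);
        }
    }
}
```

Because the trailing commas sit outside the capturing groups, the captured lines come back without them.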