How to capture a variable-length list using a regular expression

Time: 2016-06-06 16:10:29

Tags: java regex

I have lists embedded in unrelated text, like this:

Unrelated TextUnrelated TextUnrelated Text

Study Durations(HH:MM) Time 
Total 04:00 
Upright 02:08 
Supine 01:49 
Other  03:10 
More Other 12:34

Unrelated TextUnrelated TextUnrelated TextUnrelated TextUnrelated TextUnrelated TextUnrelated TextUnrelated TextUnrelated TextUnrelated Text

Study Durations(HH:MM) Time 
Total 24:00 
Upright 12:18 
Supine 11:42 
PostPr  n/a 

I want to capture these groups to get this:

Example 1 output

Study Durations(HH:MM) Time 
Total 04:00 
Upright 02:08 
Supine 01:49 
Other  03:10 
More Other 12:34

Example 2 output

Study Durations(HH:MM) Time 
Total 24:00 
Upright 12:18 
Supine 11:42 
PostPr  n/a

I tried the following regex:

Pattern Total_pattern = Pattern.compile("Study Durations\\(HH:MM\\) Time\\s*(?:\\n[A-Za-z]+\\s+(?:\\d+(?::\\d+)?|n/a))",Pattern.DOTALL);

But I only get:

Study Durations(HH:MM) Time 
Total 04:00 


Study Durations(HH:MM) Time 
Total 24:00 

1 Answer:

Answer 0: (score: 0)

You can use this regex with a lookahead:

Study Durations\(HH:MM\) Time.*?(?=\n\n|\z)

Be sure to use the DOTALL modifier so that . also matches line breaks and the match can span several lines.
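To see what DOTALL changes here, a minimal runnable sketch follows. The class name DotallDemo, the helper extractBlocks and the SAMPLE string are made up for illustration, and the sketch assumes the input uses plain \n line endings with an empty line (exactly \n\n) after each list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotallDemo {
    // Lazy match from the header up to (not including) the first blank line or end of input.
    static final String REGEX = "Study Durations\\(HH:MM\\) Time.*?(?=\\n\\n|\\z)";

    // Hypothetical sample input modelled on the text in the question.
    public static final String SAMPLE =
            "Unrelated Text\n\n"
            + "Study Durations(HH:MM) Time \n"
            + "Total 04:00 \n"
            + "Upright 02:08 \n"
            + "Supine 01:49 \n"
            + "Other  03:10 \n"
            + "More Other 12:34\n\n"
            + "Unrelated Text\n\n"
            + "Study Durations(HH:MM) Time \n"
            + "Total 24:00 \n"
            + "Upright 12:18 \n"
            + "Supine 11:42 \n"
            + "PostPr  n/a ";

    // Collects every match of REGEX compiled with the given flags (0 means no flags).
    public static List<String> extractBlocks(String text, int flags) {
        List<String> blocks = new ArrayList<>();
        Matcher m = Pattern.compile(REGEX, flags).matcher(text);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Without DOTALL, . cannot cross a line break, so the lookahead is never reached.
        System.out.println("no flags: " + extractBlocks(SAMPLE, 0).size() + " block(s)");

        List<String> blocks = extractBlocks(SAMPLE, Pattern.DOTALL);
        System.out.println("DOTALL:   " + blocks.size() + " block(s)");
        for (String block : blocks) {
            System.out.println(block);
            System.out.println("-----");
        }
    }
}
```

On this sample the pattern finds nothing without the flag and finds both lists with it. If the input uses \r\n line endings or the line after a list contains only spaces, the \n\n lookahead would need to be adjusted.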

Java pattern:

Pattern Total_pattern = Pattern.compile(
     "Study Durations\\(HH:MM\\) Time.*?(?=\\n\\n|\\z)", Pattern.DOTALL);

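As for why the pattern in the question stops after the Total row: the non-capturing group that describes a row has no quantifier, so it matches exactly one row. It also expects the line break immediately after the value, while the sample rows end with a trailing space, and [A-Za-z]+ cannot match a label with a space in it such as "More Other". As an alternative to the lookahead, the row group can be repeated. The following is only a sketch of that idea, not part of the answer above; the class name RowRepeatExtractor and the exact row syntax (letters-only label words, then HH:MM or n/a) are assumptions based on the sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowRepeatExtractor {
    // Header followed by one or more rows. Each row is:
    //   optional trailing blanks of the previous line, a line break,
    //   a label of one or more words, blanks, then HH:MM or n/a.
    private static final Pattern BLOCK = Pattern.compile(
            "Study Durations\\(HH:MM\\) Time"
            + "(?:[ \\t]*\\n[A-Za-z]+(?:[ \\t]+[A-Za-z]+)*[ \\t]+(?:\\d+:\\d+|n/a))+");

    // Returns every header-plus-rows block found in the text.
    public static List<String> extractBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String text = "Unrelated Text\n\n"
                + "Study Durations(HH:MM) Time \n"
                + "Total 04:00 \n"
                + "Other  03:10 \n"
                + "More Other 12:34\n\n"
                + "Unrelated Text\n\n"
                + "Study Durations(HH:MM) Time \n"
                + "Total 24:00 \n"
                + "PostPr  n/a \n";
        for (String block : extractBlocks(text)) {
            System.out.println(block);
            System.out.println("-----");
        }
    }
}
```

This version needs no DOTALL flag and does not depend on a blank line after the list, since the match simply ends where the lines stop looking like rows. The trade-off is that any row that does not fit the assumed syntax ends the block early.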