I have lists embedded in unrelated text, like this:
Unrelated TextUnrelated TextUnrelated Text
Study Durations(HH:MM) Time
Total 04:00
Upright 02:08
Supine 01:49
Other 03:10
More Other 12:34
Unrelated TextUnrelated TextUnrelated TextUnrelated TextUnrelated TextUnrelated TextUnrelated TextUnrelated TextUnrelated TextUnrelated Text
Study Durations(HH:MM) Time
Total 24:00
Upright 12:18
Supine 11:42
PostPr n/a
I want to capture each of these blocks, so that I get this:
Example 1 output
Study Durations(HH:MM) Time
Total 04:00
Upright 02:08
Supine 01:49
Other 03:10
More Other 12:34
Example 2 output
Study Durations(HH:MM) Time
Total 24:00
Upright 12:18
Supine 11:42
PostPr n/a
I tried the following regular expression:
Pattern Total_pattern = Pattern.compile("Study Durations\\(HH:MM\\) Time\\s*(?:\\n[A-Za-z]+\\s+(?:\\d+(?::\\d+)?|n/a))",Pattern.DOTALL);
But I only get:
Study Durations(HH:MM) Time
Total 04:00
Study Durations(HH:MM) Time
Total 24:00
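Answer 0 (score: 0)
You can use this regex with a lookahead, which ends each match at the first blank line or at the end of the input: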
Study Durations\(HH:MM\) Time.*?(?=\n\n|\z)
Be sure to use the DOTALL modifier so that . also matches line breaks and the match can span several lines.
Java pattern:
Pattern Total_pattern = Pattern.compile(
"Study Durations\\(HH:MM\\) Time.*?(?=\\n\\n|\\z)", Pattern.DOTALL);
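For reference, here is a minimal, runnable sketch of how the pattern above might be used with Matcher.find(). The class name DurationBlocks, the extract helper, and the SAMPLE string are illustrative assumptions, not part of the question. SAMPLE also assumes that a blank line follows each list, which is what the lookahead relies on. The sketch includes a second, hypothetical variant as well: the pattern from the question matched only the "Total" line because its line group (?:...) was not repeated, so adding a + quantifier (and allowing spaces in the label, for rows such as "More Other") should pick up every row without depending on blank lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DurationBlocks {
    // Pattern from the answer: lazily consume text until a blank line or end of input.
    static final Pattern LOOKAHEAD = Pattern.compile(
            "Study Durations\\(HH:MM\\) Time.*?(?=\\n\\n|\\z)", Pattern.DOTALL);

    // Assumed alternative: the question's pattern with the line group repeated (+)
    // and spaces allowed inside the label, so "More Other 12:34" is matched too.
    static final Pattern QUANTIFIED = Pattern.compile(
            "Study Durations\\(HH:MM\\) Time(?:\\n[A-Za-z ]+\\s+(?:\\d+(?::\\d+)?|n/a))+");

    // Illustrative input; the blank lines around each list are an assumption.
    static final String SAMPLE =
            "Unrelated Text\n\n"
            + "Study Durations(HH:MM) Time\n"
            + "Total 04:00\n"
            + "Upright 02:08\n"
            + "Supine 01:49\n"
            + "Other 03:10\n"
            + "More Other 12:34\n\n"
            + "Unrelated Text\n\n"
            + "Study Durations(HH:MM) Time\n"
            + "Total 24:00\n"
            + "Upright 12:18\n"
            + "Supine 11:42\n"
            + "PostPr n/a";

    // Collect every match of the given pattern as one string per block.
    static List<String> extract(Pattern pattern, String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        for (String block : extract(LOOKAHEAD, SAMPLE)) {
            System.out.println(block);
            System.out.println("----");
        }
    }
}
```

If no blank line separates a list from the text that follows it, the lookahead version keeps going until the end of the input, whereas the quantified variant stops at the first line that does not look like a "label value" row.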