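I have a .txt file with a "Home" marker near the end, and I want to take all the text after the Home marker up to the end of the file. In a few cases, however, the text I want is followed by several empty lines (more than 3) and then some text I don't need. So I need a regular expression that takes all the text after the Home marker but stops if there are 3 or more empty lines. Here is the .txt file that causes the problem: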
Home
"Empty LINE"
some text that I need some text that I need some text that I need some text that I need some text that I need some text that I need some text that I need some text that I needsome text that I need
"Empty LINE"
"Empty LINE"
"Empty LINE"
"Empty LINE"
"Empty LINE"
some info that I don't need
some info that I don't need
some info that I don't need
some info that I don't need
Here is my code:
String content = new String(Files.readAllBytes(Paths.get(FILENAME)));
System.out.println(content);
String pattern = "Home\\s(.*$)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(content);
if (m.find( )) {
System.out.println("Found value: " + m.group(1) );
}else {
System.out.println("NO MATCH");
}
Answer 0 (score: 0)
To get all the text up to three consecutive newlines (\n{3}) or the end of the file, try:
Home\\s(.*?)(?=\\n{3}|$)
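Home\s - matches the literal Home followed by one whitespace character (\s).
(.*?) - matches any characters, non-greedily.
(?=\\n{3}|$) - a lookahead that checks whether three consecutive newlines (\n{3}) or the end of the file ($) comes next.
You also need the DOTALL flag so that the dot (.) matches line separators as well.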
Pattern.compile(regex, Pattern.DOTALL)
Here is a working Java demo on ideone.
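As a self-contained illustration of the pattern and flag above, the following sketch applies them to an in-memory string instead of a file. The class name HomeExtractDemo, the helper extract, and the sample text are made up for this example. The sample uses \n line endings only, so files with Windows (\r\n) line endings are not covered by it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HomeExtractDemo {
    // Pattern from the answer above; DOTALL lets '.' match line terminators.
    private static final Pattern HOME =
            Pattern.compile("Home\\s(.*?)(?=\\n{3}|$)", Pattern.DOTALL);

    // Returns the text captured after "Home", or null when nothing matches.
    static String extract(String content) {
        Matcher m = HOME.matcher(content);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: two wanted lines, three empty lines, unwanted text.
        String sample = "Home\nwanted line 1\nwanted line 2\n\n\n\nunwanted\n";
        // Prints the two wanted lines between square brackets.
        System.out.println("[" + extract(sample) + "]");
    }
}
```

Note that \n{3} counts newline characters, not empty lines: the first of the three newlines ends the last wanted line, so the match already stops when two empty lines follow it.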
Answer 1 (score: 0)
The following regular expression will do it:
"(?s)(?:^|\\R)Home\\R(.*?)(?:\\R{3}|$)"
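Explanation:
(?s) - lets the . used later in the pattern match line terminators (the DOTALL flag).
(?:^|\\R) - matches the start of the text or a line terminator. Note that the \R linebreak matcher is used so that Windows line terminators are matched correctly.
Home\\R - matches the literal Home followed by a line terminator.
(.*?) - matches and captures the wanted text, stopping as soon as the next part of the pattern identifies the end of the wanted text (reluctant quantifier).
(?:\\R{3}|$) - matches 3 line terminators or the end of the text.
Test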
Path path = Paths.get("path/to/file.txt");
String text = new String(Files.readAllBytes(path)); // assume default character encoding
Matcher m = Pattern.compile("(?s)(?:^|\\R)Home\\R(.*?)(?:\\R{3}|$)").matcher(text);
if (m.find())
System.out.printf("'%s'", m.group(1));
else
System.out.println("** NOT FOUND **");
The text file is a copy/paste of the text from the question.
Output
'"Empty LINE"
some text that I need some text that I need some text that I need some text that I need some text that I need some text that I need some text that I need some text that I needsome text that I need
"Empty LINE"
"Empty LINE"
"Empty LINE"
"Empty LINE"
"Empty LINE"'
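Both patterns above count line terminators, and three consecutive line terminators correspond to only two empty lines after a line of text. If the cut-off should really be a run of three blank lines, one alternative is to skip the regex and count blank lines directly. The sketch below is not taken from either answer. The class name HomeSectionReader, the method linesAfterHome, and the maxBlank parameter are hypothetical, and it assumes the marker sits alone on its own line and that maxBlank is at least 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HomeSectionReader {
    // Collects the lines after the first line equal to "Home", stopping at the
    // end of input or at the first run of maxBlank consecutive blank lines.
    static List<String> linesAfterHome(List<String> lines, int maxBlank) {
        int start = lines.indexOf("Home");
        if (start < 0) {
            return null; // marker line not found
        }
        List<String> out = new ArrayList<>();
        int blankRun = 0;
        for (String line : lines.subList(start + 1, lines.size())) {
            blankRun = line.trim().isEmpty() ? blankRun + 1 : 0;
            if (blankRun == maxBlank) {
                // Drop the blank lines of this run that were already collected.
                for (int i = 0; i < maxBlank - 1; i++) {
                    out.remove(out.size() - 1);
                }
                return out;
            }
            out.add(line);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input; for a real file the list could come from
        // Files.readAllLines(path), which strips the line terminators.
        List<String> sample = Arrays.asList(
                "Home", "", "wanted", "", "still wanted", "", "", "", "unwanted");
        System.out.println(linesAfterHome(sample, 3));
    }
}
```

Because the input is already split into lines, the line-ending style of the file (\n or \r\n) no longer matters, and single blank lines inside the wanted text are kept.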