I am trying to parse the following text:
### __Description of the report__
Lorem ipsum dolor sit amet, & mauris elit, blandit a turpis vel nibh,
consectetuer aliquam. Nec sem. Venenatis quam etiam donec consequat
sagittis, luctus porttitor odit sollicitudin <> vestibulum ultrices erat,
sed eleifend
* amet, sollicitudin sit egestas
* quis eros nulla. Sed donec
### __Notable filters__
* Lorem ipsum dolor sit amet, mauris elit, blandit a turpis vel
* consectetuer aliquam. Nec sem. Venenatis quam etiam donec consequat
* sagittis, luctus porttitor odit sollicitudin vestibulum ultrices
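I want to capture all of the text between ### __Description of the report__ and ### __Notable filters__. That text can be any combination of letters, digits, and special characters.
I thought that ### __Description of the report__(.*?)### __Notable filters__ would work, but it returns nothing. How can I extract the text between the two headings?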
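Answer 0 (score: 1)
You can use String's split method with the two headings as the regex, joined by the '|' alternation operator.
Because the input starts with the first heading, the first array element is an empty string. The content of the first section lands in the second element (index 1), and the content of the second section in the third element (index 2).
Take a look at this code: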
public class Test {
    private String testString = "### __Description of the report__\n" +
        "Lorem ipsum dolor sit amet, & mauris elit, blandit a turpis vel nibh, \n" +
        "consectetuer aliquam. Nec sem. Venenatis quam etiam donec consequat \n" +
        "sagittis, luctus porttitor odit sollicitudin <> vestibulum ultrices erat,\n" +
        "sed eleifend \n" +
        "* amet, sollicitudin sit egestas \n" +
        "* quis eros nulla. Sed donec\n" +
        "\n" +
        "### __Notable filters__\n" +
        "* Lorem ipsum dolor sit amet, mauris elit, blandit a turpis vel\n" +
        "* consectetuer aliquam. Nec sem. Venenatis quam etiam donec consequat \n" +
        "* sagittis, luctus porttitor odit sollicitudin vestibulum ultrices ";

    public static void main(String[] args) {
        Test t = new Test();
        // Split on either heading line. The input starts with the first heading,
        // so parts[0] is "", parts[1] is the description and parts[2] the filters.
        String[] parts = t.testString.split("### __Description of the report__\n|### __Notable filters__\n");
    }
}
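To make the array layout concrete, here is a minimal sketch of the same split applied to a shortened input. The class name SplitDemo, the helper splitSections, and the sample text are hypothetical and chosen only for illustration. Pattern.quote is used so that the headings are matched literally.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are assumptions for illustration.
class SplitDemo {
    static final String DESCRIPTION = "### __Description of the report__\n";
    static final String FILTERS = "### __Notable filters__\n";

    // Splits the text on either heading line, treating both headings as literals.
    static String[] splitSections(String text) {
        String regex = Pattern.quote(DESCRIPTION) + "|" + Pattern.quote(FILTERS);
        return text.split(regex);
    }

    public static void main(String[] args) {
        String text = DESCRIPTION + "first line\n* a bullet\n" + FILTERS + "* a filter";
        String[] parts = splitSections(text);
        // The text before the first heading is empty, so index 0 is "".
        System.out.println("parts.length = " + parts.length);
        System.out.println("description  = " + parts[1]);
        System.out.println("filters      = " + parts[2]);
    }
}
```

If the input might contain text before the first heading, index 0 will hold that text instead of an empty string, so reading the sections from indices 1 and 2 stays valid in both cases.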
Answer 1 (score: 0)
Use Pattern.DOTALL:
Pattern p = Pattern.compile("### __Description of the report__(.*?)### __Notable filters__", Pattern.DOTALL);
Pattern.MULTILINE does not help here: it only makes ^ and $ match at the start and end of every line, and has no effect on what . matches. DOTALL makes . match every character, including \n, which does not happen unless Pattern.DOTALL is specified.
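To store the result, do the following:
Matcher m = p.matcher(str); // 'str' is the string with the text
// 'YourString' is assumed to be a String variable declared earlier
while (m.find())
{
    YourString = m.group(1); // text captured between the two headings
}
Later, you can collapse the extra whitespace like this: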
YourString = YourString.replaceAll("\\s+", " ");
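Put together, the steps above can be sketched as one self-contained program. The class name BetweenHeadings, the method extract, and the shortened sample text are hypothetical. The trailing trim() call is an addition that is not part of the answer; it only drops the single space left at each end after the whitespace is collapsed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the DOTALL pattern from the answer.
class BetweenHeadings {
    private static final Pattern BETWEEN = Pattern.compile(
        "### __Description of the report__(.*?)### __Notable filters__",
        Pattern.DOTALL); // DOTALL lets '.' match line terminators too

    // Returns the text between the two headings with whitespace collapsed,
    // or null when the headings are not found in that order.
    static String extract(String text) {
        Matcher m = BETWEEN.matcher(text);
        if (!m.find()) {
            return null;
        }
        return m.group(1).replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String text = "### __Description of the report__\n"
            + "Lorem ipsum, & mauris\n"
            + "* quis eros <> nulla\n"
            + "\n"
            + "### __Notable filters__\n"
            + "* Lorem ipsum";
        System.out.println(extract(text));
    }
}
```

Using if instead of while keeps only the first match. With a single pair of headings in the input the two are equivalent, whereas the while loop would keep the last match if the pair occurred more than once.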
Answer 2 (score: 0)
Trying out your regex does not seem to return anything:
... report__(.*?)### __N
...
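The . character matches any character except a line terminator, so you need to either remove the newlines from the string before parsing or change the expression to account for the newlines in the input.
@CoffeehouseCoder's answer suggests using Pattern.DOTALL, which solves this by allowing . to match newlines.
Alternatively, you can update the regex to match either any character or a newline, like so: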
... report__((.|\n)*?)###
...
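As a rough comparison of the options discussed in this answer, the sketch below runs three variants of the expression against a shortened input: the original pattern, the (.|\n) alternation, and the same pattern prefixed with the inline flag (?s), which is the embedded form of Pattern.DOTALL. The class name NewlineAlternatives, the helper firstGroup, and the sample text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison of the three regex variants on a small input.
class NewlineAlternatives {
    static final String PLAIN =
        "### __Description of the report__(.*?)### __Notable filters__";
    static final String ALTERNATION =
        "### __Description of the report__((.|\\n)*?)### __Notable filters__";
    static final String INLINE_FLAG = "(?s)" + PLAIN; // same as compiling with DOTALL

    // Returns group 1 of the first match, or null if the regex does not match.
    static String firstGroup(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "### __Description of the report__\nline one\nline two\n"
            + "### __Notable filters__\n* item";
        System.out.println("plain matched:       " + (firstGroup(PLAIN, text) != null));
        System.out.println("alternation matched: " + (firstGroup(ALTERNATION, text) != null));
        System.out.println("inline flag matched: " + (firstGroup(INLINE_FLAG, text) != null));
    }
}
```

Both working variants capture the same text here. On very long inputs the alternation form tends to be slower and can exhaust the stack in Java's regex engine, so the DOTALL form is usually the safer choice.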