Java regex: extract any characters between two tokens

Time: 2017-06-07 17:07:32

Tags: java regex

I am trying to parse the following text:

### __Description of the report__
Lorem ipsum dolor sit amet,  & mauris elit, blandit a turpis vel nibh, 
consectetuer aliquam. Nec sem. Venenatis quam etiam donec consequat 
sagittis, luctus porttitor odit sollicitudin <> vestibulum ultrices erat,
sed eleifend 
* amet, sollicitudin sit egestas 
* quis eros nulla. Sed donec

### __Notable filters__
* Lorem ipsum dolor sit amet, mauris elit, blandit a turpis vel
* consectetuer aliquam. Nec sem. Venenatis quam etiam donec consequat 
* sagittis, luctus porttitor odit sollicitudin vestibulum ultrices 

我希望捕获### __Description of the report__### __Notable filters__之间的所有文字,这些文字可以是数字字母,也可以是特殊字符的任意组合。

I thought ### __Description of the report__(.*?)### __Notable filters__ would work, but it returns no results. How can I extract the text between the two headings?

3 answers:

Answer 0 (score: 1)

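You can use String's split method, passing both headers as the regex joined with the '|' (alternation) operator.

Because the text starts with the first header, the first element of the resulting array is an empty string. The content of the first section ends up in the second element, and the content of the second section in the third element.

Check this code: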

public class Test {
    private String testString = "### __Description of the report__\n" +
"Lorem ipsum dolor sit amet,  & mauris elit, blandit a turpis vel nibh, \n" +
"consectetuer aliquam. Nec sem. Venenatis quam etiam donec consequat \n" +
"sagittis, luctus porttitor odit sollicitudin <> vestibulum ultrices erat,\n" +
"sed eleifend \n" +
"* amet, sollicitudin sit egestas \n" +
"* quis eros nulla. Sed donec\n" +
"\n" +
"### __Notable filters__\n" +
"* Lorem ipsum dolor sit amet, mauris elit, blandit a turpis vel\n" +
"* consectetuer aliquam. Nec sem. Venenatis quam etiam donec consequat \n" +
"* sagittis, luctus porttitor odit sollicitudin vestibulum ultrices ";

    public static void main (String[] args)
    {
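        Test t = new Test();
        // parts[0] is "" because the text starts with the first header;
        // parts[1] is the description section, parts[2] the filters section
        String[] parts = t.testString.split("### __Description of the report__\n|### __Notable filters__\n");
        System.out.println(parts[1].trim());
        System.out.println(parts[2].trim());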
    }
}
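To make the array layout of the split approach concrete, here is a minimal, self-contained sketch. The class name SplitSections and the shortened sample text are illustrative assumptions, not part of the original answer; it relies on Java 8+ behaviour, where a positive-width match at the very start of the string still produces an empty leading element.

```java
public class SplitSections {
    // Shortened version of the sample text from the question (illustrative)
    static final String TEXT =
        "### __Description of the report__\n" +
        "Lorem ipsum dolor sit amet,\n" +
        "* quis eros nulla. Sed donec\n" +
        "\n" +
        "### __Notable filters__\n" +
        "* Lorem ipsum dolor sit amet\n";

    // Both headers joined with '|'; '#' and '_' are not regex metacharacters here
    static String[] split(String text) {
        return text.split("### __Description of the report__\n|### __Notable filters__\n");
    }

    public static void main(String[] args) {
        String[] parts = split(TEXT);
        System.out.println(parts.length);          // 3
        System.out.println(parts[0].isEmpty());    // true: empty leading element
        System.out.println(parts[1].trim());       // description section
        System.out.println(parts[2].trim());       // filters section
    }
}
```

Note that split only works cleanly when the two headers appear in this order and each appears once; anything before the first header would end up in parts[0].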

Answer 1 (score: 0)

Use Pattern.DOTALL:

Pattern p = Pattern.compile("### __Description of the report__(.*?)### __Notable filters__", Pattern.DOTALL);

Pattern.MULTILINE will not work here: it only makes ^ and $ match at the start and end of EVERY LINE, and it does not let . cross line breaks. DOTALL makes . match every character, including \n; without Pattern.DOTALL, . stops at line terminators.
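A small comparison of the flags on the question's regex may help. This is a sketch; the class name FlagDemo and the short sample text are assumptions for illustration. It also shows the inline (?s) flag, which has the same effect as passing Pattern.DOTALL.

```java
import java.util.regex.Pattern;

public class FlagDemo {
    static final String REGEX =
        "### __Description of the report__(.*?)### __Notable filters__";
    // Illustrative input: the headers are separated by several lines
    static final String TEXT =
        "### __Description of the report__\nline one\nline two\n\n### __Notable filters__\n* item\n";

    // Returns whether the regex finds a match when compiled with the given flags
    static boolean found(int flags) {
        return Pattern.compile(REGEX, flags).matcher(TEXT).find();
    }

    public static void main(String[] args) {
        System.out.println("no flags:  " + found(0));                  // false: '.' stops at '\n'
        System.out.println("MULTILINE: " + found(Pattern.MULTILINE));  // false: only ^ and $ change
        System.out.println("DOTALL:    " + found(Pattern.DOTALL));     // true
        // (?s) at the start of the pattern is the inline form of DOTALL
        System.out.println("(?s):      " + Pattern.compile("(?s)" + REGEX).matcher(TEXT).find()); // true
    }
}
```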

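To store the result, do the following:

Matcher m = p.matcher(str); // 'str' is the string containing the text
String YourString = null;
if (m.find())
{
    YourString = m.group(1); // the text captured between the two headers
}

Afterwards (if a match was found), you can collapse the extra whitespace, including line breaks, like this:

YourString = YourString.replaceAll("\\s+", " ").trim();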
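Putting these steps together, a reusable helper could look like the sketch below. The method name extractBetween and the use of Optional are assumptions, not part of the original answer. Pattern.quote is used so that headers containing regex metacharacters are treated as literal text.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractBetween {
    // Hypothetical helper: text between two literal headers, or empty if not found
    static Optional<String> extractBetween(String text, String start, String end) {
        // Pattern.quote keeps characters like '.', '*' or '(' in a header literal;
        // DOTALL lets '.' (and so the lazy group) span line breaks
        Pattern p = Pattern.compile(
            Pattern.quote(start) + "(.*?)" + Pattern.quote(end), Pattern.DOTALL);
        Matcher m = p.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        // Collapse runs of whitespace (including newlines) and trim the ends
        return Optional.of(m.group(1).replaceAll("\\s+", " ").trim());
    }

    public static void main(String[] args) {
        String text = "### __Description of the report__\nLorem ipsum dolor,\n"
            + "* amet, sollicitudin\n\n### __Notable filters__\n* item\n";
        System.out.println(extractBetween(text,
                "### __Description of the report__", "### __Notable filters__")
            .orElse("(not found)"));
        // prints "Lorem ipsum dolor, * amet, sollicitudin"
    }
}
```

Returning Optional avoids the NullPointerException that the snippet above would hit if no match is found.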

Answer 2 (score: 0)

About the expression you chose:

Trying out your regex does indeed return nothing:

... report__(.*?)### __N ...

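The . character matches any character except a line break, so you need to either remove the line breaks from the string before parsing, or change the expression to allow for the line breaks in the input.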
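The first option, flattening the line breaks before matching, might look like the following sketch (class and method names are illustrative assumptions). It uses \R, the Java 8+ linebreak matcher, so \r\n, \n and \r are all handled; the trade-off is that the line structure of the captured section is lost.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripNewlines {
    static String extract(String text) {
        // Replace every line break with a space so the default '.' can match across "lines"
        String flat = text.replaceAll("\\R", " ");
        // The question's original regex, unchanged and without flags
        Matcher m = Pattern
            .compile("### __Description of the report__(.*?)### __Notable filters__")
            .matcher(flat);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String text = "### __Description of the report__\nLorem ipsum,\n* amet\n\n"
            + "### __Notable filters__\n* item";
        System.out.println(extract(text)); // prints "Lorem ipsum, * amet"
    }
}
```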

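@CoffeehouseCoder's answer suggests using Pattern.DOTALL, which solves this by allowing . to match line breaks.

Alternatively, you can update the regex to match either any character or a newline, like so: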

... report__((.|\n)*?)### ...
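One caveat with (.|\n): since . (without DOTALL) also excludes \r, the alternation fails on Windows-style \r\n line endings. A character class such as [\s\S] matches every character instead. In addition, Java's regex engine tends to recurse on repeated alternations, so (.|\n)*? may exhaust the stack on very long inputs; the class-based form avoids the alternation. The sketch below (class name and inputs are illustrative assumptions) compares the two.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationVsClass {
    // Returns group 1 of the first match, or null if there is none
    static String extract(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    static final String ALTERNATION = "report__((.|\\n)*?)###";
    static final String CHAR_CLASS = "report__([\\s\\S]*?)###";

    public static void main(String[] args) {
        String unix = "report__\nline one\n### next";
        String windows = "report__\r\nline one\r\n### next";

        System.out.println(extract(ALTERNATION, unix) != null);    // true
        System.out.println(extract(ALTERNATION, windows) != null); // false: '.' does not match '\r'
        System.out.println(extract(CHAR_CLASS, windows) != null);  // true
    }
}
```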