Regular expression to detect tags

Time: 2015-05-14 18:57:21

Tags: java eclipse

I am trying to detect all the paragraphs in this file:

XML file

To do this I used this code:

    Pattern p = Pattern.compile("<paragraph>\\s*?(.*?)\\s*?(.*?)\\s*?(.*?)</paragraph>");
    Matcher m = p.matcher(ne);
    int occur = 1;

    while (m.find()) {
        System.out.print("Word = " + ne.substring(m.start(), m.end()) + "\n");
    }

The problem is that it only detects the first paragraph. Can anyone help, please?

2 answers:

Answer 0: (score: 2)

Here is a one-liner using commons-lang:

String[] paragraphs = StringUtils.substringsBetween(ne, "<paragraph>", "</paragraph>");
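
If pulling in commons-lang is not an option, roughly the same result can be had with plain `indexOf` calls. This is only a minimal sketch: the class `BetweenSketch` and its helper are hypothetical names made up for illustration, the sample string is invented, and the helper is not a drop-in replacement for the library method (for instance, it returns an empty list when nothing matches).

```java
import java.util.ArrayList;
import java.util.List;

class BetweenSketch {
    // Hypothetical helper: collects every substring that sits between
    // an occurrence of `open` and the next occurrence of `close`.
    static List<String> substringsBetween(String text, String open, String close) {
        List<String> found = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(open, from);
            if (start < 0) {
                break;                      // no further opening tag
            }
            start += open.length();         // skip past the opening tag
            int end = text.indexOf(close, start);
            if (end < 0) {
                break;                      // opening tag without a closing tag
            }
            found.add(text.substring(start, end));
            from = end + close.length();    // continue after the closing tag
        }
        return found;
    }

    public static void main(String[] args) {
        // Invented sample input; the second paragraph spans two lines.
        String ne = "<paragraph>one</paragraph>\n<paragraph>two\nlines</paragraph>";
        for (String p : substringsBetween(ne, "<paragraph>", "</paragraph>")) {
            System.out.println("Paragraph = " + p);
        }
    }
}
```

Because this works on plain substrings rather than a regex, line breaks inside a paragraph need no special handling.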

Answer 1: (score: 0)

Dreamer, as you said, "...for a simple java project":

//import java.util.regex.Matcher;
//import java.util.regex.Pattern;
StringBuilder text = new StringBuilder();
text.append("<html><something>");
text.append("<paragraph><Sentence>text 1 qwe</Sentence></paragraph>");
text.append("<paragraph><Sentence>text 2 qwe</Sentence></paragraph>");
text.append("<zzz>this text wont go</zzz>");
text.append("<paragraph><Sentence>text 3 qwe</Sentence></paragraph>");
text.append("</something></html>");
System.out.println(text.toString());

Pattern p = Pattern.compile("<paragraph>(.*?)</paragraph>");
Matcher m = p.matcher(text.toString());

while (m.find()) {
    System.out.print("Word = " + m.group() + "\n");
}

Output:

<html><something><paragraph><Sentence>text 1 qwe</Sentence></paragraph><paragraph><Sentence>text 2 qwe</Sentence></paragraph><zzz>this text wont go</zzz><paragraph><Sentence>text 3 qwe</Sentence></paragraph></something></html>
Word = <paragraph><Sentence>text 1 qwe</Sentence></paragraph>
Word = <paragraph><Sentence>text 2 qwe</Sentence></paragraph>
Word = <paragraph><Sentence>text 3 qwe</Sentence></paragraph>
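
One likely reason the pattern in the question misses paragraphs, assuming the real XML file has paragraphs that span several lines (the file itself is not shown here): by default `.` in a Java regex does not match line terminators, so `(.*?)` cannot cross a line break. A minimal sketch with `Pattern.DOTALL`, which lets `.` match line terminators as well. The class name `ParagraphDotAll`, the helper `paragraphs`, and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParagraphDotAll {
    // DOTALL makes '.' match line terminators too, so the lazy group
    // can span a paragraph that is spread over several lines.
    private static final Pattern PARAGRAPH =
            Pattern.compile("<paragraph>(.*?)</paragraph>", Pattern.DOTALL);

    // Hypothetical helper: returns the trimmed content of each paragraph.
    static List<String> paragraphs(String xml) {
        List<String> out = new ArrayList<>();
        Matcher m = PARAGRAPH.matcher(xml);
        while (m.find()) {
            out.add(m.group(1).trim());   // group(1) is the text between the tags
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented input: the first paragraph spans three lines.
        String ne = "<paragraph>\n  <Sentence>text 1 qwe</Sentence>\n</paragraph>\n"
                  + "<paragraph><Sentence>text 2 qwe</Sentence></paragraph>";
        for (String p : paragraphs(ne)) {
            System.out.println("Word = " + p);
        }
    }
}
```

For anything beyond simple, non-nested tags like these, a regex gets fragile quickly and an XML parser is usually the safer tool.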