I am trying to detect all of the paragraphs in this file:
XML file
To do that, I used this code:
Pattern p = Pattern.compile("<paragraph>\\s*?(.*?)\\s*?(.*?)\\s*?(.*?)</paragraph>");
Matcher m = p.matcher(ne);
while (m.find()) {
    System.out.print("Word = " + m.group() + "\n");
}
The problem is that it only detects the first paragraph. Can anyone help?
Answer 0 (score: 2)
Here is a one-liner using commons-lang:
String[] paragraphs = StringUtils.substringsBetween(ne, "<paragraph>", "</paragraph>");
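If pulling in commons-lang is not an option, roughly the same result can be had with plain `indexOf` calls. This is only a minimal sketch: the class name `BetweenTags`, the helper `substringsBetween`, and the sample string are made up for illustration, and like the one-liner above it returns the text between the tags without the tags themselves.

```java
import java.util.ArrayList;
import java.util.List;

class BetweenTags {
    // Collects every substring found between an opening and a closing marker.
    static List<String> substringsBetween(String text, String open, String close) {
        List<String> found = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(open, from);
            if (start < 0) {
                break; // no further opening marker
            }
            start += open.length();
            int end = text.indexOf(close, start);
            if (end < 0) {
                break; // opening marker without a matching close
            }
            found.add(text.substring(start, end));
            from = end + close.length();
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical input; the real file from the question is not shown.
        String ne = "<doc><paragraph>first</paragraph>\n"
                + "<paragraph>second\nline</paragraph></doc>";
        for (String paragraph : substringsBetween(ne, "<paragraph>", "</paragraph>")) {
            System.out.println("Paragraph = " + paragraph);
        }
    }
}
```

Because this works on plain string positions, line breaks inside a paragraph do not matter, but nested `<paragraph>` elements would not be handled.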
Answer 1 (score: 0)
Dreamer, as you said, this is "for a simple Java project":
//import java.util.regex.Matcher;
//import java.util.regex.Pattern;
StringBuilder text = new StringBuilder();
text.append("<html><something>");
text.append("<paragraph><Sentence>text 1 qwe</Sentence></paragraph>");
text.append("<paragraph><Sentence>text 2 qwe</Sentence></paragraph>");
text.append("<zzz>this text wont go</zzz>");
text.append("<paragraph><Sentence>text 3 qwe</Sentence></paragraph>");
text.append("</something></html>");
System.out.println(text.toString());
Pattern p = Pattern.compile("<paragraph>(.*?)</paragraph>");
Matcher m = p.matcher(text.toString());
while (m.find()) {
System.out.print("Word = " + m.group() + "\n");
}
Output:
<html><something><paragraph><Sentence>text 1 qwe</Sentence></paragraph><paragraph><Sentence>text 2 qwe</Sentence></paragraph><zzz>this text wont go</zzz><paragraph><Sentence>text 3 qwe</Sentence></paragraph></something></html>
Word = <paragraph><Sentence>text 1 qwe</Sentence></paragraph>
Word = <paragraph><Sentence>text 2 qwe</Sentence></paragraph>
Word = <paragraph><Sentence>text 3 qwe</Sentence></paragraph>
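One possible reason the pattern from the question stops after the first paragraph: by default `.` does not match line terminators, so a paragraph that spans more lines than the three `\s*?` gaps can absorb will never match. The actual file is not shown, so this is an assumption. If that is the cause, compiling the simpler pattern with `Pattern.DOTALL` should be enough. A minimal sketch, with a made-up class name `MultiLineParagraphs` and made-up sample input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiLineParagraphs {
    // DOTALL lets '.' match line terminators, so a paragraph may span lines.
    private static final Pattern PARAGRAPH =
            Pattern.compile("<paragraph>(.*?)</paragraph>", Pattern.DOTALL);

    // Returns the trimmed body of every <paragraph> element found in the text.
    static List<String> findParagraphs(String xml) {
        List<String> bodies = new ArrayList<>();
        Matcher m = PARAGRAPH.matcher(xml);
        while (m.find()) {
            bodies.add(m.group(1).trim());
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Hypothetical input standing in for the file from the question.
        String ne = "<doc>\n"
                + "<paragraph>one line</paragraph>\n"
                + "<paragraph>\n  several\n  lines\n  of\n  text\n</paragraph>\n"
                + "</doc>";
        for (String body : findParagraphs(ne)) {
            System.out.println("Paragraph = " + body);
        }
    }
}
```

For anything beyond simple, non-nested tags, a real XML parser is usually the safer route than a regex.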