A 24-year-old youth died on the spot, after his motorcycle
rammed a divider near Golf market on <LOCATION>BelAir</LOCATION> road
Thursday night. The deceased has been identified as
John(24) hailing from <LOCATION>UK</LOCATION>.
He was originally from <LOCATION>Usa</LOCATION>.
The sentences are in 2 different paragraphs. I want the output to look like:
Para 1:BelAir
UK
Para 2:Usa
I have worked out the regular expression for the tags as:
<(?<tag>\w*)>(?<text>.*)</\k<tag>>
and for the paragraphs:
(\n|^).*?(?=\n|$)
Is there a way to combine these? Or should I use split?
Answer 0: (score: 0)
Check whether the String starts with '\n':
while (/* read next line */) {
    if (!string.startsWith("\n")) {
        // apply your regex expression for tags
        // store the matches in a list
    } else {
        // add a null to the list
    }
}
So your list will look like
BelAir
UK
null
Usa
So a new Para starts after each null.
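The null-separator idea above can be sketched as runnable Java. This is only an illustration under a few assumptions: paragraphs are separated by a single newline, the class name `TagListSketch` and the helper `collect` are hypothetical, and the tag regex is the one from the question with its body made non-greedy (`.*?`) so that two tags on the same line are matched separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagListSketch {
    // The question's tag regex, with a non-greedy body (an adjustment made for this sketch).
    private static final Pattern TAG =
            Pattern.compile("<(?<tag>\\w+)>(?<text>.*?)</\\k<tag>>");

    // Hypothetical helper: returns the tag contents in order,
    // with a null inserted wherever a new paragraph begins.
    static List<String> collect(String input) {
        List<String> result = new ArrayList<>();
        String[] paragraphs = input.split("\n"); // assumption: one newline per paragraph break
        for (int i = 0; i < paragraphs.length; i++) {
            if (i > 0) {
                result.add(null); // marks the start of a new paragraph
            }
            Matcher m = TAG.matcher(paragraphs[i]);
            while (m.find()) {
                result.add(m.group("text"));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "near <LOCATION>BelAir</LOCATION> road, from <LOCATION>UK</LOCATION>.\n"
                + "He was originally from <LOCATION>Usa</LOCATION>.";
        System.out.println(collect(input)); // prints [BelAir, UK, null, Usa]
    }
}
```

Walking the resulting list and starting a new "Para" label at each null then gives the grouping asked for in the question.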
Answer 1: (score: 0)
Try this
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "A 24-year-old youth died on the spot, after his motorcycle " +
        "rammed a divider near Golf market on <LOCATION>BelAir</LOCATION> road" +
        " Thursday night. The deceased has been identified as John(24) hailing from <LOCATION>UK</LOCATION>." +
        "\n He was originally from <LOCATION>Usa</LOCATION>.";
String[] paras = str.split("\n"); // divide the string into two paragraphs
// Non-greedy (.*?) so that each <LOCATION>...</LOCATION> pair is matched separately
Pattern pattern = Pattern.compile("<LOCATION>(.*?)</LOCATION>");
for (int i = 0; i < paras.length; i++) {
    System.out.print("Para " + (i + 1) + ": ");
    Matcher matcher = pattern.matcher(paras[i]);
    while (matcher.find()) {
        System.out.println(matcher.group(1)); // text between the tags
    }
}
The output is
Para 1: BelAir
UK
Para 2: Usa
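
As for combining the two expressions from the question: one possible approach, sketched below, is to split into paragraphs first and then apply the tag regex to each paragraph, keeping one inner list per paragraph. The class name `ParagraphTags` and the method `tagsByParagraph` are hypothetical names chosen for illustration. The pattern is the question's named-group regex with a non-greedy body, because the greedy `.*` would run from the first opening tag to the last closing tag whenever a paragraph holds two tags on one line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParagraphTags {
    // Matches any <TAG>text</TAG> pair; \k<tag> requires the closing name to equal the opening one.
    private static final Pattern TAG =
            Pattern.compile("<(?<tag>\\w+)>(?<text>.*?)</\\k<tag>>");

    // Hypothetical helper: one inner list of tag contents per paragraph.
    static List<List<String>> tagsByParagraph(String input) {
        List<List<String>> result = new ArrayList<>();
        for (String para : input.split("\\r?\\n")) { // assumption: each line break starts a new paragraph
            List<String> texts = new ArrayList<>();
            Matcher m = TAG.matcher(para);
            while (m.find()) {
                texts.add(m.group("text"));
            }
            result.add(texts);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "on <LOCATION>BelAir</LOCATION> road, from <LOCATION>UK</LOCATION>.\n"
                + "He was originally from <LOCATION>Usa</LOCATION>.";
        List<List<String>> paras = tagsByParagraph(input);
        for (int i = 0; i < paras.size(); i++) {
            // Each match goes on its own line, the first one after the "Para n: " label
            System.out.println("Para " + (i + 1) + ": " + String.join("\n", paras.get(i)));
        }
    }
}
```

Keeping the split and the tag match as two steps avoids having to express "paragraph" and "tag" in a single expression, and the nested list preserves which paragraph each match came from.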