I have a string,
string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships."
I also have another string, 'string2', which contains only the substrings enclosed in '<NOUN>' and '</NOUN>' tags, separated by spaces.
string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>"
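Note that the second string can contain any noun-tagged words (based on 'string1'; e.g., if string1 has 3 nouns, string2 will contain the same 3 nouns, each surrounded by noun tags).
I want to add the tags to 'string1' and produce string1 as follows,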
string1 = "<NOUN>Sri Lanka National Chess Championship</NOUN> this year and represented <NOUN>Sri Lanka</NOUN> at represented <NOUN>Sri Lanka</NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships."
I use the following code to do this,
Pattern p = Pattern.compile("<NOUN>(.*?)</NOUN>");
Matcher m = p.matcher(string2);
while(m.find()) {
string1= string1.replaceAll(m.group(1),m.group(0));
}
But it gives me the following output,
<NOUN><NOUN><NOUN>Sri Lanka</NOUN></NOUN> National Chess Championship</NOUN> this year and represented <NOUN><NOUN>Sri Lanka</NOUN></NOUN> at represented <NOUN><NOUN>Sri Lanka</NOUN></NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships.
Can someone tell me how to do this correctly? Or, failing that, how to get the desired output from the output shown above?
Answer 0 (score: 2)
Instead of:
string1= string1.replaceAll(m.group(1),m.group(0));
Use:
string1= string1.replaceAll("(?<!<NOUN>)("+m.group(1)+")(?!</NOUN>)",m.group(0));
Read more about lookahead and lookbehind constructs here
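One caveat with the line above: m.group(1) is concatenated into the regex unescaped, so a noun containing characters such as '+' or '(' would be interpreted as regex syntax. A minimal sketch of a hardened variant follows, assuming the standard Pattern.quote and Matcher.quoteReplacement helpers are acceptable here; the class name LookaroundTagger and the method tag are hypothetical names used only for illustration. Note also that the lookarounds only protect an occurrence that sits directly after <NOUN> or directly before </NOUN>; a noun that appears in the middle of an already tagged phrase would still be wrapped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class LookaroundTagger {
    private static final Pattern NOUN = Pattern.compile("<NOUN>(.*?)</NOUN>");

    // Wraps every noun listed in `tagged` wherever it occurs in `text`,
    // skipping occurrences directly preceded by <NOUN> or followed by </NOUN>.
    static String tag(String text, String tagged) {
        Matcher m = NOUN.matcher(tagged);
        while (m.find()) {
            // Pattern.quote makes the noun match literally, even if it
            // contains regex metacharacters.
            String regex = "(?<!<NOUN>)" + Pattern.quote(m.group(1)) + "(?!</NOUN>)";
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            text = text.replaceAll(regex, Matcher.quoteReplacement(m.group(0)));
        }
        return text;
    }

    public static void main(String[] args) {
        String string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        String string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        System.out.println(tag(string1, string2));
    }
}
```

With the strings from the question, this should produce the desired tagged string1.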
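Answer 1 (score: 0)
The problem with your example is that "Sri Lanka National Chess Championship" is a noun, and "Sri Lanka", which is part of that string, is also a noun. As a result, your matcher replaces the same text more than once.
You can solve this by not replacing string fragments that have already been replaced. For each match I break the string into three parts: before, matched string, and after, keeping the pieces in order. Vector is a very convenient data structure for this.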
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Check {
    // Concatenates the parts, in order, into a single string.
    static String print(Vector<String> parts) {
        String str = parts.elementAt(0);
        for (int i = 1; i < parts.size(); i++) {
            str += parts.elementAt(i);
            //System.out.print(i + " : " + parts.elementAt(i) + "\n");
        }
        return str;
    }

    public static void main(String[] args) {
        String string1;
        String string2;
        String expected;
        string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        expected = "<NOUN>Sri Lanka National Chess Championship</NOUN> this year and represented <NOUN>Sri Lanka</NOUN> at represented <NOUN>Sri Lanka</NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships.";
        Pattern p = Pattern.compile("<NOUN>(.*?)</NOUN>");
        Matcher m = p.matcher(string2);
        Vector<String> parts = new Vector<String>();
        parts.add(string1);
        while (m.find()) {
            for (int i = 0; i < parts.size(); i++) {
                // skip parts that have already been tagged
                if (parts.elementAt(i).indexOf("<NOUN>") != -1) {
                    continue;
                }
                // search for the noun in this untagged part
                String cur = parts.elementAt(i);
                int disp = cur.indexOf(m.group(1));
                if (disp == -1) {
                    continue;
                }
                // split the part into: before, tagged match, after
                parts.remove(i);
                Vector<String> newParts = new Vector<String>();
                if (disp != 0) {
                    newParts.add(cur.substring(0, disp));
                }
                newParts.add(m.group(0));
                if ((disp + m.group(1).length()) != cur.length()) {
                    newParts.add(cur.substring(disp + m.group(1).length()));
                }
                // always insert at the position of the removed part, so the
                // order is preserved even when i == 0 and other parts follow
                parts.addAll(i, newParts);
                //System.out.print(print(parts) + "\n");
            }
        }
        string1 = print(parts);
        if (!string1.equals(expected)) {
            System.out.println("Unexpected output !!");
        } else {
            System.out.println("Correct !!");
        }
    }
}
For clarity, you may want to rename the print method to stringify, since it builds a string rather than printing one.
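The same "never touch text that is already tagged" idea can also be sketched as a single pass over string1. This is a different technique from the Vector-based code above, not a restatement of it: all nouns are joined into one alternation, longest first, so that "Sri Lanka National Chess Championship" is tried before "Sri Lanka" at any given position, and the scan then continues after each match. The class name SinglePassTagger and the method tag are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating a single-pass alternative.
class SinglePassTagger {
    static String tag(String text, String tagged) {
        // Collect the distinct, non-empty nouns from the tagged string.
        List<String> nouns = new ArrayList<>();
        Matcher m = Pattern.compile("<NOUN>(.*?)</NOUN>").matcher(tagged);
        while (m.find()) {
            String noun = m.group(1);
            if (!noun.isEmpty() && !nouns.contains(noun)) {
                nouns.add(noun);
            }
        }
        if (nouns.isEmpty()) {
            return text;
        }
        // Longest first: Java tries alternatives left to right, so the
        // longer noun wins when a shorter noun is a prefix of it.
        nouns.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder alternation = new StringBuilder();
        for (String noun : nouns) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(noun));
        }
        // $0 is the whole match; each match is wrapped exactly once because
        // the matcher never rescans text it has already consumed.
        return Pattern.compile(alternation.toString())
                .matcher(text)
                .replaceAll("<NOUN>$0</NOUN>");
    }

    public static void main(String[] args) {
        String string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        String string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        System.out.println(tag(string1, string2));
    }
}
```

Like replaceAll in the question, this tags every occurrence of each noun in string1, regardless of how many times the noun is listed in string2.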