How do I insert special characters taken from one string into another string?

Date: 2012-08-18 06:40:01

Tags: java regex string pattern-matching

I have a string:

    string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships."

I also have another string, string2, which contains only the nouns from string1, each wrapped in <NOUN> and </NOUN> tags and separated by spaces:

    string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>"

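Note that string2 can contain any noun-tagged words, based on string1: for example, if string1 has 3 nouns, string2 will contain the same 3 nouns wrapped in noun tags.
I want to add the tags to string1 so that it becomes: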

    string1 = "<NOUN>Sri Lanka National Chess Championship</NOUN> this year and represented <NOUN>Sri Lanka</NOUN> at represented <NOUN>Sri Lanka</NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships."

I am using the following code to do this:

    Pattern p = Pattern.compile("<NOUN>(.*?)</NOUN>");
    Matcher m = p.matcher(string2);
    while (m.find()) {
        string1 = string1.replaceAll(m.group(1), m.group(0));
    }

But it gives me the following output:

    <NOUN><NOUN><NOUN>Sri Lanka</NOUN></NOUN> National Chess Championship</NOUN> this year and represented <NOUN><NOUN>Sri Lanka</NOUN></NOUN> at represented <NOUN><NOUN>Sri Lanka</NOUN></NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships.

Can someone tell me how to do this correctly? Or, alternatively, how to get the desired output from the output above?

2 answers:

Answer 0 (score: 2)

Instead of:

    string1 = string1.replaceAll(m.group(1), m.group(0));

use:

    string1 = string1.replaceAll("(?<!<NOUN>)(" + Pattern.quote(m.group(1)) + ")(?!</NOUN>)", Matcher.quoteReplacement(m.group(0)));
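A minimal, self-contained sketch of this answer's approach, assuming the question's string1 and string2 (the class name LookaroundTagger and the method tagNouns are hypothetical). It runs one lookaround-guarded replaceAll per tagged noun; Pattern.quote makes the noun match literally even if it contains regex metacharacters such as "." or "+", and Matcher.quoteReplacement keeps any "$" or "\" in the tagged text literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundTagger {

    // Hypothetical helper: for each <NOUN>...</NOUN> in string2, tag the noun in
    // string1 unless it is directly preceded by "<NOUN>" or followed by "</NOUN>".
    static String tagNouns(String string1, String string2) {
        Matcher m = Pattern.compile("<NOUN>(.*?)</NOUN>").matcher(string2);
        while (m.find()) {
            // Match the noun literally, guarded by negative lookbehind/lookahead.
            String regex = "(?<!<NOUN>)(" + Pattern.quote(m.group(1)) + ")(?!</NOUN>)";
            // Insert the tagged noun literally (no $-group interpretation).
            string1 = string1.replaceAll(regex, Matcher.quoteReplacement(m.group(0)));
        }
        return string1;
    }

    public static void main(String[] args) {
        String string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        String string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        System.out.println(tagNouns(string1, string2));
    }
}
```

On the question's input this prints the desired string: the second "Sri Lanka" pass skips the occurrence at the start of the first tag because it is preceded by "<NOUN>", and the third pass finds only tagged occurrences.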

Read more about lookahead and lookbehind here.
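To see what the lookarounds do and do not protect against, here is a small sketch (the class LookaroundDemo and helper tagUnlessAdjacent are hypothetical). A negative lookbehind or lookahead only inspects the characters immediately next to the match, so an already-tagged noun is skipped, but a shorter noun sitting in the middle of a longer tagged phrase is still matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {

    // Hypothetical helper: wraps each occurrence of noun in <NOUN> tags unless it
    // is immediately preceded by "<NOUN>" or immediately followed by "</NOUN>".
    static String tagUnlessAdjacent(String text, String noun) {
        String regex = "(?<!<NOUN>)" + Pattern.quote(noun) + "(?!</NOUN>)";
        return text.replaceAll(regex, Matcher.quoteReplacement("<NOUN>" + noun + "</NOUN>"));
    }

    public static void main(String[] args) {
        // The tagged occurrence is skipped; only the bare one is wrapped.
        System.out.println(tagUnlessAdjacent("<NOUN>Sri Lanka</NOUN> and Sri Lanka", "Sri Lanka"));
        // prints: <NOUN>Sri Lanka</NOUN> and <NOUN>Sri Lanka</NOUN>

        // "Chess" is neither directly after "<NOUN>" nor directly before "</NOUN>",
        // so it gets nested tags inside the longer phrase.
        System.out.println(tagUnlessAdjacent("<NOUN>National Chess Championship</NOUN>", "Chess"));
        // prints: <NOUN>National <NOUN>Chess</NOUN> Championship</NOUN>
    }
}
```

So this fix works for the question's data, where "Sri Lanka" is a prefix of the longer noun, but not in general for nouns nested in the middle of other nouns.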

Answer 1 (score: 0)

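The problem in your example is that "Sri Lanka National Chess Championship" is a noun, and "Sri Lanka", which is part of that string, is also a noun. Each replaceAll call rescans all of string1, including text an earlier iteration already tagged, so your loop replaces the same text multiple times.

You can fix this by never replacing a piece of the string that has already been replaced. For each match, I split the string into three parts: the text before the match, the matched string, and the text after it, keeping the pieces in their original order. A Vector is a convenient data structure for holding them.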

    import java.util.Vector;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Check {

        // Concatenates all parts, in order, back into a single string.
        static String print(Vector<String> parts) {
            StringBuilder sb = new StringBuilder();
            for (String part : parts) {
                sb.append(part);
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            String string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
            String string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
            String expected = "<NOUN>Sri Lanka National Chess Championship</NOUN> this year and represented <NOUN>Sri Lanka</NOUN> at represented <NOUN>Sri Lanka</NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships.";

            Pattern p = Pattern.compile("<NOUN>(.*?)</NOUN>");
            Matcher m = p.matcher(string2);

            // Start with all of string1 as a single untagged part.
            Vector<String> parts = new Vector<String>();
            parts.add(string1);

            while (m.find()) {
                String noun = m.group(1);   // e.g. "Sri Lanka"
                String tagged = m.group(0); // e.g. "<NOUN>Sri Lanka</NOUN>"

                for (int i = 0; i < parts.size(); i++) {
                    String cur = parts.elementAt(i);

                    // Skip parts that have already been tagged.
                    if (cur.indexOf("<NOUN>") != -1) {
                        continue;
                    }

                    // Look for the noun in this untagged part.
                    int disp = cur.indexOf(noun);
                    if (disp == -1) {
                        continue;
                    }

                    // Split cur into: before, tagged match, after.
                    Vector<String> newParts = new Vector<String>();
                    if (disp != 0) {
                        newParts.add(cur.substring(0, disp));
                    }
                    newParts.add(tagged);
                    if (disp + noun.length() != cur.length()) {
                        newParts.add(cur.substring(disp + noun.length()));
                    }

                    // Put the pieces back at the same position. addAll(i, ...)
                    // also covers i == 0; appending at the end would reorder parts.
                    parts.remove(i);
                    parts.addAll(i, newParts);
                }
            }

            string1 = print(parts);
            if (!string1.equals(expected)) {
                System.out.println("Unexpected output !!");
            } else {
                System.out.println("Correct !!");
            }
        }
    }
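The same "never rescan tagged text" idea can also be expressed with a single regex pass, which is a different technique from this answer's part splitting. The sketch below (class OnePassTagger and method tagNouns are hypothetical) builds one alternation of all nouns from string2, longest first so that "Sri Lanka National Chess Championship" is tried before its prefix "Sri Lanka", and rewrites string1 left to right with Matcher.appendReplacement. Unlike the loop above, it tags every occurrence of a listed noun, regardless of how many times that noun appears in string2, and it does not check word boundaries.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OnePassTagger {

    // Hypothetical helper: wraps every occurrence of every noun listed in
    // string2 with <NOUN> tags, scanning string1 only once.
    static String tagNouns(String string1, String string2) {
        // Collect the distinct, non-empty nouns from string2.
        Set<String> nouns = new LinkedHashSet<>();
        Matcher tags = Pattern.compile("<NOUN>(.*?)</NOUN>").matcher(string2);
        while (tags.find()) {
            if (!tags.group(1).isEmpty()) {
                nouns.add(tags.group(1));
            }
        }
        if (nouns.isEmpty()) {
            return string1;
        }

        // Longest first: Java tries alternatives left to right at each position.
        List<String> sorted = new ArrayList<>(nouns);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));

        // Build noun1|noun2|... with each noun matched literally.
        StringBuilder alternation = new StringBuilder();
        for (String noun : sorted) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(noun));
        }

        // One left-to-right pass: text that has been tagged is never rescanned.
        Matcher m = Pattern.compile(alternation.toString()).matcher(string1);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement("<NOUN>" + m.group() + "</NOUN>"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        String string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        System.out.println(tagNouns(string1, string2));
    }
}
```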

You may want to rename the print method to stringify, since it builds and returns a string rather than printing anything.
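If the nouns in string2 are listed in the same order in which they occur in string1, as they are in the question's example, a cursor-based scan is another option. This is a sketch under that assumption (class OrderedTagger and method tagInOrder are hypothetical): each noun is searched for only after the end of the previously tagged one, so every tag in string2 consumes exactly one occurrence and already-emitted text is never searched again. A listed noun that does not occur after the cursor is simply left untagged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrderedTagger {

    // Hypothetical helper. Assumes the nouns in string2 appear in string1
    // in the same order (as in the question's example).
    static String tagInOrder(String string1, String string2) {
        Matcher m = Pattern.compile("<NOUN>(.*?)</NOUN>").matcher(string2);
        StringBuilder out = new StringBuilder();
        int cursor = 0; // everything before cursor has already been copied to out
        while (m.find()) {
            String noun = m.group(1);
            int pos = string1.indexOf(noun, cursor);
            if (noun.isEmpty() || pos == -1) {
                continue; // empty, or not found after the cursor: skip this tag
            }
            out.append(string1, cursor, pos) // untouched text before the noun
               .append(m.group(0));          // the noun with its tags
            cursor = pos + noun.length();
        }
        out.append(string1.substring(cursor)); // remaining untagged text
        return out.toString();
    }

    public static void main(String[] args) {
        String string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        String string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        System.out.println(tagInOrder(string1, string2));
    }
}
```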