Using Java to search and replace codes in an XML file

Date: 2017-03-08 14:21:25

Tags: java xml replace

I have an XML-based .tbx file that contains code like this:

    <descripGrp>
      <descrip type="subjectField">406001</descrip>
    </descripGrp>
    <langSet xml:lang="en">
      <tig>
        <term>competence of the Member States</term>
        <termNote type="termType">fullForm</termNote>
        <descrip type="reliabilityCode">3</descrip>
      </tig>
    </langSet>
    <langSet xml:lang="pl">
      <tig>
        <term>kompetencje państw członkowskich</term>
        <termNote type="termType">fullForm</termNote>
        <descrip type="reliabilityCode">3</descrip>
      </tig>
    </langSet>
  </termEntry>
  <termEntry id="IATE-290">
    <descripGrp>
      <descrip type="subjectField">406001</descrip>
    </descripGrp>

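I want to search the whole file (almost 50 MiB) for the codes in the "subjectField" field and replace each one with the corresponding text, e.g. 406001 with political ideology, 406002 with political institutions. I have a table with the codes and their names:

    406001 political ideology
    406002 political institutions
    406003 political philosophy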

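There are 500 such codes, so doing this by hand would take forever. I'm not a programmer (I'm learning), but I know a bit of Java, so I wrote a small application that I thought would help a lot. The results are disappointing, though (luckily I'm not discouraged :)).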

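Here's what I wrote. It runs extremely slowly and doesn't replace the codes at all: in 15 minutes it processed only 1/5 of the file (!). Also, the output file has no line breaks, so the whole XML ends up on a single line.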

Any hints on which way I should go?

    File log= new File("D:\\IATE\\export_EN_PL_2017-03-07_All_Langs.tbx"); // TBX file to be processed
    File newe = new File("D:\\IATE\\now.txt"); // output file
    String search = "D:\\IATE\\org.txt"; // file containing codes "40600" etc
    String replace = "D:\\IATE\\rplc.txt"; // file containing names 

    try {
        FileReader fr = new FileReader(log);
        String s;
        String s1;
        String s2;
        String totalStr = "";
        String tot1 = "";
        String tot2 = "";
        FileReader fr1 = new FileReader(search);
        FileReader fr2 = new FileReader(replace);
        try (BufferedReader br = new BufferedReader(fr)) {
            try (BufferedReader br1 = new BufferedReader(fr1)) {
                try (BufferedReader br2 = new BufferedReader(fr2)) {
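                    while ((s = br.readLine()) != null) {
                        // readLine() drops the line terminator, so the output ends up on one line
                        totalStr += s; // copying an ever-growing String for every line is O(n^2)
                        // these loops only do work on the first pass; afterwards both readers are exhausted
                        while ((s1 = br1.readLine()) != null) {
                            tot1 += s1;

                            while ((s2 = br2.readLine()) != null) {
                                tot2 += s2;
                            }
                        }
                        // tot1 is all codes glued together ("406001406002..."), used as ONE regex,
                        // so it practically never matches anything in the file
                        totalStr = totalStr.replaceAll(tot1, tot2);

                        // the output file is reopened and completely rewritten for every input line
                        FileWriter fw = new FileWriter(newe);

                        fw.write(totalStr);
                        fw.write("\n");
                        fw.close();
                    }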


                } catch (Exception e) {
                    e.printStackTrace();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

}

1 Answer:

Answer 0 (score: 0)

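Looping over the two files to find matching values is a lot of redundant work. Before replacing the values in the .tbx file, load the codes and names once into a Properties object that maps each code to its name. Here's a function that does that: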

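// imports for the whole class (getProps and main)
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Line i of the numbers file holds the code whose name is on line i of the names file.
// Assumes both text files are UTF-8 encoded.
public static Properties getProps(String pathToNames, String pathToNumbers){

    Properties prop = new Properties();

    // try-with-resources closes both readers even if an exception is thrown
    try (BufferedReader theNames = Files.newBufferedReader(Paths.get(pathToNames), StandardCharsets.UTF_8);
         BufferedReader theNumbers = Files.newBufferedReader(Paths.get(pathToNumbers), StandardCharsets.UTF_8)) {

        String name;
        String number;
        while (((name = theNames.readLine()) != null) && ((number = theNumbers.readLine()) != null)) {
            prop.setProperty(number.trim(), name.trim());
        }

    } catch (IOException e) {
        e.printStackTrace();
    }
    return prop;
}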

Assuming you're using Java 8, you can check that the function works:

Properties thePropertiesFile = getProps("path/to/names.txt", "path/to/numbers.txt");
thePropertiesFile.forEach((Object key, Object value) -> {
    System.out.println(key + "  " + value);
});
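The question describes a single table with the code and the name on the same line (e.g. `406001 political ideology`). If the table is kept in that form instead of as two parallel files (an assumption; the answer itself uses two files), a minimal sketch for loading it could look like this. The class `CodeTable`, its methods `parse` and `load`, and the one-entry-per-line layout are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Properties;

class CodeTable {
    // Parses lines of the form "<code> <name>", e.g. "406001 political ideology".
    // Assumption: the code is the first whitespace-separated token and the rest of the line is the name.
    static Properties parse(List<String> lines) {
        Properties table = new Properties();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+", 2); // at most two pieces: code, name
            if (parts.length == 2) {
                table.setProperty(parts[0], parts[1]);
            }
            // lines without a name are ignored
        }
        return table;
    }

    // Reads the whole table file (assumed UTF-8) and parses it
    static Properties load(Path tableFile) throws IOException {
        return parse(Files.readAllLines(tableFile, StandardCharsets.UTF_8));
    }
}
```

With 500 entries the table is tiny, so reading it fully into memory with `Files.readAllLines` is fine; the result can be used wherever `getProps(...)` is used, e.g. `Properties p = CodeTable.load(Paths.get("path/to/table.txt"));` (placeholder path).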

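Now you can write a function that does the conversion. Use a PrintStream to write the output; its println() puts back the line break after every line, so the output keeps the original layout.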

static String workingDir = System.getProperty("user.dir");
public static void main(String[] args){

    // Paths.get() inserts the path separator that plain concatenation with workingDir was missing
    Properties p = getProps(Paths.get(workingDir, "path/to/names.txt").toString(),
                            Paths.get(workingDir, "path/to/numbers.txt").toString());
    Path output = Paths.get(workingDir, "path/to/output.txt");
    Path input = Paths.get(workingDir, "path/to/the.tbx");

    // Read and write line by line (never the whole 50 MiB file at once), in UTF-8 so the Polish text survives
    try (BufferedReader tbx = Files.newBufferedReader(input, StandardCharsets.UTF_8);
         PrintStream ps = new PrintStream(output.toFile(), "UTF-8")) {
        String currentLine;
        String theNum;
        String theName;
        int c; //temp index
        int start;
        int end;
        while((currentLine = tbx.readLine()) != null){
            if(currentLine.contains("subjectField")){
                c = currentLine.indexOf("subjectField");
                start = currentLine.indexOf(">", c)+1;   // first character of the code
                end = currentLine.indexOf("<", c);       // start of </descrip>
                if (start > 0 && end > start) {
                    theNum = currentLine.substring(start, end).trim();
                    theName = p.getProperty(theNum, ""); // unknown codes become ""
                    currentLine = currentLine.substring(0,start)+theName+currentLine.substring(end);
                }
            }
            ps.println(currentLine); // println adds back the line break that readLine() removed
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

}

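Codes that aren't in the table are replaced with an empty string. You can change this to suit your needs; for example, p.getProperty(theNum, theNum) leaves unknown codes as they are.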
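The `indexOf`/`substring` logic in the loop above can also be written as a regular expression, which keeps the per-line work in one small, separately testable method. A minimal sketch, assuming every code sits in an element written exactly like `<descrip type="subjectField">406001</descrip>` (double quotes, no other attributes), as in the sample. The class `SubjectFieldReplacer` and method `replaceCodes` are hypothetical, and unlike the loop above this version leaves unknown codes unchanged:

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubjectFieldReplacer {
    // Matches the whole element; group 2 is the code between the tags.
    // Assumption: the element is written exactly as in the sample .tbx.
    private static final Pattern SUBJECT_FIELD =
            Pattern.compile("(<descrip type=\"subjectField\">)([^<]*)(</descrip>)");

    // Replaces every subjectField code on the line; codes missing from the table stay as they are.
    static String replaceCodes(String line, Properties names) {
        Matcher m = SUBJECT_FIELD.matcher(line);
        StringBuffer out = new StringBuffer(); // appendReplacement(StringBuilder) only exists from Java 9
        while (m.find()) {
            String code = m.group(2).trim();
            String name = names.getProperty(code, m.group(2));
            // quoteReplacement protects names that contain '$' or '\'
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + name + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Inside the read loop, the whole `if (currentLine.contains("subjectField")) { ... }` block would then shrink to `ps.println(SubjectFieldReplacer.replaceCodes(currentLine, p));`. Lines without a match are returned unchanged, so other `descrip` elements such as `reliabilityCode` are not touched.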

If theNum can contain several comma-separated codes, split it into an array and look up each one:

if (theNum.contains(",")) {
    StringBuilder names = new StringBuilder();
    for (String num : theNum.split(",")) {   // split() returns String[], not int[]
        if (names.length() > 0) {
            names.append(",");              // separator between names, so no trailing comma to remove
        }
        names.append(p.getProperty(num.trim(), ""));
    }
    theName = names.toString();
} else {
    theName = p.getProperty(theNum, "");
}
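The same lookup can be wrapped in a small helper that also covers the single-code case, because `split(",")` on a string without commas returns a one-element array, so the `if`/`else` is not needed. A sketch with a hypothetical class `MultiCodeLookup`, using `java.util.StringJoiner` (Java 8+) to build the comma-separated result; like the answer, it maps missing codes to an empty string:

```java
import java.util.Properties;
import java.util.StringJoiner;

class MultiCodeLookup {
    // Looks up each comma-separated code (e.g. "406001,406002") and joins the names with commas.
    // Codes missing from the table contribute an empty string, as in the answer.
    static String namesFor(String codes, Properties p) {
        StringJoiner joined = new StringJoiner(",");
        for (String code : codes.split(",")) {
            joined.add(p.getProperty(code.trim(), ""));
        }
        return joined.toString();
    }
}
```

In the read loop this becomes `theName = MultiCodeLookup.namesFor(theNum, p);`.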