I have an XML-based .tbx file that contains code like this:
<descripGrp>
<descrip type="subjectField">406001</descrip>
</descripGrp>
<langSet xml:lang="en">
<tig>
<term>competence of the Member States</term>
<termNote type="termType">fullForm</termNote>
<descrip type="reliabilityCode">3</descrip>
</tig>
</langSet>
<langSet xml:lang="pl">
<tig>
<term>kompetencje państw członkowskich</term>
<termNote type="termType">fullForm</termNote>
<descrip type="reliabilityCode">3</descrip>
</tig>
</langSet>
</termEntry>
<termEntry id="IATE-290">
<descripGrp>
<descrip type="subjectField">406001</descrip>
</descripGrp>
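I want to search the whole file (almost 50 MiB) for the codes in the "subjectField" field and replace each one with the corresponding text, e.g. 406001 stands for Political ideology and 406002 for Political institutions. I have a table containing the codes and the corresponding names:
406001 Political ideology
406002 Political institutions
406003 Political philosophy
There are 500 such codes, so doing this by hand would take forever. I'm not a programmer (I'm learning), but I know a little Java, so I wrote a small application that I thought would help me. The result is discouraging (fortunately I'm not discouraged :))
Below is what I wrote. It runs very slowly and does not replace the codes at all: in 15 minutes it processed 1/5 of the file (!). In addition, the output file has no line breaks, so the whole XML ends up on a single line.
Any hints on which way I should go?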
File log= new File("D:\\IATE\\export_EN_PL_2017-03-07_All_Langs.tbx"); // TBX file to be processed
File newe = new File("D:\\IATE\\now.txt"); // output file
String search = "D:\\IATE\\org.txt"; // file containing codes "40600" etc
String replace = "D:\\IATE\\rplc.txt"; // file containing names
try {
FileReader fr = new FileReader(log);
String s;
String s1;
String s2;
String totalStr = "";
String tot1 = "";
String tot2 = "";
FileReader fr1 = new FileReader(search);
FileReader fr2 = new FileReader(replace);
try (BufferedReader br = new BufferedReader(fr)) {
try (BufferedReader br1 = new BufferedReader(fr1)) {
try (BufferedReader br2 = new BufferedReader(fr2)) {
while ((s = br.readLine()) != null) {
totalStr += s;
while((s1 = br1.readLine()) != null){
tot1 += s1;
while ((s2 = br2.readLine()) != null){
tot2 += s2;
}
}
totalStr = totalStr.replaceAll(tot1, tot2);
FileWriter fw = new FileWriter(newe);
fw.write(totalStr);
fw.write("\n");
fw.close();
}
} catch (Exception e) {
e.printStackTrace();
}
} catch (Exception e) {
e.printStackTrace();
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
Answer 0 (score: 0)
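Walking through two files to find a matching value for every line of the .tbx file does a lot of redundant work. Before replacing any values in the .tbx file, build a Properties lookup that maps each code to its name. Here is a function that does that: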
// Requires: import java.io.*; import java.util.Properties;
// Pairs line i of the numbers file with line i of the names file.
public static Properties getProps(String pathToNames, String pathToNumbers){
    Properties prop = new Properties();
    // try-with-resources closes both readers even if reading fails
    try (BufferedReader theNames = new BufferedReader(new InputStreamReader(new FileInputStream(pathToNames)));
         BufferedReader theNumbers = new BufferedReader(new InputStreamReader(new FileInputStream(pathToNumbers)))) {
        String name;
        String number;
        // stop as soon as either file runs out of lines
        while(((name = theNames.readLine())!= null)&&((number = theNumbers.readLine())!= null)){
            prop.setProperty(number.trim(), name.trim());
        }
    }catch(IOException e){
        e.printStackTrace();
    }
    return prop;
}
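Assuming you are using Java 8, you can check that the function works (thePropertiesFile is the Properties object returned by getProps):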
thePropertiesFile.forEach((Object key, Object value) ->{
System.out.println(key+ " " +value);
});
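As a hedged variant of the same lookup step, the sketch below pairs the two lists into a plain Map and takes Reader arguments so it can be tried without touching the disk. The class name CodeTable and the method name load are made up for this illustration; the sample codes and names are the ones from the question's table.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

class CodeTable {
    // Pairs line i of the codes source with line i of the names source.
    // Reading stops as soon as either source runs out of lines.
    static Map<String, String> load(Reader codes, Reader names) throws IOException {
        Map<String, String> table = new LinkedHashMap<>();
        try (BufferedReader codeLines = new BufferedReader(codes);
             BufferedReader nameLines = new BufferedReader(names)) {
            String code;
            String name;
            while ((code = codeLines.readLine()) != null
                    && (name = nameLines.readLine()) != null) {
                table.put(code.trim(), name.trim());
            }
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for the two list files (hypothetical sample data).
        Map<String, String> table = load(
                new StringReader("406001\n406002\n"),
                new StringReader("Political ideology\nPolitical institutions\n"));
        table.forEach((code, name) -> System.out.println(code + " " + name));
    }
}
```

For the real files, the readers could come from Files.newBufferedReader(path, StandardCharsets.UTF_8), assuming the lists are saved as UTF-8; an explicit charset avoids depending on the platform default.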
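Now you can write a function that does the conversion properly. Use PrintStream to write the output, since its println puts each line back on its own line.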
// Requires: import java.io.*; import java.util.Properties;
static String workingDir = System.getProperty("user.dir");

public static void main(String[] args){
    // user.dir has no trailing separator, so one is added before each relative path
    Properties p = getProps(workingDir+"/path/to/names.txt",workingDir+"/path/to/numbers.txt");
    File output = new File(workingDir+"/path/to/output.txt");
    // try-with-resources closes the stream and the reader even if an exception is thrown
    try (PrintStream ps = new PrintStream(output);
         BufferedReader tbx = new BufferedReader(new InputStreamReader(new FileInputStream(new File(workingDir+"/path/to/the.tbx"))))) {
        String currentLine;
        String theNum;
        String theName;
        int c; //index of "subjectField" within the line
        int start;
        int end;
        while((currentLine = tbx.readLine()) != null){
            if(currentLine.contains("subjectField")){
                c = currentLine.indexOf("subjectField");
                start = currentLine.indexOf(">", c)+1; //first character of the code
                end = currentLine.indexOf("<", c); //start of the closing </descrip> tag
                theNum = currentLine.substring(start, end);
                theName = p.getProperty(theNum, ""); //empty string when the code is not in the table
                currentLine = currentLine.substring(0,start)+theName+currentLine.substring(end);
            }
            ps.println(currentLine); //println restores the line break that readLine strips
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
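Codes that are missing from the table need a default: the one-argument getProperty(key) returns null, which string concatenation writes out as the text "null", whereas getProperty(key, "") yields an empty string. Adjust this to suit your use case.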
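The line-by-line pass can also be factored into a small helper that streams from one file to another. The following is only a sketch under stated assumptions: the class name LineFilter and the method name copy are invented for illustration, both files are assumed to be UTF-8, and the demo in main uses temporary files with made-up content rather than the real export.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.UnaryOperator;

class LineFilter {
    // Copies input to output one line at a time, applying transform to each line.
    // Only the current line is held in memory, so file size does not matter much.
    static void copy(Path input, Path output, UnaryOperator<String> transform) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(transform.apply(line));
                out.newLine(); // readLine strips the line break, so write it back
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the real .tbx input and output (hypothetical data).
        Path input = Files.createTempFile("sample", ".tbx");
        Path output = Files.createTempFile("sample-out", ".tbx");
        Files.write(input, Arrays.asList(
                "<descripGrp>",
                "<descrip type=\"subjectField\">406001</descrip>",
                "</descripGrp>"), StandardCharsets.UTF_8);
        copy(input, output, line -> line.replace("406001", "Political ideology"));
        Files.readAllLines(output, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```

Writing each line as soon as it is read avoids both problems described in the question: nothing accumulates in one ever-growing String, and every line keeps its own line break.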
If theNum contains multiple comma-separated values, split it into an array:
theName = "";
if(theNum.contains(",")){
    String[] theNums = theNum.split(","); //split returns String[], not int[]
    for (String num : theNums) {
        theName += p.getProperty(num.trim(), ""); //trim spaces around each code
        theName += ",";
    }
    theName = theName.replaceAll(",$", ""); //get rid of trailing comma
}
else
    theName = p.getProperty(theNum, "");
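The single-value and multi-value cases can be folded into one helper with a regular expression. This is a minimal sketch, not the method used above: the class name SubjectFieldReplacer and the method name replaceCodes are invented here, the pattern assumes the element is written exactly as in the question's sample, and unknown codes are left as they are (an arbitrary choice for this example).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubjectFieldReplacer {
    // Matches the opening tag, the code list, and the closing tag as three groups.
    private static final Pattern SUBJECT_FIELD =
            Pattern.compile("(<descrip type=\"subjectField\">)([^<]*)(</descrip>)");

    // Replaces every comma-separated code inside a subjectField element with its name.
    // Codes that are not in the table are kept unchanged.
    static String replaceCodes(String line, Map<String, String> table) {
        Matcher m = SUBJECT_FIELD.matcher(line);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            StringBuilder names = new StringBuilder();
            for (String code : m.group(2).split(",")) {
                String key = code.trim();
                if (names.length() > 0) {
                    names.append(", ");
                }
                names.append(table.getOrDefault(key, key));
            }
            // quoteReplacement stops '$' and '\' in a name from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + names + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("406001", "Political ideology");
        table.put("406002", "Political institutions");
        System.out.println(replaceCodes(
                "<descrip type=\"subjectField\">406001, 406002</descrip>", table));
    }
}
```

Because the pattern is anchored on the subjectField element, a number that happens to appear elsewhere (in a term, for example) is not touched.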