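This is the code I wrote to validate two files by comparing them. I'd like to know whether there is a better way to write it: each of my files can contain millions of records, and I believe it will be slow in those cases.
I'm thinking of using a hash map instead. Every time a line occurs again in a file, I add 1 to that key's count; a line seen only once keeps a count of 1. If the record already exists in the other file's map, I remove it from that map; if it doesn't, I add it to its own file's map. I would alternate between the two files like this until both are exhausted.
I don't compare line by line because the order of the lines may differ between the two files.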
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.ArrayList;
import java.util.List;

public class CompareFiles { // class name is arbitrary
    public static void main(String[] args) throws Exception {
        List<String> expectedrecords = new ArrayList<String>();
        List<String> actualrecords = new ArrayList<String>();
        List<String> unexpectedrecords = new ArrayList<String>();
        String sCurrentLine;
        // Read both files fully into memory
        try (BufferedReader br1 = new BufferedReader(new FileReader("expected.txt"));
             BufferedReader br2 = new BufferedReader(new FileReader("actual.txt"))) {
            while ((sCurrentLine = br1.readLine()) != null) {
                expectedrecords.add(sCurrentLine);
            }
            while ((sCurrentLine = br2.readLine()) != null) {
                actualrecords.add(sCurrentLine);
            }
        }
        // Each expected record cancels one matching actual record; if there is none,
        // the record is missing from actual. List.contains() and List.remove() are
        // linear scans, so this loop is O(n * m) overall.
        for (int i = 0; i < expectedrecords.size(); i++) {
            if (actualrecords.contains(expectedrecords.get(i))) {
                actualrecords.remove(expectedrecords.get(i));
            } else {
                unexpectedrecords.add(expectedrecords.get(i));
            }
        }
        // actualrecords now holds only the records that are not in expected
        try (BufferedWriter br3 = new BufferedWriter(new FileWriter(new File("c.txt")))) {
            br3.write("Records which are not present in actual");
            br3.newLine();
            for (int x = 0; x < unexpectedrecords.size(); x++) {
                br3.write(unexpectedrecords.get(x));
                br3.newLine();
            }
            br3.write("Records which are in actual but not present in expected");
            br3.newLine();
            for (int i = 0; i < actualrecords.size(); i++) {
                br3.write(actualrecords.get(i));
                br3.newLine();
            }
        }
    }
}
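The alternating two-map idea described in the question could look roughly like this. It is a sketch, not the asker's code: the class `AlternatingDiff`, its `Result` holder and the `match` helper are hypothetical names, and the inputs are taken as `Iterator<String>` so the logic can be tried without files. Each line first tries to cancel a pending occurrence from the other side; only unmatched lines stay in memory.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class illustrating the alternating two-map comparison.
final class AlternatingDiff {

    /** Lines still unmatched on each side, with their multiplicities. */
    static final class Result {
        final Map<String, Integer> missingFromActual = new LinkedHashMap<>();
        final Map<String, Integer> extraInActual = new LinkedHashMap<>();
    }

    /** Reads one line from each source in turn until both are exhausted. */
    static Result compare(Iterator<String> expected, Iterator<String> actual) {
        Result r = new Result();
        while (expected.hasNext() || actual.hasNext()) {
            if (expected.hasNext()) {
                match(expected.next(), r.extraInActual, r.missingFromActual);
            }
            if (actual.hasNext()) {
                match(actual.next(), r.missingFromActual, r.extraInActual);
            }
        }
        return r;
    }

    // Cancel one pending occurrence of line in 'other'; otherwise count it in 'own'.
    private static void match(String line, Map<String, Integer> other, Map<String, Integer> own) {
        Integer count = other.get(line);
        if (count == null) {
            own.merge(line, 1, Integer::sum);
        } else if (count == 1) {
            other.remove(line);
        } else {
            other.put(line, count - 1);
        }
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("one", "two", "four", "duplicate", "duplicate");
        List<String> actual = Arrays.asList("two", "duplicate", "one", "three");
        Result r = compare(expected.iterator(), actual.iterator());
        System.out.println("missing from actual: " + r.missingFromActual); // {four=1, duplicate=1}
        System.out.println("extra in actual: " + r.extraInActual);         // {three=1}
    }
}
```

With real files, `br1.lines().iterator()` and `br2.lines().iterator()` (Java 8+) can be passed in. Alternating only keeps memory low when matching lines appear at similar positions in both files; in the worst case one map still ends up holding almost a whole file.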
Answer 0 (score: 0)
On a Unix/Linux machine you can simply call diff, which has been optimized for speed and memory usage.
The call would look like this:
String listFileDiffs = executeDiff(filenameWithPath1, filenameWithPath2);
The method is implemented as follows:
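private String executeDiff(String filenameWithPath1, String filenameWithPath2) {
    StringBuilder output = new StringBuilder();
    try {
        // Runtime.exec(String) does not start a shell, so "> /tmp/sort1file" would be
        // passed to sort as extra file arguments. Redirect sort's stdout to a
        // temporary file with ProcessBuilder instead.
        File sorted1 = File.createTempFile("sort1file", ".txt");
        File sorted2 = File.createTempFile("sort2file", ".txt");
        new ProcessBuilder("sort", filenameWithPath1).redirectOutput(sorted1).start().waitFor();
        new ProcessBuilder("sort", filenameWithPath2).redirectOutput(sorted2).start().waitFor();

        Process p2 = new ProcessBuilder("diff", sorted1.getPath(), sorted2.getPath()).start();
        // Read diff's output before waitFor(): a large diff could otherwise fill the
        // pipe buffer and deadlock (diff blocks on write while Java blocks in waitFor)
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(p2.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        p2.waitFor(); // exit status 1 only means "the files differ"
        sorted1.delete();
        sorted2.delete();
    } catch (Exception e) {
        LOG.error("Error: executeCommand ", e); // LOG: logger of the enclosing class
    }
    return output.toString();
}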
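You can pass options to diff to get more information about the differences it finds.
This solution has been adapted to the arbitrary order of lines in each file: Unix sort is called on each of the two files, and diff is then run on the sorted results.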
Unix commands have matured over decades and are very efficient.
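The same "sort first, then compare" idea can also be done inside the JVM when both files fit in the heap. The sketch below is a swapped-in technique, not what diff does internally: it sorts copies of both lists and walks them with two indices, which costs O(n log n) for the sorts plus O(n + m) for the walk and compares duplicates by count. The class name `SortedMergeDiff` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: in-memory sort + two-pointer merge comparison.
final class SortedMergeDiff {

    /** Returns [lines only in expected, lines only in actual], each in sorted order. */
    static List<List<String>> compare(List<String> expected, List<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        List<String> onlyInExpected = new ArrayList<>();
        List<String> onlyInActual = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < e.size() && j < a.size()) {
            int cmp = e.get(i).compareTo(a.get(j));
            if (cmp == 0) {          // same line on both sides: one occurrence each is matched
                i++;
                j++;
            } else if (cmp < 0) {    // smaller head has no partner left on the other side
                onlyInExpected.add(e.get(i++));
            } else {
                onlyInActual.add(a.get(j++));
            }
        }
        // Whatever remains on either side has nothing left to match against
        onlyInExpected.addAll(e.subList(i, e.size()));
        onlyInActual.addAll(a.subList(j, a.size()));
        return Arrays.asList(onlyInExpected, onlyInActual);
    }

    public static void main(String[] args) {
        List<List<String>> d = compare(
                Arrays.asList("one", "two", "four", "five", "seven", "eight"),
                Arrays.asList("one", "two", "three", "five", "six"));
        System.out.println("not present in actual: " + d.get(0));   // [eight, four, seven]
        System.out.println("not present in expected: " + d.get(1)); // [six, three]
    }
}
```

The results come out in sorted order rather than file order; that is the price of avoiding a hash table.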
Answer 1 (score: 0)
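In Java 8 you can use Collection.removeIf(Predicate<? super E>):
Set<String> set1 = new HashSet<>(list1);
Set<String> set2 = new HashSet<>(list2);
list1.removeIf(set2::contains);
list2.removeIf(set1::contains);
list1 will then contain everything that is not in list2, and list2 everything that is not in list1. The HashSet snapshots matter for two reasons: List.contains() is a linear scan, and without the snapshot the second removeIf would test against the already-filtered list1 and remove nothing. Note that this treats lines as a set: a line that occurs anywhere in the other file is removed at every position, so duplicate counts are not compared.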
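To see how the removeIf approach treats duplicates, here is a small self-contained sketch (the class `RemoveIfDiff` and method `removeCommon` are hypothetical names). Both lists must be modifiable: a bare `Arrays.asList(...)` is fixed-size and would throw `UnsupportedOperationException` as soon as removeIf removes an element, hence the `ArrayList` copies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper demonstrating the set semantics of removeIf-based comparison.
final class RemoveIfDiff {

    /** Removes from each list every line that occurs anywhere in the other one. */
    static void removeCommon(List<String> list1, List<String> list2) {
        // Snapshot both sides first so each removeIf sees the original data
        Set<String> set1 = new HashSet<>(list1);
        Set<String> set2 = new HashSet<>(list2);
        list1.removeIf(set2::contains);
        list2.removeIf(set1::contains);
    }

    public static void main(String[] args) {
        List<String> list1 = new ArrayList<>(Arrays.asList("one", "duplicate", "duplicate", "four"));
        List<String> list2 = new ArrayList<>(Arrays.asList("one", "duplicate", "three"));
        removeCommon(list1, list2);
        System.out.println(list1); // [four]
        System.out.println(list2); // [three]
    }
}
```

Both copies of "duplicate" disappear from list1 even though list2 contains it only once, so if counts matter (as in the question), a counting map is the better fit.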
Answer 2 (score: 0)
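I thought it over, and the HashMap solution is practically instant. I went ahead and wrote an example here.
It ran in 0 ms, whereas the ArrayList version took 16 ms on the same dataset.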
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.LinkedHashMap;
import java.util.Map;

public class CompareFiles { // class name is arbitrary
    public static void main(String[] args) throws Exception {
        // LinkedHashMap (rather than plain HashMap) keeps first-seen order, so the
        // report lists records in file order, matching the sample c.txt output
        Map<String, Integer> expectedrecords = new LinkedHashMap<String, Integer>();
        Map<String, Integer> actualrecords = new LinkedHashMap<String, Integer>();
        String sCurrentLine;
        try (BufferedReader br1 = new BufferedReader(new FileReader("expected.txt"));
             BufferedReader br2 = new BufferedReader(new FileReader("actual.txt"))) {
            // Count how many times each line occurs in the expected file
            while ((sCurrentLine = br1.readLine()) != null) {
                expectedrecords.merge(sCurrentLine, 1, Integer::sum);
            }
            // Each actual line cancels one pending expected occurrence;
            // lines with nothing left to cancel are counted as unexpected
            while ((sCurrentLine = br2.readLine()) != null) {
                Integer expectedCount = expectedrecords.get(sCurrentLine);
                if (expectedCount == null) {
                    actualrecords.merge(sCurrentLine, 1, Integer::sum);
                } else if (expectedCount == 1) {
                    expectedrecords.remove(sCurrentLine);
                } else {
                    expectedrecords.put(sCurrentLine, expectedCount - 1);
                }
            }
        }
        // expected is left with all records not present in actual
        // actual is left with all records not present in expected
        try (BufferedWriter bw3 = new BufferedWriter(new FileWriter(new File("c.txt")))) {
            bw3.write("Records which are not present in actual");
            bw3.newLine();
            for (Map.Entry<String, Integer> entry : expectedrecords.entrySet()) {
                for (int i = 0; i < entry.getValue(); i++) {
                    bw3.write(entry.getKey());
                    bw3.newLine();
                }
            }
            bw3.write("Records which are in actual but not present in expected");
            bw3.newLine();
            for (Map.Entry<String, Integer> entry : actualrecords.entrySet()) {
                for (int i = 0; i < entry.getValue(); i++) {
                    bw3.write(entry.getKey());
                    bw3.newLine();
                }
            }
        }
    }
}
For example:
expected.txt:
one
two
four
five
seven
eight
actual.txt:
one
two
three
five
six
c.txt:
Records which are not present in actual
four
seven
eight
Records which are in actual but not present in expected
three
six
Example 2:
expected.txt:
one
two
four
five
seven
eight
duplicate
duplicate
duplicate
actual.txt:
one
duplicate
two
three
five
six
c.txt:
Records which are not present in actual
four
seven
eight
duplicate
duplicate
Records which are in actual but not present in expected
three
six
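A possible variation on the two-map answer, sketched here as a design alternative rather than part of the original answer, keeps a single map of net counts: +1 for each occurrence in expected, -1 for each occurrence in actual, with entries dropped when they reach zero. Afterwards a positive count means "missing from actual" and a negative one means "extra in actual". The class `NetCountDiff` is a hypothetical name, and the inputs are plain `Iterable<String>` so the sketch runs on the Example 2 data without files.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: one map of net counts instead of two separate maps.
final class NetCountDiff {

    static Map<String, Integer> netCounts(Iterable<String> expected, Iterable<String> actual) {
        Map<String, Integer> net = new LinkedHashMap<>();
        for (String line : expected) {
            net.merge(line, 1, Integer::sum);
        }
        for (String line : actual) {
            // Map.merge removes the entry when the remapping function returns null,
            // so lines whose counts cancel out disappear from the map
            net.merge(line, -1, (a, b) -> a + b == 0 ? null : a + b);
        }
        return net;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("one", "two", "four", "five", "seven", "eight",
                "duplicate", "duplicate", "duplicate");
        List<String> actual = Arrays.asList("one", "duplicate", "two", "three", "five", "six");
        // prints {four=1, seven=1, eight=1, duplicate=2, three=-1, six=-1}
        System.out.println(netCounts(expected, actual));
    }
}
```

The report is then written by repeating each key count times under the first heading for positive values, and -count times under the second heading for negative values. Memory use is the same as the two-map version: at most the distinct lines of expected plus the unmatched lines of actual.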