
Question

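This is the code I wrote for a validation mechanism that compares two files. I would like to know whether it can be written in a better-performing way, because both files can contain millions of records and I believe this code will be slow in that case.

I am thinking of using a hash map: every time a line occurs in the file, I add 1 to the value stored under that key; otherwise the key's value stays at 1. If a record exists in the other map for file 2, I remove it from the first map. If it does not, I add it to the map. This alternates between the files until the end.

I am not doing a line-by-line comparison because the order of the lines can differ between the two files.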

public static void main(String[] args) throws Exception {
    BufferedReader br1 = null;
    BufferedReader br2 = null;
    BufferedWriter br3 = null;
    String sCurrentLine;
    int linelength;
    List<String> list1 = new ArrayList<String>();
    List<String> list2 = new ArrayList<String>();
    List<String> unexpectedrecords = new ArrayList<String>();

    br1 = new BufferedReader(new FileReader("expected.txt"));

    br2 = new BufferedReader(new FileReader("actual.txt"));

    while ((sCurrentLine = br1.readLine()) != null) {
        list1.add(sCurrentLine);
    }
    while ((sCurrentLine = br2.readLine()) != null) {
        list2.add(sCurrentLine);
    }
    List<String> expectedrecords = new ArrayList<String>(list1);

    List<String> actualrecords = new ArrayList<String>(list2);

    // Walk the expected records only: indexing both lists up to the larger
    // size would throw IndexOutOfBoundsException on the shorter one.
    for (int i = 0; i < expectedrecords.size(); i++) {
        String expected = expectedrecords.get(i);
        // remove(Object) drops one matching line and reports whether it found one.
        // Each call scans the list, so the loop as a whole is still quadratic.
        if (!actualrecords.remove(expected)) {
            unexpectedrecords.add(expected);
        }
    }

    br3 = new BufferedWriter(new FileWriter(new File("c.txt")));
    br3.write("Records which are not present in actual");
    br3.newLine();
    for (int x = 0; x < unexpectedrecords.size(); x++) {
        br3.write(unexpectedrecords.get(x));
        br3.newLine();
    }
    br3.write("Records which are in actual but not present in expected");
    br3.newLine();
    for (int i = 0; i < actualrecords.size(); i++) {
        br3.write(actualrecords.get(i));
        br3.newLine();
    }
    br3.flush();
    br3.close();
}

Answer 1

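On a Unix/Linux machine you can simply call diff, which is already optimized for speed and memory usage.

The call looks like this:

String listFileDiffs = executeDiff(filenameWithPath1, filenameWithPath2);

The method is implemented as follows: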

private String executeDiff(String filenameWithPath1, String filenameWithPath2) {
    StringBuilder output = new StringBuilder();
    try {
        // No shell is involved when a process is started from Java, so "> file"
        // would be passed to sort as a literal argument. Use sort's -o option instead.
        new ProcessBuilder("sort", "-o", "/tmp/sort1file", filenameWithPath1).start().waitFor();
        new ProcessBuilder("sort", "-o", "/tmp/sort2file", filenameWithPath2).start().waitFor();
        Process diff = new ProcessBuilder("diff", "/tmp/sort1file", "/tmp/sort2file").start();
        // Read the output before waitFor(): a large diff could otherwise fill
        // the pipe buffer and block the child process forever.
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(diff.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append("\n");
            }
        }
        diff.waitFor();
    } catch (Exception e) {
        LOG.error("Error: executeCommand ", e);
    }
    return output.toString();
}

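You can pass flags to diff to get more information about the differences it finds.

The solution has been adapted to account for the arbitrary order of lines in each file: Unix sort is called on each of the two files first, and diff is run on the sorted results.

The Unix commands have matured over decades and work very efficiently.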
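
Where the sort and diff binaries are not available, the same sort-then-compare idea can be sketched in plain Java. This is only an illustration under assumptions, not a replacement for diff: the class name SortedCompare and its compare method are hypothetical, both files are assumed to fit in memory as lists of lines, and the result is two plain lists rather than diff's output format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedCompare {
    // Returns two lists: index 0 holds lines only in 'first',
    // index 1 holds lines only in 'second'. Duplicates are counted,
    // so a line occurring 3 times vs. 1 time yields 2 leftovers.
    static List<List<String>> compare(List<String> first, List<String> second) {
        // Sort copies so the caller's lists are left untouched.
        List<String> a = new ArrayList<>(first);
        List<String> b = new ArrayList<>(second);
        Collections.sort(a);
        Collections.sort(b);

        List<String> onlyFirst = new ArrayList<>();
        List<String> onlySecond = new ArrayList<>();
        int i = 0, j = 0;
        // Walk both sorted lists in step, like the merge phase of merge sort.
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) {          // same line on both sides: cancel the pair
                i++;
                j++;
            } else if (cmp < 0) {    // a's line sorts first, so b cannot contain it
                onlyFirst.add(a.get(i++));
            } else {                 // b's line sorts first, so a cannot contain it
                onlySecond.add(b.get(j++));
            }
        }
        // Whatever remains on either side has no counterpart.
        while (i < a.size()) onlyFirst.add(a.get(i++));
        while (j < b.size()) onlySecond.add(b.get(j++));

        List<List<String>> result = new ArrayList<>();
        result.add(onlyFirst);
        result.add(onlySecond);
        return result;
    }
}
```

Sorting costs O(n log n) and the walk is linear, which is the same reason sort followed by diff copes with files whose lines are in a different order.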

Answer 2

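In Java 8 you can use Collection.removeIf(Predicate<? super E>):

Set<String> inList1 = new HashSet<>(list1); // snapshot taken before list1 is modified
Set<String> inList2 = new HashSet<>(list2);
list1.removeIf(inList2::contains);
list2.removeIf(inList1::contains);

list1 will then contain everything that is not in list2, and list2 everything that is not in list1. The snapshots matter: if the second call tested against the already-filtered list1, it would find no common lines left and remove nothing. Note that this treats the files as sets, so a line present in both files is dropped from both lists no matter how often it occurs.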
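
To see the removeIf approach on a small data set, here is a minimal self-contained sketch. The class name RemoveIfDemo and the helper onlyIn are made up for illustration, and the sample lines are chosen only as test data. The membership test goes through a HashSet so each lookup takes roughly constant time instead of scanning the other list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RemoveIfDemo {
    // Hypothetical helper: returns the lines of 'source' that never occur in 'other'.
    static List<String> onlyIn(List<String> source, List<String> other) {
        Set<String> lookup = new HashSet<>(other);      // average O(1) membership test
        List<String> result = new ArrayList<>(source);  // copy, so the inputs stay untouched
        result.removeIf(lookup::contains);
        return result;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("one", "two", "four", "five", "seven", "eight");
        List<String> actual = Arrays.asList("one", "two", "three", "five", "six");
        System.out.println(onlyIn(expected, actual)); // [four, seven, eight]
        System.out.println(onlyIn(actual, expected)); // [three, six]
    }
}
```

Because the helper works on copies, the order of the two calls does not matter.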

Answer 3

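HashMap solution

I thought about it, and the HashMap solution is effectively instant. I went ahead and coded up an example here.

It ran in 0 ms, whereas the ArrayList version took 16 ms on the same data set.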

public static void main(String[] args) throws Exception {
    BufferedReader br1 = null;
    BufferedReader br2 = null;
    BufferedWriter bw3 = null;
    String sCurrentLine;
    int linelength;

    HashMap<String, Integer> expectedrecords = new HashMap<String, Integer>();
    HashMap<String, Integer> actualrecords = new HashMap<String, Integer>();

    br1 = new BufferedReader(new FileReader("expected.txt"));
    br2 = new BufferedReader(new FileReader("actual.txt"));

    while ((sCurrentLine = br1.readLine()) != null) {
        if (expectedrecords.containsKey(sCurrentLine)) {
            expectedrecords.put(sCurrentLine, expectedrecords.get(sCurrentLine) + 1);
        } else {
            expectedrecords.put(sCurrentLine, 1);
        }
    }
    while ((sCurrentLine = br2.readLine()) != null) {
        if (expectedrecords.containsKey(sCurrentLine)) {
            int expectedCount = expectedrecords.get(sCurrentLine) - 1;
            if (expectedCount == 0) {
                expectedrecords.remove(sCurrentLine);
            } else {
                expectedrecords.put(sCurrentLine, expectedCount);
            }
        } else {
            if (actualrecords.containsKey(sCurrentLine)) {
                actualrecords.put(sCurrentLine, actualrecords.get(sCurrentLine) + 1);
            } else {
                actualrecords.put(sCurrentLine, 1);
            }
        }
    }

    // expected is left with all records not present in actual
    // actual is left with all records not present in expected
    bw3 = new BufferedWriter(new FileWriter(new File("c.txt")));
    bw3.write("Records which are not present in actual\n");
    for (String key : expectedrecords.keySet()) {
        for (int i = 0; i < expectedrecords.get(key); i++) {
            bw3.write(key);
            bw3.newLine();
        }
    }
    bw3.write("Records which are in actual but not present in expected\n");
    for (String key : actualrecords.keySet()) {
        for (int i = 0; i < actualrecords.get(key); i++) {
            bw3.write(key);
            bw3.newLine();
        }
    }
    bw3.flush();
    bw3.close();
}

Example 1:

expected.txt:

one
two
four
five
seven
eight

actual.txt:

one
two
three
five
six

c.txt:

Records which are not present in actual
four
seven
eight
Records which are in actual but not present in expected
three
six

Example 2:

expected.txt:

one
two
four
five
seven
eight
duplicate
duplicate
duplicate

actual.txt:

one
duplicate
two
three
five
six

c.txt:

Records which are not present in actual
four
seven
eight
duplicate
duplicate
Records which are in actual but not present in expected
three
six
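
The counting logic can also be written more compactly with Map.merge and Map.computeIfPresent (Java 8 and later). The following is a minimal sketch on in-memory lists rather than files. The class CountingDiff and its methods count and cancel are illustrative names and not part of the code above, and TreeMap is used only so the printed order is stable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CountingDiff {
    // Counts how often each line occurs.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    // Cancels every line of 'actual' against 'expectedCounts' (mutated in place)
    // and returns the counts of actual lines that had no expected counterpart.
    static Map<String, Integer> cancel(Map<String, Integer> expectedCounts, List<String> actual) {
        Map<String, Integer> extra = new TreeMap<>();
        for (String line : actual) {
            if (expectedCounts.containsKey(line)) {
                // Returning null from the remapping function removes the key.
                expectedCounts.computeIfPresent(line, (k, v) -> v == 1 ? null : v - 1);
            } else {
                extra.merge(line, 1, Integer::sum);
            }
        }
        return extra;
    }

    public static void main(String[] args) {
        Map<String, Integer> expected = count(Arrays.asList(
                "one", "two", "four", "five", "seven", "eight",
                "duplicate", "duplicate", "duplicate"));
        Map<String, Integer> extra = cancel(expected,
                Arrays.asList("one", "duplicate", "two", "three", "five", "six"));
        System.out.println("Not present in actual: " + expected);
        System.out.println("Only in actual: " + extra);
    }
}
```

With the data from the second example, the expected map is left with duplicate=2, eight=1, four=1 and seven=1, and the extra map holds six=1 and three=1, which matches c.txt above.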

Compare two text files in Java and write the differences between them to a separate file