Question

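I have two files. One contains a set of URLs that need to be matched against the set of URLs in the second file. At the moment I do the matching with a foreach loop. Because there are 95,000 URLs, it is very slow.

I need a way to improve the application's performance. Is there an approach that avoids this slowdown?

Thanks.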

Answer 1

You could try storing the second file's data in a radix tree (a compressed form of trie) and searching that. https://en.wikipedia.org/wiki/Trie
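As a rough illustration of the idea, here is a minimal sketch of a plain character trie supporting exact-match lookup. It is not a true radix tree (paths are not compressed), and the class name `UrlTrie` and its methods are hypothetical, chosen only for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration: an uncompressed character trie.
class UrlTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored URL ends exactly at this node
    }

    private final Node root = new Node();

    // Insert one URL, creating nodes along its character path as needed.
    void add(String url) {
        Node node = root;
        for (int i = 0; i < url.length(); i++) {
            node = node.children.computeIfAbsent(url.charAt(i), c -> new Node());
        }
        node.terminal = true;
    }

    // Exact-match lookup: follow the path, then check the end marker.
    boolean contains(String url) {
        Node node = root;
        for (int i = 0; i < url.length(); i++) {
            node = node.children.get(url.charAt(i));
            if (node == null) {
                return false; // path breaks off: URL was never stored
            }
        }
        return node.terminal;
    }

    public static void main(String[] args) {
        UrlTrie trie = new UrlTrie();
        for (String url : List.of("http://example.com/a", "http://example.com/b")) {
            trie.add(url);
        }
        System.out.println(trie.contains("http://example.com/a")); // true
        System.out.println(trie.contains("http://example.com/"));  // false: only a prefix
    }
}
```

Lookup cost here depends on the length of the URL rather than on the number of stored URLs. For plain exact matching, a trie mainly pays off when many URLs share long prefixes or when prefix queries are also needed.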

Answer 2

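The appropriate data structure here is a hash set, because it offers constant-time lookup on average. Parse the URLs from the first file and insert them into a hash set. Then parse the second file and check whether each of its URLs is present in the set built from the first file.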

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// The statements below belong inside a method, for example main.
Set<String> urls = new HashSet<>();

// parse the first file and add each URL to the hash set
try (BufferedReader br = Files.newBufferedReader(Paths.get("firstURLs.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        urls.add(line);
    }
} catch (IOException e) {
    System.err.format("IOException: %s%n", e);
}

// parse the second file and report every URL that is also in the first
try (BufferedReader br = Files.newBufferedReader(Paths.get("secondURLs.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        if (urls.contains(line)) {
            System.out.println("MATCH: " + line);
        }
    }
} catch (IOException e) {
    System.err.format("IOException: %s%n", e);
}

The advantage of this approach is that its running time should grow linearly with the combined size of the two files.
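To make the difference concrete: a nested loop over lists of n and m URLs performs on the order of n × m comparisons, while the hash set approach makes one pass over each list, roughly n + m operations. The sketch below isolates that matching step from the file handling so it can be reused and tested on its own. The class name `UrlMatcher` and method `findMatches` are hypothetical, and the example assumes Java 9 or later for `List.of`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration, not part of the original answer.
class UrlMatcher {
    // Returns the entries of `second` that also occur in `first`,
    // in the order they appear in `second`.
    static List<String> findMatches(Collection<String> first, Collection<String> second) {
        Set<String> known = new HashSet<>(first);  // one pass over the first list
        List<String> matches = new ArrayList<>();
        for (String url : second) {                // one pass over the second list
            if (known.contains(url)) {             // expected constant-time lookup
                matches.add(url);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> first = List.of("http://a.example", "http://b.example");
        List<String> second = List.of("http://b.example", "http://c.example");
        System.out.println(findMatches(first, second)); // [http://b.example]
    }
}
```

Note that this compares strings exactly, so URLs differing only in case, trailing slashes, or surrounding whitespace would not match unless they are normalized first.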

Matching records in one file against another file in Java
