Question

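What I plan to do is basically:

Read the first file word by word and store the words in a Set (SetA).
Read the second file and check whether the first Set (SetA) contains each word; if it does, store the word in a second Set (SetB). SetB now contains the words common to the first and second files.
Similarly, read the third file, check whether SetB contains each word, and store the matches in SetC.

If you have any suggestions or see any problems with my approach, please let me know.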
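For reference, the three steps above could be sketched roughly as follows. This is only an illustration of the plan, not tested against real files: the class name `ChainedWordSets` and the helpers `readWords` and `keepKnownWords` are made up for this example, words are assumed to be separated by whitespace, and in-memory strings stand in for the three files.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

class ChainedWordSets {

    // Collects every whitespace-separated token from the source into a set (hypothetical helper).
    static Set<String> readWords(Readable source) {
        Set<String> words = new HashSet<>();
        Scanner scanner = new Scanner(source);
        while (scanner.hasNext()) {
            words.add(scanner.next());
        }
        return words;
    }

    // Keeps only the tokens from the source that already appear in `allowed` (hypothetical helper).
    static Set<String> keepKnownWords(Readable source, Set<String> allowed) {
        Set<String> kept = new HashSet<>();
        Scanner scanner = new Scanner(source);
        while (scanner.hasNext()) {
            String word = scanner.next();
            if (allowed.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumption: these strings stand in for the contents of the three files.
        Set<String> setA = readWords(new StringReader("aap noot aap wim vuur"));
        Set<String> setB = keepKnownWords(new StringReader("aap noot mies zus wim vuur"), setA);
        Set<String> setC = keepKnownWords(new StringReader("noot mies wim vuur"), setB);
        System.out.println(setC); // the words present in all three inputs, in no particular order
    }
}
```

With real files, a reader such as the one returned by `Files.newBufferedReader(path)` could be passed in place of the `StringReader`. Only SetA and the shrinking result sets are held in memory; the second and third files are streamed token by token.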

Answer 1

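Welcome to Stack Overflow!

This approach sounds fine. I would suggest using a regex to save yourself some coding time. Another thing to consider is making sure you store only the unique words in your sets rather than every occurrence of each word.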
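One way the regex suggestion might look is sketched below. The class name `RegexWordSplitter`, the method `uniqueWords`, the choice of pattern, and the lower-casing step are all assumptions made for this example, not something specified in the question. Note that a `Set` already discards duplicates, so adding every token to it leaves only the unique words.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

class RegexWordSplitter {

    // Assumption: a "word" is a run of Unicode letters or digits; everything else is a separator.
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Splits the text on the separator pattern and returns the unique words in first-seen order.
    static Set<String> uniqueWords(String text) {
        Set<String> words = new LinkedHashSet<>();
        for (String token : NON_WORD.split(text)) {
            // split() can yield an empty first token when the text starts with a separator
            if (!token.isEmpty()) {
                // Assumption: "Aap" and "aap" should count as the same word
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(uniqueWords("Aap, noot; aap... Wim!")); // prints [aap, noot, wim]
    }
}
```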

Answer 2

You can use retainAll to determine the intersection of two sets:

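import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class App {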

    public static void main(String[] args) {
        App app = new App();
        app.run();
    }

    private void run() {
        List<String> file1 = Arrays.asList("aap", "noot", "aap", "wim", "vuur", "noot", "wim");
        List<String> file2 = Arrays.asList("aap", "noot", "mies", "aap", "zus", "jet", "aap", "wim", "vuur");
        List<String> file3 = Arrays.asList("noot", "mies", "wim", "vuur");

        System.out.println(getCommonWords(file1, file2, file3));
    }

    @SafeVarargs
    private final Set<String> getCommonWords(List<String>... files) {
        Set<String> result = new HashSet<>();
        // Possible optimization: sort the files by ascending size first
        Iterator<List<String>> it = Arrays.asList(files).iterator();
        if (it.hasNext()) {
            result.addAll(it.next());
        }
        while (it.hasNext()) {
            Set<String> words = new HashSet<>(it.next());
            result.retainAll(words);
        }
        return result;
    }
}

Also have a look at this answer, which shows the same solution I gave above as well as a way to code it using Java 8 Streams.
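A Streams-based variant might look something like the sketch below. It is not a reproduction of the linked answer; the class name `StreamCommonWords` and the method `commonWords` are made up for this example, and it simply filters the words of the first list by membership in every other list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class StreamCommonWords {

    @SafeVarargs
    static Set<String> commonWords(List<String>... files) {
        if (files.length == 0) {
            return new HashSet<>();
        }
        // Turn every list after the first into a set so that contains() is a hash lookup.
        List<Set<String>> others = Arrays.stream(files)
                .skip(1)
                .<Set<String>>map(words -> new HashSet<>(words))
                .collect(Collectors.toList());
        // Keep the words of the first list that occur in all of the other sets.
        return files[0].stream()
                .filter(word -> others.stream().allMatch(set -> set.contains(word)))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("aap", "noot", "aap", "wim", "vuur", "noot", "wim");
        List<String> file2 = Arrays.asList("aap", "noot", "mies", "aap", "zus", "jet", "aap", "wim", "vuur");
        List<String> file3 = Arrays.asList("noot", "mies", "wim", "vuur");
        System.out.println(commonWords(file1, file2, file3)); // noot, wim and vuur, in no particular order
    }
}
```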

What is an efficient way to find the common words in 3 files using Java?
