Question

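I have two files, File1.txt and File2.txt. Both files contain text. I want to know the total number of words the two files have in common. I used this code to get the total number of words in each file: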

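import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public int get_Total_Number_Of_Words(File file) {
    // try-with-resources closes the Scanner (and the underlying file) automatically
    try (Scanner sc = new Scanner(file)) {
        int count = 0;
        while (sc.hasNext()) {
            sc.next(); // consume one whitespace-delimited token
            count++;
        }
        return count;
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    return 0;
}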

Please tell me how to use this code to count the words the two files have in common.
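A minimal sketch of one way to reuse the Scanner loop above: instead of only counting tokens, collect them into a `Set<String>` per file, then intersect the two sets with `retainAll` and take the size. The names `CommonWords`, `getWords` and `countCommonWords` are hypothetical, and a "word" is assumed to be any whitespace-delimited token compared case-sensitively, as in the question's code.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

class CommonWords {

    // Same loop as get_Total_Number_Of_Words, but keeps each token instead of counting it
    static Set<String> getWords(File file) throws FileNotFoundException {
        Set<String> words = new HashSet<>();
        try (Scanner sc = new Scanner(file)) {
            while (sc.hasNext()) {
                words.add(sc.next());
            }
        }
        return words;
    }

    // Number of distinct words that occur in both files
    static int countCommonWords(File file1, File file2) throws FileNotFoundException {
        Set<String> common = getWords(file1);
        common.retainAll(getWords(file2)); // keep only the words also present in file2
        return common.size();
    }

    public static void main(String[] args) throws FileNotFoundException {
        System.out.println(countCommonWords(new File("File1.txt"), new File("File2.txt")));
    }
}
```

Because sets ignore duplicates, a word that appears several times in either file is counted once.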

Answer 1

Use a Map. Use the word as the key and an Integer as the value, incrementing the value every time the key is found. Voilà!
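The same Map-counting idea can be applied to both files, sketched here under the assumption that each file's words have already been read into an array (for example with the question's Scanner loop). Keys present in both maps are the common words; summing the smaller of the two counts for each shared word gives a total that also accounts for repeats. The names `MapCommonWords`, `countWords` and `commonOccurrences` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class MapCommonWords {

    // Word -> number of occurrences; merge() replaces the explicit null check
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // For every word found in both maps, add the smaller of its two counts
    static int commonOccurrences(Map<String, Integer> a, Map<String, Integer> b) {
        int total = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                total += Math.min(e.getValue(), other);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> m1 = countWords(new String[]{"test1", "test2", "test1", "test3"});
        Map<String, Integer> m2 = countWords(new String[]{"test1", "test1", "test1", "test4"});
        int distinctCommon = 0;
        for (String w : m1.keySet()) {
            if (m2.containsKey(w)) {
                distinctCommon++;
            }
        }
        System.out.println(distinctCommon);            // 1 (only test1)
        System.out.println(commonOccurrences(m1, m2)); // 2 = min(2, 3)
    }
}
```

Which of the two numbers is "the total number of common words" depends on whether repeated words should count once or per occurrence.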

import java.util.HashMap;
import java.util.Map;

public class WordCount {
    public static void main(String[] args) {
        String[] wordList = {"test1", "test2", "test1", "test3", "test1", "test2", "test4"};
        Map<String, Integer> countMap = new HashMap<>();
        for (String word : wordList) {
            // First occurrence starts at 1; later occurrences increment the count
            if (countMap.get(word) == null) {
                countMap.put(word, 1);
            } else {
                countMap.put(word, countMap.get(word) + 1);
            }
        }
        System.out.println(countMap);
    }
}

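The output is (HashMap does not guarantee an iteration order, so the entries may be printed in a different order):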

{test4=1, test2=2, test3=1, test1=3}

Answer 2

Here is a solution using Java 8 and a project of mine:

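// JDK imports; LargeText and LargeTextFactory come from the author's project,
// and their imports are not shown here
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

private static final Pattern WORDS = Pattern.compile("\\s+");

final LargeTextFactory factory = LargeTextFactory.defaultFactory();

final Path file1 = Paths.get("pathtofirstfile");
final Path file2 = Paths.get("pathtosecondfile");

final List<String> commonWords;

try (
    final LargeText t1 = factory.fromPath(file1);
    final LargeText t2 = factory.fromPath(file2);
) {
    // Distinct words of the first file, skipping the empty token
    // that leading whitespace produces
    final Set<String> wordsInFile1 = WORDS.splitAsStream(t1)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toSet());

    // Words of the second file that also occur in the first, each listed once.
    // A single "seen" set over both files would also flag words that are
    // merely repeated within one file.
    commonWords = WORDS.splitAsStream(t2)
        .filter(wordsInFile1::contains)
        .distinct()
        .collect(Collectors.toList());
}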

// commonWords contains what you want

This can also be parallelized if you use a concurrent Set implementation.
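A JDK-only sketch of a parallel variant, assuming both texts fit in memory as Strings (it does not use the author's large-text library). The words of the first text are added from a parallel stream into a concurrent set obtained from `ConcurrentHashMap.newKeySet()`, and the second text is then filtered against that set in parallel. The names `ParallelCommonWords` and `commonWords` are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ParallelCommonWords {

    private static final Pattern WORDS = Pattern.compile("\\s+");

    // Distinct words that appear in both texts, computed with parallel streams
    static List<String> commonWords(String text1, String text2) {
        // Concurrent set: safe for several threads to add to at the same time
        Set<String> seen = ConcurrentHashMap.newKeySet();
        WORDS.splitAsStream(text1).parallel()
            .filter(w -> !w.isEmpty()) // leading whitespace yields one empty token
            .forEach(seen::add);

        // The set is only read here, so this pipeline shares no mutable state
        return WORDS.splitAsStream(text2).parallel()
            .filter(seen::contains)
            .distinct()
            .sorted() // deterministic order for printing
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(commonWords("to be or not to be", "not to worry"));
    }
}
```

For small inputs a sequential stream is usually just as fast; parallelism only pays off on large texts.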

Answer 3

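I would create two lists, add the words from one text file to the first list and the words from the other text file to the second list, then compare the two lists and count the words that are the same.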
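One way to compare the two lists, sketched under the assumption that each file's words are already in a `List<String>`: sort copies of both lists and walk them together with two indices, counting a match whenever both lists hold the same word at the current positions. This sort-and-merge comparison is swapped in as an illustration, not taken from the answer. Repeated words are matched one-to-one, so a word occurring twice in one list and three times in the other counts twice. The names `ListCompare` and `countMatches` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ListCompare {

    // Counts matching words between two lists, pairing repeated words one-to-one
    static int countMatches(List<String> words1, List<String> words2) {
        List<String> a = new ArrayList<>(words1);
        List<String> b = new ArrayList<>(words2);
        Collections.sort(a);
        Collections.sort(b);

        int i = 0, j = 0, matches = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) {        // same word in both lists: count it, advance both
                matches++;
                i++;
                j++;
            } else if (cmp < 0) {  // a's word is smaller, so it cannot match anything later in b
                i++;
            } else {               // b's word is smaller, so it cannot match anything later in a
                j++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("the", "cat", "sat", "on", "the", "mat");
        List<String> file2 = Arrays.asList("the", "dog", "sat", "by", "the", "door");
        System.out.println(countMatches(file1, file2));
    }
}
```

Sorting costs O(n log n), whereas comparing every word of one list with every word of the other takes O(n·m).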

Answer 4

You have to do some kind of comparison, so you can do it with nested loops.

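import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

String word1;
int numCommon = 0;
// try-with-resources closes both Scanners
try (Scanner sc = new Scanner(file); Scanner sc2 = new Scanner(file2)) {
    // A Scanner cannot be rewound, so read the second file into a list once;
    // looping over sc2 directly would exhaust it after the first word of file
    List<String> words2 = new ArrayList<>();
    while (sc2.hasNext()) {
        words2.add(sc2.next());
    }
    // Nested loops: every matching (word1, word2) pair is counted,
    // so repeated words add up
    while (sc.hasNext()) {
        word1 = sc.next();
        for (String word2 : words2) {
            if (word2.equals(word1)) {
                numCommon++;
            }
        }
    }
    return numCommon;
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
return 0;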

How do I get the total number of matching words in two text files?
