Question

I am loading several text files of different lengths and adding their words to a HashMap called "collection".

List<String> textFileList = Arrays.asList("ArsenalNoStopWords.txt", "ChelseaNoStopWords.txt", "LiverpoolNoStopWords.txt",
            "ManchesterUnitedNoStopWords.txt", "ManchesterCityNoStopWords.txt", "TottenhamNoStopWords.txt");

for (String text : textFileList) {
    scanFile(text);
}

public static void scanFile(String textFileName) {
    try {

        Scanner textFile = new Scanner(new File(textFileName));

        while (textFile.hasNext()) {
            collection.put(textFile.next().trim(), 0);
        }

        textFile.close();

    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}

After this, I load one of the documents and count its word frequencies using the HashMap (collection).

ArrayList<Integer> document = new ArrayList<Integer>();

document = processDocument("TottenhamNoStopWords.txt");

private static ArrayList<Integer> processDocument(String inFileName) throws IOException {

    for (Map.Entry<String, Integer> entry : collection.entrySet()) {
        entry.setValue(0);
    }

    Scanner textFile = new Scanner(new File(inFileName));
    ArrayList<String> file = new ArrayList<String>();

    while(textFile.hasNext()) {
        file.add(textFile.next().trim().toLowerCase());
    }

    for(String word : file) {
        Integer dict = collection.get(word);
        if (!collection.containsKey(word)) {
            collection.put(word, 1); 
        } else {
            collection.put(word, dict + 1);
        }
    }

    textFile.close();

    ArrayList<Integer> values = new ArrayList<>(collection.values());
    return values;  
}

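After this, I write the values returned by processDocument() to a text file; I have six of these files, each with a different name. In theory, every version of the collection for each team should have the same length, because the keys of collection never change and always come from the textFileList list; the only thing that changes is the document being processed. So why do my vectors (ArrayLists) have different lengths, when they should all be the same size and differ only in their frequency values?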

Answer 1

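In the first step you add textFile.next().trim() to your collection, but in the second part you use file.add(textFile.next().trim().toLowerCase()). As a result, the collection ends up holding both the lowercase and the non-lowercase form of the same word as separate keys. Every word in the processed document that contains an uppercase letter adds a new lowercase key, so the size of the map depends on which document you process.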
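
One way to avoid the mismatch is to normalize words in exactly the same way in both steps. The following is only a minimal sketch of that idea, not the asker's actual program: the class VocabularyDemo and the helpers normalize, buildVocabulary and countDocument are hypothetical names, the words come from in-memory lists instead of the text files, and a TreeMap replaces the HashMap purely so that the order of the values is predictable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class VocabularyDemo {

    // Hypothetical helper: the single normalization rule used by both steps.
    static String normalize(String token) {
        return token.trim().toLowerCase(Locale.ROOT);
    }

    // Step 1: register every word of the corpus with a count of zero.
    static Map<String, Integer> buildVocabulary(List<String> tokens) {
        // Assumption: TreeMap instead of HashMap, only for a stable key order.
        Map<String, Integer> vocabulary = new TreeMap<>();
        for (String token : tokens) {
            vocabulary.put(normalize(token), 0);
        }
        return vocabulary;
    }

    // Step 2: count one document against a copy of the vocabulary.
    static List<Integer> countDocument(Map<String, Integer> vocabulary,
                                       List<String> documentTokens) {
        Map<String, Integer> counts = new TreeMap<>(vocabulary);
        for (String token : documentTokens) {
            // Same normalization as in step 1, so existing keys are reused.
            counts.merge(normalize(token), 1, Integer::sum);
        }
        return new ArrayList<>(counts.values());
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("Arsenal", "goal", "Chelsea", "Goal");
        Map<String, Integer> vocabulary = buildVocabulary(corpus);

        List<Integer> vector =
                countDocument(vocabulary, Arrays.asList("Goal", "goal", "Arsenal"));

        // "Goal" and "goal" share one key, so the vector has three entries.
        System.out.println(vocabulary.keySet() + " -> " + vector);
    }
}
```

Because countDocument works on a copy, the shared vocabulary is never modified between documents. A word that does not occur in the vocabulary at all would still add a key, so this sketch assumes, as in the question, that every processed document is also one of the files used to build the vocabulary.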

Various HashMap sizes (Java)
