Question

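How do I count repeated words in a text file using arrays?

My program can print the total number of words in the file, but how do I get it to print the number of distinct words, and also a list of how many times each repeated word occurs, like this:

Cake: 4 a: 320 Piece: 2 24

(Words that differ only in upper and lower case are considered the same word.)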

void FileReader() {
    System.out.println("Oppgave A");
    int totalWords = 0;
    int uniqueWords = 0;
    String[] word = new String[35000];
    String[] wordC = new String[3500];
    try {
        File fr = new File("Alice.txt");
        Scanner sc = new Scanner(fr);

        while (sc.hasNext()) {
            String words = sc.next();
            String[] space = words.split(" ");
            String[] comma = words.split(",");
            totalWords++;
        }
        System.out.println("Antall ord som er lest er: " + totalWords);
    } catch (Exception e) {
        System.out.println("File not found");
    }
}

Answer 1

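Arrays are a poor fit for this, because after every word you have to iterate over the array to see whether that word has already occurred. Use a HashMap instead, where the key is the word and the value is the number of occurrences. Checking whether a HashMap contains a key is easier and faster than checking whether an array contains an element.

Edit: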

HashMap<String, Integer>
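
A minimal sketch of how that could look, assuming words are simply the whitespace-separated tokens returned by Scanner and that lower-casing is enough to treat "Cake" and "cake" as the same word. The class name HashMapWordCount and the method countWords are made up for illustration, and punctuation is not stripped here.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class HashMapWordCount {

    // Counts how often each token occurs in the Scanner's input, ignoring case.
    // (Hypothetical helper, not part of the original question.)
    static Map<String, Integer> countWords(Scanner sc) {
        Map<String, Integer> counts = new HashMap<>();
        while (sc.hasNext()) {
            String w = sc.next().toLowerCase();
            if (counts.containsKey(w)) {
                counts.put(w, counts.get(w) + 1); // seen before: increment
            } else {
                counts.put(w, 1);                 // first occurrence
            }
        }
        return counts;
    }

    public static void main(String[] args) throws FileNotFoundException {
        try (Scanner sc = new Scanner(new File("Alice.txt"))) {
            Map<String, Integer> counts = countWords(sc);
            System.out.println("Unique words: " + counts.size());
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                System.out.println(e.getKey() + ": " + e.getValue());
            }
        }
    }
}
```

The size of the map is the number of distinct words, and each entry is one line of the frequency list.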

Answer 2

Try using a Set, and check the return value of add() as you iterate.

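// Start with an empty set; add() returns true only for a word not seen before.
Set<String> set = new HashSet<>();
int unique = 0;
for (String temp : word) {
    // Skip unused (null) slots and lower-case so "Cake" and "cake" match.
    if (temp != null && set.add(temp.toLowerCase())) {
        unique++;
    }
}

// or, assuming every slot of word is filled and already lower-cased...
Set<String> set = new HashSet<>(Arrays.asList(word));
int unique = set.size();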

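This is, of course, after all the values have been read in.

Edit: Seeing that you can't use a Map (and, I assume, other data structures), you will probably need a somewhat crude approach that checks every value.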

// input is the new word just read from the text file;
// word[0..unique) holds the distinct words found so far
boolean isUnique = true;
for (int i = 0; i < unique; i++) {
    if (word[i].equalsIgnoreCase(input)) {
        isUnique = false;
        break;
    }
}
if (isUnique) {
    word[unique] = input; // remember the new word
    unique++; // unique is the count of unique words
}
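
The check above can be extended to keep a count per word by using a second array in parallel with the first. This is only a sketch under some assumptions: the class name ArrayWordCount, the field names, and the 35000-slot limit (borrowed from the question) are illustrative, and there is no guard against running out of slots.

```java
import java.util.Scanner;

public class ArrayWordCount {
    // Assumed upper bound on the number of distinct words (taken from the question)
    static final int MAX_WORDS = 35000;

    String[] words = new String[MAX_WORDS]; // distinct words seen so far
    int[] counts = new int[MAX_WORDS];      // counts[i] belongs to words[i]
    int unique = 0;                         // number of filled slots
    int total = 0;                          // every word read, duplicates included

    void add(String input) {
        total++;
        // Linear scan over the words stored so far
        for (int i = 0; i < unique; i++) {
            if (words[i].equalsIgnoreCase(input)) {
                counts[i]++;
                return;
            }
        }
        // Not found: store it in the next free slot
        words[unique] = input.toLowerCase();
        counts[unique] = 1;
        unique++;
    }

    public static void main(String[] args) {
        ArrayWordCount wc = new ArrayWordCount();
        Scanner sc = new Scanner("Cake cake a A a pie"); // stand-in for the file
        while (sc.hasNext()) {
            wc.add(sc.next());
        }
        System.out.println("Total words: " + wc.total);
        System.out.println("Unique words: " + wc.unique);
        for (int i = 0; i < wc.unique; i++) {
            System.out.println(wc.words[i] + ": " + wc.counts[i]);
        }
    }
}
```

Each add() costs time proportional to the number of distinct words so far, which is the price of not using a map.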

Answer 3

You can use a Map and increment the value (the count) every time you add a word that is already in the map.
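
One compact way to do that increment, assuming Java 8 or later, is Map.merge. The class and method names below are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class MergeCount {

    // Hypothetical helper: builds a word -> count map, ignoring case.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            // Stores 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(w.toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(new String[] {"Cake", "cake", "a"}));
    }
}
```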

Answer 4

Every time you add a word, you need to check whether it already exists in your array of words. To compare two words, you need to use:

 word1.equalsIgnoreCase(word2);

Answer 5

Try this:

// needs: import java.io.*; import java.util.*;
try {
    List<String> list = new ArrayList<String>();
    int totalWords = 0;
    File fr = new File("Alice.txt");
    Scanner sc = new Scanner(fr);
    while (sc.hasNext()) {
        // next() already splits on whitespace; lower-case the token so that
        // "Cake" and "cake" count as the same word
        list.add(sc.next().toLowerCase());
        totalWords++;
    }
    sc.close();
    Set<String> uniqueSet = new HashSet<String>(list);
    System.out.println("Total words: " + totalWords);
    System.out.println("Unique words: " + uniqueSet.size());
    System.out.println("Words with their frequency..");
    for (String word : uniqueSet) {
        System.out.println(word + ": " + Collections.frequency(list, word));
    }
} catch (FileNotFoundException e) {
    System.out.println("File not found");
}

Answer 6

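You can improve on a simple array search by using Arrays.sort and Arrays.binarySearch.

Basically, for each word, use binarySearch to check whether it is already in your array. If it is, increment its count. If not, add it to the array and sort again. Java's current sort for object arrays is very fast when the array is already mostly sorted, because it uses TimSort.

There are other structures, such as TreeSet, that you could use to avoid hashing, but I suspect those aren't allowed either.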
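
A rough sketch of the binarySearch idea with a parallel count array. Note one deliberate change: instead of appending and calling Arrays.sort again, it shifts elements and inserts at the position binarySearch reports, so the counts stay lined up with their words. The class name, field names, and the 35000 capacity are assumptions for illustration.

```java
import java.util.Arrays;

public class SortedArrayWordCount {
    String[] words = new String[35000]; // kept sorted in words[0..size)
    int[] counts = new int[35000];      // counts[i] belongs to words[i]
    int size = 0;                       // number of distinct words stored

    void add(String input) {
        String w = input.toLowerCase();
        // Search only the filled part of the array
        int pos = Arrays.binarySearch(words, 0, size, w);
        if (pos >= 0) {
            counts[pos]++; // already present
            return;
        }
        // When the key is absent, binarySearch returns -(insertion point) - 1
        int ins = -pos - 1;
        // Shift the tail of both arrays one slot to the right
        System.arraycopy(words, ins, words, ins + 1, size - ins);
        System.arraycopy(counts, ins, counts, ins + 1, size - ins);
        words[ins] = w;
        counts[ins] = 1;
        size++;
    }

    public static void main(String[] args) {
        SortedArrayWordCount wc = new SortedArrayWordCount();
        for (String w : "pie Cake a cake A a".split(" ")) {
            wc.add(w);
        }
        System.out.println("Unique words: " + wc.size);
        for (int i = 0; i < wc.size; i++) {
            System.out.println(wc.words[i] + ": " + wc.counts[i]);
        }
    }
}
```

A side effect is that the frequency list comes out in alphabetical order.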
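
If a TreeSet did turn out to be allowed, counting the distinct words could look roughly like this. It relies on String.CASE_INSENSITIVE_ORDER so that "Cake" and "cake" collapse into one entry; the class and method names are invented for the example.

```java
import java.util.Set;
import java.util.TreeSet;

public class TreeSetUnique {

    // Hypothetical helper: number of distinct words, ignoring case, without hashing.
    static int countUnique(String[] words) {
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String w : words) {
            seen.add(w); // duplicates (by the comparator) are ignored
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String[] sample = {"Cake", "cake", "a", "A", "pie"};
        System.out.println("Unique words: " + countUnique(sample));
    }
}
```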
