Word length frequency

Time: 2015-07-09 10:53:57

Tags: java eclipse text-files word-frequency

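I have created a Java program in Eclipse that counts how often each word length occurs. For example, if the user enters "I went to the shop", the program produces the output '1 1 1 2': 1 word of length 1 ('I'), 1 word of length 2 ('to'), 1 word of length 3 ('the') and 2 words of length 4 ('went', 'shop').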

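The program currently reads a string entered by the user, but I would like to adapt the code so that it reads each line of a text file instead. Any help would be great.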

import java.util.Scanner;

public class WordLengthFrequency
{

    public static void main(String[] args)
    {
        Scanner scan = new Scanner(System.in);

        while (true)
        {
            System.out.println("Enter text: ");

            // Stop cleanly when the input stream is closed
            if (!scan.hasNextLine())
                break;

            String input = scan.nextLine();
            // Replace every non-word character with a space
            String strippedInput = input.replaceAll("\\W", " ").trim();

            System.out.println(strippedInput);

            // Split on runs of whitespace so that no empty strings are counted
            String[] strings = strippedInput.isEmpty() ? new String[0] : strippedInput.split("\\s+");
            int[] counts = new int[6]; // only word lengths 1 to 5 are tracked
            int total = 0;
            for (String str : strings)
            {
                if (str.length() < counts.length)
                    counts[str.length()] += 1;
                total += str.length();
            }
            for (int i = 1; i < counts.length; i++)
            {
                // One '*' per word of this length
                StringBuilder sb = new StringBuilder(i + " letter words: ");
                for (int j = 1; j <= counts[i]; j++)
                    sb.append('*');
                // Print once per length, not once per star
                System.out.println(i + " letter words: " + counts[i]);
                System.out.println(sb);
            }
            if (strings.length > 0)
                System.out.println("mean length: " + ((double) total / strings.length));
        }
    }
}

2 answers:

Answer 0 (score: 0)

Scanner scan = new Scanner(System.in);

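This line creates a Scanner that reads from System.in, which is usually the console. You want to read from somewhere else instead, so you need to point the Scanner at the text you want.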

This is easily done with:

Scanner scan = new Scanner(new File("filePath"));

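You will also need to change your loop, because you can no longer loop forever (a file, unlike console input, eventually ends). Scanner has a nice little method, hasNext(), that tells you whether there is more input to read; when reading line by line, hasNextLine() is the matching check.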
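As a rough illustration of the two changes described above, here is a minimal sketch that points the Scanner at a file and loops with hasNextLine(). The class name WordLengthFromFile, the helpers countLine and countAll, and the path "text.txt" are assumptions made for this example, and the cap of 5 simply mirrors the array size in the question.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class WordLengthFromFile {

    // Adds the words of one line to counts; counts[i] is the number of
    // words of length i. Words too long for the array are ignored.
    static void countLine(String line, int[] counts) {
        String stripped = line.replaceAll("\\W", " ").trim();
        if (stripped.isEmpty()) {
            return;
        }
        for (String word : stripped.split("\\s+")) {
            if (word.length() < counts.length) {
                counts[word.length()]++;
            }
        }
    }

    // Reads lines until the scanner runs out of input.
    static int[] countAll(Scanner scan, int maxLength) {
        int[] counts = new int[maxLength + 1];
        while (scan.hasNextLine()) {
            countLine(scan.nextLine(), counts);
        }
        return counts;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // "text.txt" is a placeholder path (assumption), overridable via args[0]
        String path = args.length > 0 ? args[0] : "text.txt";
        try (Scanner scan = new Scanner(new File(path))) {
            int[] counts = countAll(scan, 5);
            for (int i = 1; i < counts.length; i++) {
                System.out.println(i + " letter words: " + counts[i]);
            }
        }
    }
}
```

Note that new Scanner(new File(...)) can throw FileNotFoundException, so main either declares it, as here, or catches it.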

Answer 1 (score: 0)

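First, a little code formatting can make a huge difference to readability. Also, for reading a file I would suggest using a BufferedReader, and in this case a HashMap as well. Currently, because you are using an array with a fixed number of indices, you are limited in the word lengths you can track. With a map, you can track any number of word lengths. Something like the following would work well.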

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class WordLengthFrequency {

    public static void main(String[] args) {
        Map<Integer, Integer> lengthCount = new HashMap<Integer, Integer>();
        int totalNumberWords = 0;

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader("text.txt"))) {
            String currentLine;

            // readLine() returns null at the end of the file
            while ((currentLine = br.readLine()) != null) {
                // Iterates through the words in the line and
                // increments the map appropriately
                for (String word : currentLine.split(" ")) {
                    if (word.isEmpty())
                        continue; // repeated spaces produce empty strings
                    totalNumberWords++;
                    int currentNumber = 0;
                    if (lengthCount.get(word.length()) != null)
                        currentNumber = lengthCount.get(word.length());
                    lengthCount.put(word.length(), currentNumber + 1);
                }
            }

            // Iterates through the map and prints the amount of strings
            // for each length and the percent of words with each length
            for (Map.Entry<Integer, Integer> curEntry : lengthCount.entrySet()) {
                double percentWithThisLength = ((double) curEntry.getValue() / totalNumberWords) * 100;
                System.out.print(curEntry.getValue() + " string(s) with length " + curEntry.getKey());
                System.out.println(" (" + percentWithThisLength + "%)");
            }
        } catch (IOException e) {
            // Covers a missing file as well as read errors
            System.out.println("Could not read the specified file");
        }
    }
}

With text.txt containing:

  

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Yay

this produces:

3 string(s) with length 2 (15.0%)
3 string(s) with length 3 (15.0%)
6 string(s) with length 5 (30.0%)
3 string(s) with length 6 (15.0%)
2 string(s) with length 7 (10.0%)
2 string(s) with length 10 (10.0%)
1 string(s) with length 11 (5.0%)
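
For comparison, the map-based counting can be written a little more compactly. The sketch below is a variation and not the answer's exact code: it swaps in a TreeMap so the lengths print in ascending order, uses Map.merge in place of the get/put pair, and splits on runs of whitespace. The class name WordLengthMap and the method count are hypothetical names chosen for this example, and the input comes from a StringReader so the snippet runs without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

class WordLengthMap {

    // Counts words by length across every line of the reader.
    // Punctuation attached to a word counts toward its length.
    static Map<Integer, Integer> count(BufferedReader reader) {
        Map<Integer, Integer> lengthCount = new TreeMap<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.split("\\s+")) {
                    if (word.isEmpty()) {
                        continue; // leading whitespace yields one empty string
                    }
                    // Insert 1 for a new length, otherwise add 1 to the old count
                    lengthCount.merge(word.length(), 1, Integer::sum);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lengthCount;
    }

    public static void main(String[] args) throws IOException {
        // Sample sentence from the question; swap in a FileReader for a real file
        String sample = "I went to the shop";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(count(reader)); // prints {1=1, 2=1, 3=1, 4=2}
        }
    }
}
```

To read from a file, the StringReader would be replaced with new FileReader("text.txt"), as in the answer above.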