Question

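I'm working on some coding problems for school. I'm trying to keep this within my own logic (and essentially failing). Just wondering if there are any tips on making this work: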

public static String[][] sortWords(BufferedReader in, int n) throws IOException{
    String line = "";
    int ctr = 0;
    String[][] words = new String[n][2];

    for(int m = 0; m < n; m++) {
        words[m][1] = "1"; 
    }

    while((line=in.readLine())!=null) {
        String a[]=line.split(" ");    
        for(int i = 0; i < a.length; i++) {
            a[i] = a[i].toUpperCase();
            for(int h = ctr; h < n; h++) {
                if (words[h][0].equals(a[i])) {
                    words[h][1] = "" + (Integer.parseInt(words[h][1])+1);
                } else{
                    words[ctr][0] = a[i];
                    ctr++;
                    break;
                }
            }
        } 
        line=in.readLine();
    }
    return words;
}

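What I'm trying to do is take a very large text file (70k words) and parse it. My thinking is that this method would do the following:
- find all the words in the file
- find the number of occurrences of each word
- store both values in a 2D array for easy access

If I'm way off base, I understand. Thanks in advance.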

Answer 1

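All the comments are on point, but I'll try to translate them into code. At each step, I have commented out every unmodified line so the changes stand out.

First off, screw that 2D array. It's restrictive and tedious to work with. Let's use a Map instead: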

public static Map<String, Integer> sortWords(BufferedReader in) throws IOException{
//    String line = "";
    Map<String, Integer> wordsCount = new HashMap<>();
//
//    while((line=in.readLine())!=null) {
//        String a[]=line.split(" ");
//        for(int i = 0; i < a.length; i++) {
//            a[i] = a[i].toUpperCase();
            Integer count = wordsCount.get(a[i]); // Get current count for this word
            if (count == null) count = 0; // Initialize on first appearance
            count++; // Update counter
            wordsCount.put(a[i], count); // Save the updated value
//        }
//        line=in.readLine();
//    }
    return wordsCount; // Return the map instead of the old array
//}

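No need to initialize an array, no extra loop, no String-to-int conversion... just get the value associated with the word and update it. Since we no longer need to know the number of words in advance, the second parameter, int n, can safely be removed!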
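As a side note, on Java 8 or later the get / null-check / put sequence can be collapsed into a single Map.merge call. This is only a minimal sketch, assuming the words are already split and upper-cased; the class name MergeCountSketch and the helper countWords are made up for illustration and are not part of the original code:

```java
import java.util.HashMap;
import java.util.Map;

class MergeCountSketch {
    // Hypothetical helper: counts how often each string occurs in the array.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> wordsCount = new HashMap<>();
        for (int i = 0; i < words.length; i++) {
            // Stores 1 on first appearance, otherwise adds 1 to the stored count
            wordsCount.merge(words[i], 1, Integer::sum);
        }
        return wordsCount;
    }

    public static void main(String[] args) {
        String[] words = {"THE", "CAT", "THE", "DOG"};
        Map<String, Integer> counts = countWords(words);
        System.out.println(counts.get("THE")); // prints 2
        System.out.println(counts.get("CAT")); // prints 1
    }
}
```

Whether this reads better than the explicit null check is a matter of taste; both do the same thing.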

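Now, I see you're using a very basic, C-like, pre-2000 idiom (all the for(;;) loops, arrays and so on). It's perfectly valid, but you're missing out on more modern and more useful constructs. So how about we use the enhanced for loop, available since 2004 (Java 5)?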

//public static Map<String, Integer> sortWords(BufferedReader in) throws IOException{
//    String line = "";
//    Map<String, Integer> wordsCount = new HashMap<>();
//
//    while((line=in.readLine())!=null) {
//        String a[]=line.split(" ");
        for(String word : a) {
            word = word.toUpperCase();
            Integer count = wordsCount.get(word); // Get current count for this word
//            if (count == null) count = 0; // Initialize on first appearance
//            count++; // Update counter
            wordsCount.put(word, count); // Save the updated value
//        }
//        line=in.readLine();
//    }
//    return wordsCount;
//}

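The syntax is cleaner, we know exactly what type of object we're dealing with inside the loop... and best of all, it lets you inline some code to make it even cleaner. Like this: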

//public static Map<String, Integer> sortWords(BufferedReader in) throws IOException{
//    String line = "";
//    Map<String, Integer> wordsCount = new HashMap<>();
//
//    while((line=in.readLine())!=null) {
        for(String word : line.toUpperCase().split(" ")) {
//            Integer count = wordsCount.get(word); // Get current count for this word
//            if (count == null) count = 0; // Initialize on first appearance
//            count++; // Update counter
//            wordsCount.put(word, count); // Save the updated value
//        }
//        line=in.readLine();
//    }
//    return wordsCount;
//}

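Now the toUpperCase() method is called only once per line instead of once per word, and we got rid of that String a[] that hurt everyone's eyes ;-P

The last thing to do is remove the extra readLine() at the end of the loop: the while condition already reads the next line, so that second call silently discards every other line. Do that, and your code should now look like this: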

public static Map<String, Integer> sortWords(BufferedReader in) throws IOException {
    String line = "";
    Map<String, Integer> wordsCount = new HashMap<>();

    while ((line = in.readLine()) != null) {
        for(String word : line.toUpperCase().split(" ")) {
            Integer count = wordsCount.get(word); // Get current count for this word
            if (count == null) count = 0; // Initialize on first appearance
            count++; // Update counter
            wordsCount.put(word, count); // Save the updated value
        }
    }
    return wordsCount;
}

Much better! You can use the method like this:

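BufferedReader in = new BufferedReader(new FileReader("myWords.txt"));
Map<String, Integer> words = sortWords(in);
// Keys are stored upper-cased; getOrDefault (Java 8+) returns 0 for words that never appear
int numberOfHellos = words.getOrDefault("HELLO", 0);
int numberOfGreetings = numberOfHellos + words.getOrDefault("HI", 0) + words.getOrDefault("HOWDY", 0);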
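If the assignment still requires the two-column array from the question, the map can be copied into one afterwards. The following is a rough sketch under that assumption, not a required part of the solution; the class WordTableSketch and the helper toTable are hypothetical names, and ordering by descending count is just one plausible reading of "sort":

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordTableSketch {
    // Hypothetical helper: copies the map into a String[n][2] array,
    // most frequent word first; ties are ordered alphabetically.
    static String[][] toTable(Map<String, Integer> wordsCount) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordsCount.entrySet());
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue()); // descending count
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        String[][] table = new String[entries.size()][2];
        for (int i = 0; i < table.length; i++) {
            table[i][0] = entries.get(i).getKey();                   // the word
            table[i][1] = String.valueOf(entries.get(i).getValue()); // its count, as text
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("HI", 1);
        counts.put("HELLO", 3);
        counts.put("HOWDY", 1);
        for (String[] row : toTable(counts)) {
            System.out.println(row[0] + " " + row[1]);
        }
        // prints:
        // HELLO 3
        // HI 1
        // HOWDY 1
    }
}
```

Counting in a Map and converting only at the end keeps the lookup logic simple while still producing the array shape the original method signature returned.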

Using a BufferedReader to fill a 2D array containing counters
