Question

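This is what I have so far. I want to find out how many times a word occurs in a .txt file. Right now I'm trying to do it with BufferedReader, but I'm not getting very far. I suspect there is a simpler way to solve this, but I don't know it.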

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;

public class TekstiAnalüsaator {

    public static void main(String[] args) throws Exception {
        InputStream baidid = new FileInputStream("test.txt");
        InputStreamReader tekst = new InputStreamReader(baidid, "UTF-8");
        BufferedReader puhverdab = new BufferedReader(tekst);
        String rida = puhverdab.readLine();
        while (rida != null){
            System.out.println("Reading: " + rida);
            rida = puhverdab.readLine();
        }
        puhverdab.close();
    }
}

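I want to search for the word using the structure below: I give it the file and then the word to look for, and it returns how many times that word occurs in the file.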

TekstiAnalüsaator analüsaator = new TekstiAnalüsaator("kiri.txt");
int esinemisteArv = analüsaator.sõneEsinemisteArv("kala");

Answer 1

See the code example below. It should solve the problem you are facing.

import java.io.*;

public class CountWords {

    public static void main(String args[]) throws IOException {
        System.out.println(count("Test.java", "static"));
    }

    // Note: this counts the lines that contain wordToSearch, not every
    // occurrence; a line containing the word twice is counted once.
    public static int count(String filename, String wordToSearch) throws IOException {
        int tokencount = 0;
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
            String s;
            while ((s = br.readLine()) != null) {
                if (s.contains(wordToSearch))
                    tokencount++;
            }
        }
        return tokencount;
    }

}

Answer 2

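This is a tricky question, because counting words in a string is not as simple a task as it sounds. Your approach already reads the file line by line, so the remaining problem is how to count the word matches in each line.

For example, you could check for matches simply like this: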

    public static int getCountOFWordsInLine(String line, String test) {
        int count = 0;
        int index = 0;
        while (line.indexOf(test, index) != -1) {
            count++;
            index = line.indexOf(test, index) + 1;
        }
        return count;
    }

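The problem with this approach is that if your word is "test" and your line contains a string such as "asdfatestsdf", it is still counted as a match. So you could try a slightly more advanced approach using a regular expression: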

public static int getCountOFWordsInLine(String line, String word) {
    int count = 0;
    Pattern pattern = Pattern.compile("\\b"+word+"\\b");
    Matcher matcher = pattern.matcher(line);
    while (matcher.find())
        count++;
    return count;
}

This checks for the word surrounded by \b, which matches a word boundary.
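To make the difference between the two approaches concrete, here is a minimal sketch that runs both on the same line. The class name `BoundaryDemo` and the sample line are made up for illustration, and `Pattern.quote` is an addition not present in the answer's code; it only ensures the search word is treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {

    // Plain substring search: counts every position where `word` starts.
    static int substringCount(String line, String word) {
        int count = 0;
        int index = line.indexOf(word);
        while (index != -1) {
            count++;
            index = line.indexOf(word, index + 1);
        }
        return count;
    }

    // Regex search: counts only matches with a word boundary on both sides.
    static int wholeWordCount(String line, String word) {
        Matcher matcher = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "test asdfatestsdf tests test.";
        System.out.println(substringCount(line, "test")); // prints 4
        System.out.println(wholeWordCount(line, "test")); // prints 2
    }
}
```

The substring version also counts the hits inside "asdfatestsdf" and "tests", while the boundary version counts only the standalone "test" at the start and the one before the full stop.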

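It still won't find the word if it starts with an uppercase letter. If you want the search to be case-insensitive, you can modify the previous method by converting everything to lowercase before searching. But it all depends on your definition of a word.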
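As an alternative to lowercasing both strings, the regex itself can be told to ignore case. The following is only a sketch of that idea: the class and method names are hypothetical, and the use of `Pattern.quote` and the `UNICODE_CASE` flag are assumptions about what a caller would want, not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveCount {

    // Counts whole-word occurrences of `word` in `line`, ignoring case.
    static int countIgnoreCase(String line, String word) {
        // Pattern.quote treats the search word literally, so characters
        // such as '.' or '+' in it are not interpreted as regex syntax.
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "Test", "test" and "TEST" match; "retest" does not.
        System.out.println(countIgnoreCase("Test the test, then retest. TEST!", "test")); // prints 3
    }
}
```

Whether "retest" or "tests" should count is exactly the "definition of a word" question; this sketch takes the strict whole-word view.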

The whole program then becomes:

import java.io.*;
import java.util.regex.*;

public class MainClass {

public static void main(String[] args) throws InterruptedException {
    try {
            InputStream baidid = new FileInputStream("c:\\test.txt");
            InputStreamReader tekst = new InputStreamReader(baidid, "UTF-8");
            BufferedReader puhverdab = new BufferedReader(tekst);
            String rida = puhverdab.readLine();
            String word="test";
            int count=0;
            while (rida != null){
                System.out.println("Reading: " + rida);
                count+=getCountOFWordsInLine(rida,word );
                rida = puhverdab.readLine();

            }               
            System.out.println("count:"+count);
            puhverdab.close();

    }catch(Exception e) {
        e.printStackTrace();
    }
}

public static int getCountOFWordsInLine(String line, String test) {
    int count = 0;
    Pattern pattern = Pattern.compile("\\b"+test+"\\b");
    Matcher matcher = pattern.matcher(line);
    while (matcher.find())
        count++;
    return count;
}

}

Answer 3

import java.io.*;
import java.util.regex.*;

public class TA
{

    public static void main(String[] args) throws Exception
    {
        InputStream baidid = new FileInputStream("test.txt");
        InputStreamReader tekst = new InputStreamReader(baidid, "UTF-8");
        BufferedReader puhverdab = new BufferedReader(tekst);

        String rida;
        String word = args[0];  // search word passed via command line
        int count1=0, count2=0, count3=0, count4=0;
        Pattern P1 = Pattern.compile("\\b" + word + "\\b");
        Pattern P2 = Pattern.compile("\\b" + word + "\\b", Pattern.CASE_INSENSITIVE);

        while ((rida = puhverdab.readLine()) != null)
        {
            System.out.println("Reading: " + rida);

            // Version 1 : counts lines containing [word]
            if (rida.contains(word)) count1++;

            // Version 2: counts every instance of [word]
            int pos = 0;
            while ((pos = rida.indexOf(word, pos)) != -1) { count2++; pos++; }

            // Version 3: counts [word] only between word boundaries (\b)
            Matcher m = P1.matcher(rida);
            while (m.find()) count3++;

            // Version 4: same as version 3, but case insensitive
            m = P2.matcher(rida);
            while (m.find()) count4++;
        }
        System.out.println("Found exactly " + count1 + " line(s) containing word: \"" + word + "\"");
        System.out.println("Found word \"" + word + "\" exactly " + count2 + " time(s)");
        System.out.println("Found word \"" + word + "\" as a whole word " + count3 + " time(s).");
        System.out.println("Found, case insensitive search, word \"" + word + "\" as a whole word " + count4 + " time(s).");
        puhverdab.close();
    }
}

Answer 4

This reads the file line by line, splits each line on whitespace to get the individual words, and checks each word for a match.

int countWords(String filename, String word) throws Exception {
    InputStream inputStream = new FileInputStream(filename);
    InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");
    BufferedReader reader = new BufferedReader(inputStreamReader);
    int count = 0;
    String line = reader.readLine();
    while (line != null) {
        String[] words = line.split("\\s+");
        for (String w : words)
            if (w.equals(word))
                count++;
        line = reader.readLine();
    }
    reader.close();
    return count;
}
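The per-line counting can also be wrapped in the shape the question asks for, with the file name passed to a constructor and a method that returns the count. This is a rough sketch only: `TextAnalyzer` and `countOccurrences` are hypothetical ASCII stand-ins for `TekstiAnalüsaator` and `sõneEsinemisteArv`, and it swaps the whitespace split for a `\b` regex so that a word followed by punctuation is still counted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the TekstiAnalüsaator class from the question.
class TextAnalyzer {

    private final String filename;

    TextAnalyzer(String filename) {
        this.filename = filename;
    }

    // Returns the number of whole-word, case-sensitive matches in the file.
    int countOccurrences(String word) throws IOException {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        int count = 0;
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                count += countInLine(pattern, line);
            }
        }
        return count;
    }

    // Counts the matches of an already compiled pattern in one line.
    static int countInLine(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // File name and search word taken from the question's example.
        TextAnalyzer analyzer = new TextAnalyzer("kiri.txt");
        System.out.println(analyzer.countOccurrences("kala"));
    }
}
```

Compiling the pattern once per search, rather than once per line, avoids repeated work on large files.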
