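Java word count program

Time: 2011-11-12 05:29:06

Tags: java

I am trying to make a word count program. I have partially written it and it gives the correct result, but as soon as the string contains a leading space or more than one space in a row, the word count is wrong, because I am counting words on the basis of the spaces used. I need help finding a solution that gives the correct result no matter how many spaces there are. My code is below.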

public class CountWords 
{
    public static void main (String[] args)
    {

            System.out.println("Simple Java Word Count Program");

            String str1 = "Today is Holdiay Day";

            int wordCount = 1;

            for (int i = 0; i < str1.length(); i++) 
            {
                if (str1.charAt(i) == ' ') 
                {
                    wordCount++;
                } 
            }

            System.out.println("Word count is = " + wordCount);
    }
}

22 answers:

Answer 0 (score: 17)

public static void main (String[] args) {

     System.out.println("Simple Java Word Count Program");

     String str1 = "Today is Holdiay Day";

     String[] wordArray = str1.trim().split("\\s+");
     int wordCount = wordArray.length;

     System.out.println("Word count is = " + wordCount);
}

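The idea is to split the string into words on any whitespace character occurring any number of times. The split method of the String class returns an array containing the words as its elements. The length of that array is the number of words in the string.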
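One edge case with the split approach is worth noting: for an empty or all-whitespace string, trim() leaves "", and "".split("\\s+") returns a one-element array holding the empty string, so the count comes out as 1. Below is a minimal sketch that guards against this. The class and method names (SplitWordCount, countWords) are made up for illustration and are not part of the answer above.

```java
class SplitWordCount {
    // Counts whitespace-separated words; returns 0 for null, empty, or blank input.
    static int countWords(String text) {
        if (text == null) {
            return 0;
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0; // "".split("\\s+") would yield [""], i.e. length 1
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Today   is  Holdiay Day")); // prints 4
        System.out.println(countWords("    "));                    // prints 0
    }
}
```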

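Answer 1 (score: 13)

Two routes. One is to use regular expressions. You can find out more about regular expressions here. A good regular expression for this would be something like \w+ (written "\\w+" in a Java string literal); then count the number of matches.

If you don't want to go that route, you can keep a boolean flag that remembers whether the last character you saw was a space. If it was, and the current character is not a space, a new word has started, so count it. The center of the loop would look like this: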

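boolean prevCharWasSpace = true; // treat the start of the string like a space
for (int i = 0; i < str1.length(); i++) 
{
    if (str1.charAt(i) == ' ') {
        prevCharWasSpace = true;
    }
    else {
        // a non-space right after a space starts a new word
        // (wordCount must be initialized to 0, not 1, for this loop)
        if (prevCharWasSpace) wordCount++;
        prevCharWasSpace = false;
    }
}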

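Update
Using the split technique is exactly equivalent to what happens here, but it doesn't really explain why it works. If we go back to our CS theory, we want to construct a finite state automaton (FSA) that counts words. The FSA might look like this:
[image: FSA state diagram]
If you look at the code, it implements this FSA exactly. prevCharWasSpace keeps track of which state we are in, and str1.charAt(i) decides which edge (or arrow) to follow. If you use the split method, a regular expression equivalent to this FSA is constructed internally and used to split the string into an array.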
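To make the automaton more visible, here is a rough sketch that names the two states explicitly instead of using a boolean. The enum and class names (State, WordCountFsa) are my own and only illustrate the idea described in this answer; like the loop above, it treats only ' ' as a separator.

```java
class WordCountFsa {
    // The two states of the automaton: between words, or inside a word.
    enum State { IN_SPACE, IN_WORD }

    static int countWords(String text) {
        State state = State.IN_SPACE; // start as if a space preceded the text
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == ' ') {
                state = State.IN_SPACE;
            } else {
                if (state == State.IN_SPACE) {
                    count++; // edge IN_SPACE -> IN_WORD: a new word starts
                }
                state = State.IN_WORD;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  Today   is Holdiay  Day ")); // prints 4
    }
}
```

The count is incremented on exactly one edge of the automaton (space to word), which is why runs of spaces and leading or trailing spaces make no difference.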

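Answer 2 (score: 3)

You can use String.split (read more here) instead of charAt, and you will get good results. If you want to use charAt for some reason, try trimming the string before counting the words, so that the extra spaces don't produce an extra word.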

Answer 3 (score: 3)

Java has the StringTokenizer API, which can be used for this purpose.

String test = "This is a test app";
int countOfTokens = new StringTokenizer(test).countTokens();
System.out.println(countOfTokens);

OR

in a single line, as follows:

System.out.println(new StringTokenizer("This is a test app").countTokens());

StringTokenizer handles multiple spaces in the input string; it counts only the words and discards the unnecessary spaces.

System.out.println(new StringTokenizer("This    is    a test    app").countTokens());

The line above also prints 5.
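StringTokenizer also has a two-argument constructor that takes the delimiter characters, which may help if words can be separated by punctuation as well as spaces. A small sketch follows; the delimiter set and the class name TokenizerWordCount are illustrative choices of mine, not something the original answer prescribes.

```java
import java.util.StringTokenizer;

class TokenizerWordCount {
    // Delimiters: whitespace plus a few punctuation marks (an illustrative choice).
    private static final String DELIMITERS = " \t\n\r\f,;.";

    static int countWords(String text) {
        // Every character in DELIMITERS separates tokens; runs of them are skipped.
        return new StringTokenizer(text, DELIMITERS).countTokens();
    }

    public static void main(String[] args) {
        System.out.println(countWords("India is my country. I love India")); // prints 7
        System.out.println(countWords("one,two;;three"));                    // prints 3
    }
}
```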

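Answer 4 (score: 1)

public class wordCOunt
{
    public static void main(String ar[])
    {
        System.out.println("Simple Java Word Count Program");

        // trim so leading/trailing spaces do not affect the count
        String str1 = "Today is Holdiay Day".trim();

        // an empty string has no words; otherwise start at 1 for the first word
        int wordCount = str1.isEmpty() ? 0 : 1;

        // stop one short of the end so charAt(i + 1) stays in bounds
        for (int i = 0; i < str1.length() - 1; i++) 
        {
            // count a space only when it is the last one in a run of spaces
            if (str1.charAt(i) == ' ' && str1.charAt(i + 1) != ' ') 
            {
                wordCount++;
            } 
        }

        System.out.println("Word count is = " + wordCount);
    }
}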

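Answer 5 (score: 0)

My implementation, not using StringTokenizer:

// requires: import java.util.*; import java.util.function.Function;
// import static java.util.stream.Collectors.counting;
// import static java.util.stream.Collectors.groupingBy;
Map<String, Long> getWordCounts(List<String> sentences, int maxLength) {
    // return the collected map directly (the method needs a return value)
    return sentences
        .parallelStream()
        .map(sentence -> sentence.replace(".", ""))
        .map(string -> string.split(" "))
        .flatMap(Arrays::stream)
        .map(s -> s.toLowerCase())
        .filter(word -> word.length() >= 2 && word.length() <= maxLength)
        .collect(groupingBy(Function.identity(), counting()));
}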

You can then call it like this, for example:

getWordCounts(list, 9).entrySet().stream()
                .filter(pair -> pair.getValue() <= 3 && pair.getValue() >= 1)
                .findFirst()
                .orElseThrow(() -> 
    new RuntimeException("No matching word found.")).getKey();

Perhaps flipping the method to return Map<Long, String> would be better.
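Since several words can share the same count, a flipped map probably needs a list of words per count rather than a single String. Here is one possible sketch of that inversion, assuming a Map<String, Long> like the one getWordCounts returns. The class and method names (InvertWordCounts, byCount) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InvertWordCounts {
    // Groups words by how often they occur: count -> words with that count.
    // A TreeMap keeps the counts in ascending order.
    static Map<Long, List<String>> byCount(Map<String, Long> wordCounts) {
        Map<Long, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, Long> entry : wordCounts.entrySet()) {
            result.computeIfAbsent(entry.getValue(), count -> new ArrayList<>())
                  .add(entry.getKey());
        }
        return result;
    }
}
```

With this shape, looking up every word that occurs exactly once is a single get(1L) call instead of a filter over all entries.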

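Answer 6 (score: 0)

This counts only specific words, e.g. John, John99, John_John and John’s. Change the regular expression to suit your needs so that only the words you specify are counted.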

    public static int wordCount(String content) {
        int count = 0;
        String regex = "([a-zA-Z_’][0-9]*)+[\\s]*";     
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(content);
        while(matcher.find()) {
            count++;
            System.out.println(matcher.group().trim()); //If want to display the matched words
        }
        return count;
    }

Answer 7 (score: 0)

    public class TotalWordsInSentence {
    public static void main(String[] args) {

        String str = "This is sample sentence";
        int NoOfWOrds = 1;

        for (int i = 0; i<str.length();i++){
            if ((str.charAt(i) == ' ') && (i!=0) && (str.charAt(i-1) != ' ')){
                NoOfWOrds++;
            }
        }
         System.out.println("Number of Words in Sentence: " + NoOfWOrds);
    }
}

In this code, whitespace inside the sentence will not cause any problems.
It is just a simple for loop. Hope this helps...

Answer 8 (score: 0)

This can be as simple as using split and a count variable.

public class SplitString {

    public static void main(String[] args) {
        int count=0;        
        String s1="Hi i love to code";

        for(String s:s1.split(" "))
        {
            count++;
        }
        System.out.println(count);
    }
}

Answer 9 (score: 0)

Not sure if there are any drawbacks, but this worked for me...

    Scanner input = new Scanner(System.in);
    String userInput = input.nextLine();
    String trimmed = userInput.trim();
    int count = 1;

    for (int i = 0; i < trimmed.length(); i++) {
      if ((trimmed.charAt(i) == ' ') && (trimmed.charAt(i-1) != ' ')) {
        count++;
      }
    }

Answer 10 (score: 0)

Count the total number of words, or count the number of words without repetition:

public static void main(String[] args) {
    // TODO Auto-generated method stub
    String test = "I am trying to make make make";
    Pattern p = Pattern.compile("\\w+");
    Matcher m = p.matcher(test);
    HashSet<String> hs =  new HashSet<>();
    int i=0;
    while (m.find()) {
        i++;
        hs.add(m.group());
    }
    System.out.println("Total words Count==" + i);
    System.out.println("Count without Repetation ==" + hs.size());
}

}

Output:

Total words Count==7

Count without Repetation ==5

Answer 11 (score: 0)

Try this

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class wordcount {
    public static void main(String[] args) {
        String s = "India is my country. I love India";
        List<String> qw = new ArrayList<String>();
        Map<String, Integer> mmm = new HashMap<String, Integer>();
        for (String sp : s.split(" ")) {
            qw.add(sp);
        }
        for (String num : qw) {
            mmm.put(num, Collections.frequency(qw, num));
        }
        System.out.println(mmm);

    }

}

Answer 12 (score: 0)

    String data = "This world is mine";
    System.out.print(data.split("\\s+").length);

Answer 13 (score: 0)

public static int CountWords(String str){

   if(str.length() == 0)
          return 0;

   int count =0;
   for(int i=0;i< str.length();i++){


      if(str.charAt(i) == ' ')
          continue;

      if(i > 0 && str.charAt(i-1) == ' '){
        count++;
      } 

      else if(i==0 && str.charAt(i) != ' '){
       count++;
      }


   }
   return count;

}

Answer 14 (score: 0)

import com.google.common.base.Optional;
import com.google.common.base.Splitter;
import com.google.common.collect.HashMultiset;
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Multiset;
import java.util.Set;

String str="Simple Java Word Count count Count Program";
Iterable<String> words = Splitter.on(" ").trimResults().split(str);


//google word counter       
Multiset<String> wordsMultiset = HashMultiset.create();
for (String string : words) {   
    wordsMultiset.add(string.toLowerCase());
}

Set<String> result = wordsMultiset.elementSet();
for (String string : result) {
    System.out.println(string+" X "+wordsMultiset.count(string));
}

Answer 15 (score: 0)

You should make your code more generic by considering other word separators as well, such as "," and ";".

public class WordCounter{
    public int count(String input){
        int count =0;
        boolean incrementCounter = false;
        for (int i=0; i<input.length(); i++){
            if (isValidWordCharacter(input.charAt(i))){
                incrementCounter = true;
            }else if (incrementCounter){
                count++;
                incrementCounter = false;
            }
        }
        if (incrementCounter) count ++;//if string ends with a valid word
        return count;
    }
    private boolean isValidWordCharacter(char c){
        //any logic that will help you identify a valid character in a word
        // you could also have a method which identifies word separators instead of this
        return (c >= 'A' && c<='Z') || (c >= 'a' && c<='z'); 
    }
}
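The isValidWordCharacter check above accepts only ASCII letters, so digits and accented letters act as separators. If that is not what you want, one possible variation is to lean on Character.isLetterOrDigit from the standard library. This is only a sketch, and the class name UnicodeWordCounter is invented for the example.

```java
class UnicodeWordCounter {
    // Counts maximal runs of letters/digits; everything else is a separator.
    static int count(String input) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < input.length(); i++) {
            if (Character.isLetterOrDigit(input.charAt(i))) {
                if (!inWord) {
                    count++; // first character of a new word
                }
                inWord = true;
            } else {
                inWord = false;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(count("Hello, world;foo2 bar")); // prints 4
    }
}
```

Note that an apostrophe is still a separator here, so a word like "don't" is counted as two words; whether that is acceptable depends on what you want to call a word.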

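Answer 16 (score: 0)

import java.io.BufferedReader;
import java.io.FileReader;

public class wordCount
{
    public static void main(String ar[]) throws Exception
    {
        System.out.println("Simple Java Word Count Program");

        StringBuilder text = new StringBuilder();

        // try-with-resources closes the reader when done
        try (BufferedReader br = new BufferedReader(new FileReader("C:/file.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                // keep a space between lines so words on adjacent lines do not merge
                text.append(line).append(' ');
            }
        }

        String str2 = text.toString().trim();

        // an empty file has no words; otherwise start at 1 for the first word
        int wordCount = str2.isEmpty() ? 0 : 1;

        // stop one short of the end so charAt(i + 1) stays in bounds
        for (int i = 0; i < str2.length() - 1; i++) 
        {
            if (str2.charAt(i) == ' ' && str2.charAt(i + 1) != ' ') 
            {
                wordCount++;
            } 
        }

        System.out.println("Word count is = " + wordCount);
    }
}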

Answer 17 (score: 0)

 public class CountWords 
    {
        public static void main (String[] args)
        {
            System.out.println("Simple Java Word Count Program");
            String str1 = "Today is Holdiay Day";
            int wordCount = 1;
            // stop one short of the end so charAt(i + 1) stays in bounds
            for (int i = 0; i < str1.length() - 1; i++) 
            {
                if (str1.charAt(i) == ' ' && str1.charAt(i+1)!=' ') 
                {
                    wordCount++;
                } 
            }
            System.out.println("Word count is = " + wordCount);
        }
    }   

This gives the correct result because when a space occurs two or more times in a row, wordCount is incremented only once. Enjoy.

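Answer 18 (score: 0)

You need to read the file line by line, reduce multiple occurrences of whitespace in a line to a single occurrence, and then count the words. Here is an example: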

public static void main(String... args) throws IOException {   

    FileInputStream fstream = new FileInputStream("c:\\test.txt");
    DataInputStream in = new DataInputStream(fstream);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String strLine;
    int wordcount = 0;
    while ((strLine = br.readLine()) != null)   {
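        // turn tabs and backspaces into spaces so adjacent words do not merge
        strLine = strLine.replaceAll("[\t\b]", " ");
        // collapse runs of spaces, then trim so split(" ") yields no empty strings
        strLine = strLine.replaceAll(" {2,}", " ").trim();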
        if (!strLine.isEmpty()){
            wordcount = wordcount + strLine.split(" ").length;
        }
    }

    System.out.println(wordcount);
    in.close();
}
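For comparison, here is a shorter sketch of the same line-by-line idea using java.nio.file.Files and a whitespace regex. It assumes the file is UTF-8 and small enough to read into memory at once. The names FileWordCount, countWords and countWordsInFile are placeholders I picked, not part of the answer above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class FileWordCount {
    // Sums the whitespace-separated words of each line; blank lines add nothing.
    static int countWords(List<String> lines) {
        int total = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                total += trimmed.split("\\s+").length;
            }
        }
        return total;
    }

    // Reads the whole file (assumed UTF-8) and counts its words.
    static int countWordsInFile(Path file) throws IOException {
        return countWords(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```

Keeping the counting separate from the file reading makes the logic easy to try on an in-memory list of lines before pointing it at a real file.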

Answer 19 (score: 0)

You can use this code. It may help you:

public static void main (String[] args)
{

   System.out.println("Simple Java Word Count Program");

   String str1 = "Today is Holdiay Day";
   int count=0;
   String[] wCount=str1.split(" ");

   for(int i=0;i<wCount.length;i++){
        if(!wCount[i].isEmpty())
        {
            count++;
        }
   }
   System.out.println(count);
}

Answer 20 (score: 0)

Use the split(regex) method. The result is an array of strings produced by splitting the input on regex.

String s = "Today is Holdiay Day";
System.out.println("Word count is = " + s.split(" ").length);
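Be aware that the choice of regex matters for the problem in the question. Splitting on a single space produces an empty string between two consecutive spaces, so the count is too high; splitting on a run of whitespace does not. A quick sketch to compare the two (class and method names are made up for the demonstration):

```java
class SplitComparison {
    // Splits on every single space: consecutive spaces yield empty strings.
    static int countWithSingleSpace(String text) {
        return text.split(" ").length;
    }

    // Splits on runs of whitespace after trimming the ends.
    static int countWithWhitespaceRun(String text) {
        return text.trim().split("\\s+").length;
    }

    public static void main(String[] args) {
        String s = "Today  is Holdiay Day"; // two spaces after "Today"
        System.out.println(countWithSingleSpace(s));   // prints 5
        System.out.println(countWithWhitespaceRun(s)); // prints 4
    }
}
```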

Answer 21 (score: -2)

The complete program is:

public class main {

    public static void main(String[] args) {

        logicCounter counter1 = new logicCounter();
        counter1.counter("I am trying to make a program on word count which I have partially made and it is giving the correct result but the moment I enter space or more than one space in the string, the result of word count show wrong results because I am counting words on the basis of spaces used. I need help if there is a solution in a way that no matter how many spaces are I still get the correct result. I am mentioning the code below.");
    }
}

public class logicCounter {

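    public void counter (String str) {

        String str1 = str;
        boolean space = true; // true while the previous character was a space
        int count = 0;

        for (int i = 0; i < str1.length(); i++) {

            if (str1.charAt(i) == ' ') {
                space = true;
            } else {
                // first non-space character after a space starts a new word
                if (space) {
                    count++;
                }
                space = false;
            }
        }

        System.out.println("there are " + count + " words");
    }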
}