Counting unique words in a string without using arrays

Time: 2014-04-10 07:18:25

Tags: java string

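So my assignment is to write a program that counts the number of words and the number of unique words in a given string, which we get from the user, without using arrays. I can do the first task and was wondering how to go about the second part.
To count the number of words in the string I have: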

boolean increment = false;
for (int i = 0; i < inputPhrase.length(); i++) {
    // validChar(char c) is a simple method that returns true if c is a valid word character
    if (validChar(inputPhrase.charAt(i))) {
        increment = true;
    } else if (increment) {
        phraseWordCount++;
        increment = false;
    }
}
if (increment) phraseWordCount++; // in case the phrase ends in the middle of a word

(Originally I had left that last check out and was off by one word.) Can I modify this somehow to count the unique words?

3 answers:

Answer 0: (score: 0)

Using the Collections API, you can count the unique words with the following method:

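import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

// Returns the number of distinct whitespace-separated words in text.
private int countWords(final String text) {
    Scanner scanner = new Scanner(text);
    Set<String> uniqueWords = new HashSet<String>();

    // A Set ignores duplicates, so each distinct word is stored only once
    while (scanner.hasNext()) {
        uniqueWords.add(scanner.next());
    }

    scanner.close();

    return uniqueWords.size();
}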

If the input can be ordinary sentences with punctuation, you can change the line that creates the Scanner to:

Scanner scanner = new Scanner(text.replaceAll("[^0-9a-zA-Z\\s]", "").toLowerCase());
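
Put together, the normalized variant might look like the following sketch. The class name UniqueWordsWithSet and the method name countUniqueWords are made up for this illustration; the regular expression is the one from the line above.

```java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

public class UniqueWordsWithSet {

    // Counts distinct words after stripping punctuation and lower-casing.
    // (Hypothetical helper name, not part of the original answer.)
    static int countUniqueWords(final String text) {
        // Keep only ASCII letters, digits and whitespace, then normalize case
        final String normalized = text.replaceAll("[^0-9a-zA-Z\\s]", "").toLowerCase();
        final Set<String> uniqueWords = new HashSet<String>();
        final Scanner scanner = new Scanner(normalized);
        while (scanner.hasNext()) {
            uniqueWords.add(scanner.next());
        }
        scanner.close();
        return uniqueWords.size();
    }

    public static void main(String[] args) {
        // the, cat, saw, dog, ran
        System.out.println(countUniqueWords("The cat saw the dog. The dog ran!")); // prints 5
    }
}
```

Note that the pattern also removes apostrophes and non-ASCII letters, so "don't" becomes "dont". Also, a HashSet is backed by an array internally, so whether this satisfies a "no arrays" assignment depends on how strictly the rule is meant.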

Answer 1: (score: 0)

Here is a suggestion for how to do it without arrays:

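1) Read each character until you find a blank, and append that character to a second String.
2) When you find a blank, append it (or some other token that separates words) to the second String.
2a) Read each word in the second String and compare it with the current word from the input String.
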
public static void main(String[] args) {
    final String input = "This is a sentence that is containing three times the word is";
    final char   token = '#';

    String processedInput  = "";
    String currentWord     = "";
    int    wordCount       = 0;
    int    uniqueWordCount = 0;

    // Iterate one position past the end so that the last word is closed like any other
    for (int i = 0; i <= input.length(); i++) {
        // charAt instead of toCharArray(), since the task forbids arrays
        final char c = i < input.length() ? input.charAt(i) : ' ';

        if (c != ' ') {
            processedInput += c;
            currentWord    += c;
        } else if (!currentWord.isEmpty()) { // skip repeated or trailing blanks
            processedInput += token;
            wordCount++;

            String existingWord = "";
            int    occurrences  = 0;

            // Count how often currentWord occurs among the words seen so far (itself included)
            for (int j = 0; j < processedInput.length(); j++) {
                final char c1 = processedInput.charAt(j);
                if (c1 != token) {
                    existingWord += c1;
                } else {
                    if (existingWord.equals(currentWord)) {
                        occurrences++;
                    }

                    existingWord = "";
                }
            }

            if (occurrences <= 1) {
                System.out.printf("New word: %s\n", currentWord);
                uniqueWordCount++;
            }

            currentWord = "";
        }
    }

    System.out.printf("%d words total, %d unique\n", wordCount, uniqueWordCount);
}

Output

New word: This
New word: is
New word: a
New word: sentence
New word: that
New word: containing
New word: three
New word: times
New word: the
New word: word
12 words total, 10 unique
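
The same token-separated second String can also be searched with String.contains instead of a character loop. This is only a sketch of that variation, assuming Scanner and String.contains are allowed by the assignment and that the token '#' never occurs inside a word; the class name UniqueWordsWithString and the method name countUniqueWords are made up for the example.

```java
import java.util.Scanner;

public class UniqueWordsWithString {

    // Returns the number of distinct whitespace-separated words in text.
    // (Hypothetical helper name, not part of the original answer.)
    static int countUniqueWords(final String text) {
        final char token = '#';               // assumed never to occur inside a word
        String seen = String.valueOf(token);  // grows to something like "#This#is#a#"
        int uniqueWordCount = 0;

        final Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            final String word = scanner.next();
            // Wrapping the word in tokens prevents "is" from matching inside "This"
            final String key = token + word + token;
            if (!seen.contains(key)) {
                seen += word + token;
                uniqueWordCount++;
            }
        }
        scanner.close();
        return uniqueWordCount;
    }

    public static void main(String[] args) {
        final String input = "This is a sentence that is containing three times the word is";
        System.out.println(countUniqueWords(input) + " unique"); // prints "10 unique"
    }
}
```

Because every stored word is delimited on both sides, a single contains call is enough to tell whether the whole word has been seen before.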

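Answer 2: (score: 0)

Each time a word ends, findUpTo checks whether that word already occurs in the input before the start of the word. So "if if if" is counted as one unique word and three words in total.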

/**
 * Created for http://stackoverflow.com/q/22981210/1266906
 */
public class UniqueWords {

    public static void main(String[] args) {
        String inputPhrase = "one two ones two three one";
        countWords(inputPhrase);
    }

    private static void countWords(String inputPhrase) {
        boolean increment = false;
        int wordStart = -1;
        int phraseWordCount = 0;
        int uniqueWordCount = 0;
        for (int i = 0; i < inputPhrase.length(); i++){
            if(validChar(inputPhrase.charAt(i))) { // validChar(char c) returns true if c is a valid word character
                increment = true;
                if(wordStart == -1) {
                    wordStart = i;
                }
            } else if(increment) {
                phraseWordCount++;
                final String lastWord = inputPhrase.substring(wordStart, i);
                boolean unique = findUpTo(lastWord, inputPhrase, wordStart);
                if(unique) {
                    uniqueWordCount++;
                }
                increment = false;
                wordStart = -1;
            }
        }
        if(increment) {
            phraseWordCount++; // in case the phrase ends in the middle of a word
            final String lastWord = inputPhrase.substring(wordStart, inputPhrase.length());
            boolean unique = findUpTo(lastWord, inputPhrase, wordStart);
            if(unique) {
                uniqueWordCount++;
            }
        }
        System.out.println("Words: "+phraseWordCount);
        System.out.println("Unique: "+uniqueWordCount);
    }

    private static boolean findUpTo(String needle, String haystack, int lastPos) {
        boolean previousValid = false;
        boolean unique = true;
        for(int j = 0; unique && j < lastPos - needle.length(); j++) {
            final boolean nextValid = validChar(haystack.charAt(j));
            if(!previousValid && nextValid) {
                // Word start
                previousValid = true;
                for (int k = 0; k < lastPos - j; k++) {
                    if(k == needle.length()) {
                        // All characters of needle matched. It is still unique only if the word found here continues past the match
                        unique = validChar(haystack.charAt(j+k));
                        break;
                    }
                    if (needle.charAt(k) != haystack.charAt(j+k)) {
                        break;
                    }
                }
            } else {
                previousValid = nextValid;
            }
        }
        return unique;
    }

    private static boolean validChar(char c) {
        return Character.isAlphabetic(c);
    }
}