Counting occurrences in an incoming character stream

Time: 2016-03-18 15:04:10

Tags: algorithm data-structures

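I was asked this question in an interview, and although I am good at DS & Algo, I could not solve it. It is an interesting problem, so I am posting it.

Problem: You have an incoming stream of characters and you need to count the occurrences of a word. You have only one API for reading from the stream, stream.next_char(), which returns '\0' when there are no more characters.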

int count_occurrences(Stream stream, String word) {
// you have only one function provided from Stream class that you can use to 
// read one char at a time, no length/size etc.
// stream.next_char() - return "\0" if end
}

Input: "aabckjhabcc" Word: "abc" Output: 2

5 Answers:

Answer 0: (score: 0)

The simplest solution is to use a buffer that holds at most word.length() characters:

static int count_occurrences(final Stream stream, final String word) {
    int found = 0;

    char c;
    String tmpWord = "";
    while ((c = stream.next_char()) != 0) {
        tmpWord += c;
        if (tmpWord.length() > word.length()) {
            tmpWord = tmpWord.substring(1);
        }
        if (tmpWord.equals(word)) {
            found++;
        }
    }
    return found;
}

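The time complexity is O(N * M) and the memory use is O(M), where N is the length of the stream and M is the length of the word.

Answer 1: (score: 0)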

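int count_occurrences(Stream stream, String word) {
    int word_length = word.length();
    if (word_length == 0) return 0;
    char[] word_chars = word.toCharArray();

    // fallback[i] is the length of the longest proper prefix of word[0..i]
    // that is also a suffix of it. Simply resetting to 0 on a mismatch
    // would miss matches such as "aab" in the stream "aaab".
    int[] fallback = new int[word_length];
    for (int i = 1, k = 0; i < word_length; i++) {
        while (k > 0 && word_chars[i] != word_chars[k]) k = fallback[k - 1];
        if (word_chars[i] == word_chars[k]) k++;
        fallback[i] = k;
    }

    int occurrence = 0;
    int current_index = 0; // number of word characters matched so far
    char c = stream.next_char();
    while (c != '\0') {
        // On a mismatch, fall back to the longest prefix that still matches.
        while (current_index > 0 && c != word_chars[current_index]) {
            current_index = fallback[current_index - 1];
        }
        if (c == word_chars[current_index]) {
            current_index++;
        }
        if (current_index >= word_length) {
            occurrence++;
            // Keep the overlapping part so overlapping occurrences are counted.
            current_index = fallback[current_index - 1];
        }
        c = stream.next_char();
    }
    return occurrence;
}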

Answer 2: (score: 0)

Maybe something like this?

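import java.util.ArrayList;
import java.util.List;

int count_occurrences(Stream stream, String word) {
    // you have only one function provided from Stream class that you can use to
    // read one char at a time, no length/size etc.
    // stream.next_char() - return '\0' if end
    if (word.isEmpty()) return 0;

    // Each entry is the number of word characters already matched by one
    // candidate occurrence that is still alive.
    List<Integer> positions = new ArrayList<>();

    int counter = 0;
    while (true) {
        char ch = stream.next_char();
        if (ch == '\0') return counter;

        // Every character equal to the first letter starts a new candidate.
        if (ch == word.charAt(0)) {
            positions.add(0);
        }

        int i = 0;
        while (i < positions.size()) {
            int pos = positions.get(i);

            // The candidate no longer matches: drop it.
            if (word.charAt(pos) != ch) {
                positions.remove(i);
                continue;
            }

            pos++;
            // The candidate matched the whole word: count it and drop it.
            if (pos == word.length()) {
                positions.remove(i);
                counter++;
                continue;
            }

            positions.set(i, pos);
            i++;
        }
    }
}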

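Answer 3: (score: 0)

What they were (probably) looking for is Rabin-Karp or Knuth-Morris-Pratt. Both need only a single pass over the stream and have very little overhead. If the pattern is long, they are clearly faster than the naive approach, because their running time is O(stream_length).

Rabin-Karp relies on a hash that you can update in O(1) for each new character. It can give you false positives (hash collisions) if the hash is not very good or the stream is very long.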
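A minimal sketch of the rolling-hash idea, written in Java to match the earlier answers. The `Stream` class here is a hypothetical stand-in for the one in the question, and the class name `RabinKarpCounter`, the constants `BASE` and `MOD`, and the helper `windowEquals` are all assumptions made for illustration. The sketch keeps the last M characters in a ring buffer, which the rolling update needs anyway to remove the outgoing character, and uses that buffer to confirm each hash hit so that collisions cannot produce false positives.

```java
class RabinKarpCounter {
    /** Hypothetical stand-in for the Stream class from the question. */
    static class Stream {
        private final String data;
        private int pos = 0;
        Stream(String data) { this.data = data; }
        char next_char() { return pos < data.length() ? data.charAt(pos++) : '\0'; }
    }

    // Assumed hash parameters; any base and large prime modulus would do.
    static final long BASE = 257;
    static final long MOD = 1_000_000_007L;

    static int count_occurrences(Stream stream, String word) {
        int m = word.length();
        if (m == 0) return 0;

        long wordHash = 0;
        long highPow = 1; // BASE^(m-1) mod MOD, the weight of the oldest character
        for (int i = 0; i < m; i++) {
            wordHash = (wordHash * BASE + word.charAt(i)) % MOD;
            if (i > 0) highPow = (highPow * BASE) % MOD;
        }

        char[] window = new char[m]; // ring buffer holding the last m characters
        long windowHash = 0;
        long seen = 0;               // number of characters read so far
        int count = 0;
        char c;
        while ((c = stream.next_char()) != '\0') {
            int slot = (int) (seen % m);
            if (seen >= m) {
                // Remove the contribution of the character leaving the window.
                windowHash = (windowHash + MOD - (window[slot] * highPow) % MOD) % MOD;
            }
            windowHash = (windowHash * BASE + c) % MOD; // O(1) update
            window[slot] = c;
            seen++;
            // On a hash hit, compare the actual characters to rule out a collision.
            if (seen >= m && windowHash == wordHash
                    && windowEquals(window, (int) (seen % m), word)) {
                count++;
            }
        }
        return count;
    }

    /** Compares the ring buffer, read from its oldest slot, with the word. */
    static boolean windowEquals(char[] window, int start, String word) {
        for (int i = 0; i < word.length(); i++) {
            if (window[(start + i) % window.length] != word.charAt(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(count_occurrences(new Stream("aabckjhabcc"), "abc"));
    }
}
```

For the example in the question this prints 2. Dropping the `windowEquals` check would save the occasional O(M) comparison, at the price of the false positives mentioned above.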

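The most important part of Knuth-Morris-Pratt is computing, for every position in the pattern, the length of the longest prefix that is also a suffix ending at that position. This needs O(n) memory to store the results, where n is the length of the pattern, but that is all.

Look them up on Wikipedia under string pattern matching for more details and implementations.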

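Answer 4: (score: 0)

I think this question is related to the idea of matching strings using a finite-state model of computation.

The problem can be solved with the KMP string matching algorithm.

The KMP algorithm finds matches of a pattern string in a text string by taking into account how much of a prefix of the pattern is still matched even after we hit a mismatch at some point.

To determine "how much of the prefix can still be matched" when we encounter a mismatch after matching up to index i of the pattern, a failure function is built beforehand. (See the code below for building the failure function values.)

For each index i of the pattern, this failure function tells us how much of a prefix of the pattern can still be matched even if we encounter a mismatch after index i.

The idea is to find the length of the longest proper prefix that is also a suffix, for each substring of the pattern spanning indices 1 to i, where i ranges from 1 to n.

I am using string indices that start from 1.

Thus the failure function value for the first character of any pattern is 0 (i.e. no character has been matched so far).

For the subsequent characters, for each index i = 2 to n, we find the longest proper prefix of the substring pattern[1...i] that is also a suffix of pattern[1...i].

Suppose our pattern is "aac". Then the failure function value for index 1 is 0 (nothing matched yet), the value for index 2 is 1 (the longest proper prefix of "aa" that is also a proper suffix has length 1), and the value for index 3 is 0.

For the pattern "ababac" the failure function value is 0 for index 1, 0 for index 2, 1 for index 3 (the 'a' at index 3 is the same as the 'a' at index 1), 2 for index 4 ("ab" at indices 1 and 2 is the same as "ab" at indices 3 and 4), and 3 for index 5 ("aba" at indices [1...3] is the same as "aba" at indices [3...5]). For index 6 the failure function value is 0.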
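The values listed above can be checked with a short sketch. It is written in Java to match the earlier answers; the class name `FailureFunction` and the method `build` are hypothetical names chosen for this illustration. The returned array is 1-based like the description (slot 0 is unused), while the Java string itself is 0-based, so `pattern.charAt(k)` is the character at 1-based index k + 1.

```java
import java.util.Arrays;

class FailureFunction {
    /**
     * Returns f where f[i], for i in 1..n, is the length of the longest proper
     * prefix of pattern[1..i] that is also a suffix of it. f[0] is unused.
     */
    static int[] build(String pattern) {
        int n = pattern.length();
        int[] f = new int[n + 1];
        int t = 0; // length of the prefix currently being extended
        for (int s = 1; s < n; s++) {
            // charAt(t) is the character right after the matched prefix,
            // charAt(s) is the character at 1-based index s + 1.
            while (t > 0 && pattern.charAt(t) != pattern.charAt(s)) {
                t = f[t];
            }
            if (pattern.charAt(t) == pattern.charAt(s)) {
                t++;
            }
            f[s + 1] = t;
        }
        return f;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(build("aac")));    // [0, 0, 1, 0]
        System.out.println(Arrays.toString(build("ababac"))); // [0, 0, 0, 1, 2, 3, 0]
    }
}
```

Ignoring the unused leading slot, the printed arrays read 0, 1, 0 for "aac" and 0, 0, 1, 2, 3, 0 for "ababac", which agrees with the values worked out by hand.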

Here is the code (in C++) for building the failure function and for matching a text (or stream) with it:

#include <string>
#include <vector>

/* String indices start from 1 for both pattern and text.
   Stream is the class from the question; only next_char() is used. */
int count_occurrences(Stream &stream, const std::string &word) {
    /* n denotes the length of the pattern */
    const int n = static_cast<int>(word.size());
    if (n == 0) return 0;
    /* Pad with one character so that pattern[1..n] is the word. */
    const std::string pattern = " " + word;

    /* Firstly build the failure function. */
    std::vector<int> f(n + 1, 0);
    int t = 0;
    f[1] = 0;

    for (int s = 1; s < n; s++) {
        while (t > 0 && pattern[t + 1] != pattern[s + 1]) {
            t = f[t];
        }
        if (pattern[t + 1] == pattern[s + 1]) {
            t++;
            f[s + 1] = t;
        }
        else {
            f[s + 1] = 0;
        }
    }

    /* Now start reading characters from the stream */
    int count = 0;
    char current_char = stream.next_char();

    /* s denotes the index of pattern up to which we have found a match in text */
    /* initially it is 0 i.e. no character has been matched yet. */
    int s = 0;
    while (current_char != '\0') {

        /* If previously we had matched up to a certain number of
           characters and then failed, we return s to the point
           which is the longest prefix that still might be matched.

           (spaces between characters are just for clarity)
           For e.g., if pattern is              "a  b  a  b  a  a"
           & its failure returning index is     "0  0  1  2  3  1"

           and we encounter
           the text like :      "x  y  z  a  b  a  b  a  b  a  a"
                  indices :      1  2  3  4  5  6  7  8  9  10 11

           after matching the substring "a  b  a  b  a", starting at
           index 4 of text, we are successful up to index 8 but we fail
           at index 9: the character at index 9 of text is 'b'
           but our pattern expects 'a' there. Thus, the index in pattern
           which has been matched till now is 5 ( a  b  a  b  a)
                                                  1  2  3  4  5
           Now, we see that the failure returning index at index 5 of
           pattern is 3, which means that the text is still matched up to
           index 3 of pattern (a  b  a), not from the initial starting
           index 4 of text, but starting from index 6 of text.

           Thus we continue searching assuming that our next starting index
           in text is 6, eventually finding the match starting from index 6
           up to index 11.
        */
        while (s > 0 && current_char != pattern[s + 1]) {
            s = f[s];
        }
        /* We test the next character after the currently matched portion
           of pattern against the current character of text; if it matches,
           we increase the size of our matched portion by 1. */
        if (current_char == pattern[s + 1]) s++;
        if (s == n) {
            count++;
            /* Fall back right away so that pattern[n + 1] is never read
               and overlapping occurrences are still found. */
            s = f[s];
        }
        current_char = stream.next_char();
    }

    return count;
}


Note: this approach also counts overlapping occurrences of the pattern. For example, the word "aba" occurs twice in the stream "ababa".