Enumerate all n-gram subwords of a phrase

Time: 2010-08-12 02:37:48

Tags: algorithm

Is there an existing function that handles this problem? Input: A B C. Output: {A}, {B}, {C}, {A B}, {B C}, {A B C}

Note that {A C} and {C A} are not valid output: only runs of adjacent words, in their original order, count.

3 answers:

Answer 0 (score: 3)

In pseudocode:

for (i=0 .. n-1) {
    for (j=i .. n-1) {
        ngrams.add(phase[i:j])
    }
}

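Here phase[i:j] is the slice that starts at index i and ends at index j, both inclusive, and n is the number of words (3 in this example).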

A B C 
0 1 2

0:0 A
0:1 AB
0:2 ABC
1:1 B
1:2 BC
2:2 C
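
The pseudocode above can be sketched in Python roughly as follows. The function name `ngrams_by_start` is made up for this illustration. One difference to watch for: Python slices exclude the end index, so the inclusive slice i:j from the pseudocode becomes `words[i:j + 1]`.

```python
def ngrams_by_start(words):
    """Return every contiguous run of words, grouped by start index."""
    n = len(words)
    result = []
    for i in range(n):          # i: index of the first word
        for j in range(i, n):   # j: index of the last word (inclusive)
            # Python slices exclude the end index, hence j + 1.
            result.append(words[i:j + 1])
    return result


print(ngrams_by_start("A B C".split()))
# [['A'], ['A', 'B'], ['A', 'B', 'C'], ['B'], ['B', 'C'], ['C']]
```

The order matches the trace above: all n-grams starting at index 0, then those starting at 1, and so on.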

Answer 1 (score: 1)

I figured it out: an O(n^3) algorithm

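using System;

public static class NGrams {
    // Prints every contiguous word n-gram of the query, shortest first.
    public static void GenerateAllGrams(string query) {
        string[] q = query.Split(' ');
        int maxgram = q.Length;
        // gram is the n-gram length; i is the index of its first word.
        for (int gram = 1; gram <= maxgram; gram++) {
            for (int i = 0; i < q.Length - gram + 1; i++) {
                string current = "";
                for (int j = i; j < i + gram; j++) {
                    current += q[j] + " ";
                }
                Console.WriteLine(current.Trim());
            }
        }
    }
}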
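
For comparison, here is a minimal Python sketch of the same length-first enumeration. The name `ngrams_by_length` is hypothetical, and the sketch returns a list instead of printing. It also splits on any run of whitespace, which is an assumption that differs slightly from `Split(' ')` above. A phrase of n words has n(n+1)/2 n-grams containing n(n+1)(n+2)/6 words in total, so cubic work is already needed just to write the output.

```python
def ngrams_by_length(query):
    """Return all word n-grams of query as strings, shortest first."""
    words = query.split()  # assumption: any whitespace separates words
    n = len(words)
    grams = []
    for size in range(1, n + 1):            # n-gram length
        for start in range(n - size + 1):   # index of the first word
            grams.append(" ".join(words[start:start + size]))
    return grams


print(ngrams_by_length("A B C"))
# ['A', 'B', 'C', 'A B', 'B C', 'A B C']
```

This ordering is the one used in the question's sample output: all 1-grams, then all 2-grams, then the full phrase.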

Answer 2 (score: 1)

In Scheme:

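; Cons x onto the front of every element of list.
(define (prefix x list)
    (if (null? list)
        '()
        (cons (cons x (car list))
              (prefix x (cdr list)))))

; The n-grams that start at the first word of phrase, shortest first.
(define (leading phrase)
    (if (null? phrase)
        '()
        (cons (list (car phrase))
              (prefix (car phrase) (leading (cdr phrase))))))

; All contiguous subwords: the leading n-grams of every suffix of phrase.
(define (subwords phrase)
    (if (null? phrase)
        '()
        (append (leading phrase)
                (subwords (cdr phrase)))))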
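
A recursive formulation can also be sketched in Python, assuming the intended idea is that the contiguous subwords of a phrase are the non-empty prefixes of each of its suffixes. The helper name `prefixes` is introduced only for this illustration. It is a sketch under that assumption, not a translation of the code above.

```python
def prefixes(words):
    """Return the non-empty prefixes of words, shortest first."""
    if not words:
        return []
    head, rest = words[0], words[1:]
    # Every longer prefix is head followed by a prefix of the rest.
    return [[head]] + [[head] + p for p in prefixes(rest)]


def subwords(words):
    """Return all contiguous subwords: the prefixes of every suffix."""
    if not words:
        return []
    return prefixes(words) + subwords(words[1:])


print(subwords(["A", "B", "C"]))
# [['A'], ['A', 'B'], ['A', 'B', 'C'], ['B'], ['B', 'C'], ['C']]
```

Because a prefix is only ever extended by the next adjacent word, a non-contiguous combination such as A C cannot appear in the result.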