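How do I check for a pattern in a string in C/C++?

Asked: 2015-05-25 20:11:28

Tags: c++ c string algorithm pattern-matching

If I have a string like abcabcabc, then abc is obviously the pattern. I want to find that pattern using C/C++.

I don't need an implementation. Pseudocode or an algorithm is fine.

How should I go about it?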

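3 answers:

Answer 0 (score: 1)

Use Floyd's cycle-finding algorithm. It uses the tortoise-and-hare analogy (a slow pointer and a fast pointer) to find a cycle. Python source from Wikipedia: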

def floyd(f, x0):
    # Main phase of algorithm: finding a repetition x_i = x_2i
    # The hare moves twice as quickly as the tortoise and
    # the distance between them increases by 1 at each step.
    # Eventually they will both be inside the cycle and then,
    # at some point, the distance between them will be
    # divisible by the period λ.
    tortoise = f(x0) # f(x0) is the element/node next to x0.
    hare = f(f(x0))
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))

    # At this point the tortoise position, ν, which is also equal
    # to the distance between hare and tortoise, is divisible by
    # the period λ. So hare moving in circle one step at a time, 
    # and tortoise (reset to x0) moving towards the circle, will 
    # intersect at the beginning of the circle. Because the 
    # distance between them is constant at 2ν, a multiple of λ,
    # they will agree as soon as the tortoise reaches index μ.

    # Find the position μ of first repetition.    
    mu = 0
    tortoise = x0
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(hare)   # Hare and tortoise move at same speed
        mu += 1

    # Find the length of the shortest cycle starting from x_μ
    # The hare moves one step at a time while tortoise is still.
    # lam is incremented until λ is found.
    lam = 1
    hare = f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        lam += 1

    return lam, mu

This solution runs in O(λ + μ) time and uses O(1) auxiliary space.
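As a rough usage sketch, the routine can be called on any sequence produced by repeatedly applying a function, so that each value fully determines the next one. The block below restates the routine in condensed form so that it runs on its own. The function `step` and the start value 3 are hypothetical and chosen only for illustration.

```python
def floyd(f, x0):
    """Condensed form of the routine above; returns (lam, mu)."""
    # Phase 1: the hare moves twice as fast until both meet inside the cycle.
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))

    # Phase 2: restart the tortoise at x0; they meet at the cycle start (index mu).
    mu, tortoise = 0, x0
    while tortoise != hare:
        tortoise, hare, mu = f(tortoise), f(hare), mu + 1

    # Phase 3: walk the hare once around the cycle to measure its length lam.
    lam, hare = 1, f(tortoise)
    while tortoise != hare:
        hare, lam = f(hare), lam + 1
    return lam, mu


def step(x):
    # Hypothetical example function, used only to generate a sequence.
    return (x * x + 1) % 255


# Sequence from 3: 3, 10, 101, 2, 5, 26, 167, 95, 101, ...
print(floyd(step, 3))  # (6, 2): cycle length 6, starting at index 2
```

Note that a raw string does not generally satisfy the "each value determines the next" assumption, because the same character can be followed by different characters.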

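Answer 1 (score: 0)

Try looking up cycle detection: http://en.wikipedia.org/wiki/Cycle_detection. Don't think of it as a string; think of it as finding a period. Whether the input is a string or not doesn't matter.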

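Answer 2 (score: 0)

One way to find a pattern is to use the preprocessing step of the Knuth-Morris-Pratt algorithm, which runs in O(P.length) time, where P is the given string. It computes a lookup table 'PI' that stores, for each prefix of P ("a", "ab", "abc", ...), the length of the longest proper prefix of that prefix which is also its suffix.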

[Image: pseudocode for computing the prefix table]

The pseudocode is taken from Introduction to Algorithms (CLRS). Linux also has a nice implementation of this algorithm.

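Therefore, P.length - PI[P.length] = k is the length of the smallest repeating pattern. Because PI[P.length] is always less than P.length, k stays in the range [1, P.length] for a non-empty string.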

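For example, for "abcabcabc", PI = [0, 0, 0, 1, 2, 3, 4, 5, 6]. Here the length of the smallest repeating pattern is 9 - 6 = 3. But does k divide the string length evenly?

So, if P.length mod k == 0, then P[1..k] is your repeating pattern. If k equals P.length, that pattern is the whole string, which means nothing actually repeats.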
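The steps in this answer can be sketched roughly as follows. This is a minimal illustration, not the CLRS pseudocode itself: it uses 0-based indexing (so `pi[-1]` plays the role of PI[P.length] and `p[:k]` the role of P[1..k]), and the names `prefix_table` and `smallest_repeating_pattern` are made up for this example.

```python
def prefix_table(p):
    """Return pi, where pi[i] is the length of the longest proper prefix
    of p[:i + 1] that is also a suffix of p[:i + 1]."""
    pi = [0] * len(p)
    k = 0  # length of the prefix matched so far
    for i in range(1, len(p)):
        # Fall back to shorter candidates until the next character matches.
        while k > 0 and p[k] != p[i]:
            k = pi[k - 1]
        if p[k] == p[i]:
            k += 1
        pi[i] = k
    return pi


def smallest_repeating_pattern(p):
    """Return the shortest block that tiles p at least twice, or None."""
    if not p:
        return None
    k = len(p) - prefix_table(p)[-1]  # candidate pattern length
    # The block must divide the string evenly and be shorter than the string.
    if k < len(p) and len(p) % k == 0:
        return p[:k]
    return None


print(prefix_table("abcabcabc"))                # [0, 0, 0, 1, 2, 3, 4, 5, 6]
print(smallest_repeating_pattern("abcabcabc"))  # abc
print(smallest_repeating_pattern("abcabcab"))   # None (8 is not divisible by 3)
```

The same loop structure carries over to C or C++ with an integer array for the table.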