Naive string search algorithm - Python

Time: 2016-12-17 13:26:58

Tags: python string algorithm

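I implemented the naive string search algorithm below ('find'). It is as simple as it gets. However, I found another version on GeeksforGeeks ('search') that looked like it had better complexity. When I tested both on large strings, the timings were drastically different, but in the opposite direction: 'find' was far faster.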

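1st: slice the text to the pattern length and compare the slice with the pattern. Then move forward one character, slice again, and compare. What is the complexity of this?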

def find(pat, txt):
    size = len(pat)
    for i in range( len(txt) -size + 1 ):
        if txt[i : i + size] == pat:
            print 'Pattern found at index %s'%(i)

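2nd: compare character by character. If a character does not match, break; otherwise continue. At the end, if all characters matched, print the result. Then move forward one character. What is the complexity of this?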

def search(pat, txt):
    M = len(pat)
    N = len(txt)

    for i in xrange(N-M+1):
        status = 1 
        for j in xrange(M):
            if txt[i+j] != pat[j]:
                status = 0
                break
        if j == M-1 and status != 0:
            print "Pattern found at index " + str(i)
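
For readers on Python 3, here is a minimal sketch of the two approaches above, assuming the match indices are returned as a list instead of printed. The names find_by_slicing and search_by_chars are hypothetical and not part of the original post.

```python
def find_by_slicing(pat, txt):
    """Compare each window of len(pat) characters with pat using a slice."""
    size = len(pat)
    return [i for i in range(len(txt) - size + 1) if txt[i:i + size] == pat]


def search_by_chars(pat, txt):
    """Compare character by character, stopping at the first mismatch."""
    m, n = len(pat), len(txt)
    hits = []
    for i in range(n - m + 1):
        for j in range(m):
            if txt[i + j] != pat[j]:
                break
        else:
            # The inner loop finished without break: every character matched.
            hits.append(i)
    return hits


# Scaled-down version of the test strings used in the question.
text = "a" * 20 + "b"
pattern = "a" * 5 + "b"
print(find_by_slicing(pattern, text))  # [15]
print(search_by_chars(pattern, text))  # [15]
```

Both functions try every start index, so they should always return the same list; only the way a single window is compared differs.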

Timing test case:

testString = ''.join([ 'a' for _ in range(1000*100)] ) + 'b'
testPattern = ''.join([ 'a' for _ in range(100*100) ])  + 'b'

import cProfile
cProfile.run('find(testPattern, testString)')
cProfile.run('search(testPattern, testString)')

For find:

Pattern found at index 90000
         90007 function calls in 0.160 seconds

For search:

Pattern found at index 90000
         5 function calls in 135.951 seconds

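In my algorithm find, I slice and compare. Slicing has time complexity O(k), and the comparison should presumably take another O(k), though I am not sure (reference: Python Time Complexity).

In search, we only run the inner loop 'k' times, so shouldn't it have the better time complexity?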

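2 Answers:

Answer 0: (score: 5)

Your two algorithms are essentially the same (as @Zah pointed out). The only difference is that the inner loop, which the second algorithm writes out in Python, is carried out by the underlying C code in the first. What you are observing is the difference between compiled and interpreted code.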
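
To make this concrete, here is a rough sketch (Python 3, with a hypothetical helper name count_comparisons) that counts the character comparisons made by the explicit loop. In a worst case like the one in the question, each of the N-M+1 alignments needs M comparisons. The slice-and-compare version has to do a comparable amount of work per alignment, only inside CPython's C code rather than in Python bytecode.

```python
def count_comparisons(pat, txt):
    """Return (match_indices, number_of_character_comparisons)."""
    m, n = len(pat), len(txt)
    comparisons = 0
    matches = []
    for i in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if txt[i + j] != pat[j]:
                break
        else:
            # No mismatch in this window, so the pattern occurs at index i.
            matches.append(i)
    return matches, comparisons


# Scaled-down version of the worst case from the question.
txt = "a" * 9 + "b"
pat = "a" * 3 + "b"
matches, comparisons = count_comparisons(pat, txt)
print(matches, comparisons)  # [6] 28
assert comparisons == (len(txt) - len(pat) + 1) * len(pat)
```

With the sizes from the question (N = 100001, M = 10001) the same formula gives 90001 × 10001, roughly 9 × 10^8 comparisons. That count is consistent with the pure-Python loop taking minutes while the C-backed comparison finishes in a fraction of a second.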

If you want all indices and want to take advantage of the built-in methods:

For example:

def findAll(s,t):
    """returns all indices where substring t occurs in string s"""
    indices = []
    i = s.find(t)
    while i > -1:
        indices.append(i)
        i = s.find(t,i+1)
    return indices
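
As a usage sketch, the same idea can be written as a generator around str.find (the name iter_find_all is hypothetical). Because each search restarts at i + 1, overlapping occurrences are reported too.

```python
def iter_find_all(s, t):
    """Yield every index at which substring t occurs in s, overlaps included."""
    i = s.find(t)
    while i != -1:
        yield i
        # Restart one position after the last hit so overlaps are not skipped.
        i = s.find(t, i + 1)


print(list(iter_find_all("abcabcab", "abc")))  # [0, 3]
print(list(iter_find_all("aaaa", "aa")))       # [0, 1, 2]
print(list(iter_find_all("abc", "x")))         # []
```

A generator avoids building the whole list when only the first few matches are needed.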

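Answer 1: (score: 0)

I implemented the naive search in Python as shown below. It returns the number of times the pattern is found, or -1 if the pattern does not occur.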

def naive_pattern_search(data, search):
    n = len(data)    # Length of the data to search in.
    m = len(search)  # Length of the pattern to search for.

    c = 0  # Number of times the pattern has been found.

    # Try every start index at which the pattern could still fit.
    for i in range(n - m + 1):
        j = 0
        # Advance j while data and search have the same element.
        while j < m and data[i + j] == search[j]:
            j += 1
        # All m characters matched, so the pattern occurs at index i.
        if j == m:
            c += 1

    # A count above 0 means the pattern was found; otherwise return -1.
    if c > 0:
        return c
    else:
        return -1
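
When checking such a counter against the built-ins, note that str.count counts non-overlapping occurrences only, while a scan that tries every start index also counts overlapping ones. Below is a small sketch, assuming overlapping matches are what is wanted; count_overlapping is a hypothetical helper, not part of the answer.

```python
def count_overlapping(data, search):
    """Count occurrences of search in data, overlapping ones included."""
    last_start = len(data) - len(search)
    # str.startswith accepts a start offset, so no slice has to be created.
    return sum(data.startswith(search, i) for i in range(last_start + 1))


print("aaaa".count("aa"))               # 2 (non-overlapping)
print(count_overlapping("aaaa", "aa"))  # 3 (overlapping)
```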