Finding a substring of a string in Python without built-in functions

Time: 2015-10-19 14:41:12

Tags: python string substring

I'm trying to write some code to find a substring within a string. So far I have this:

main = "dedicated"
sub = "cat"
count = 0
for i in range (0,len(main)):
   match = True
   if sub[0]==main[i]:
     j=0
     for j in range(0,len(sub)):
         if sub[j]!=main[i+j]:
             match = False
             print "No substring"
             break
         else:
             count=count+1
             if match == True and count == len(sub):
                 print "Substring"
                 print "Position start:",i
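  • "dedicated" and "cat" works
  • "this is an example" and "example" returns an IndexError
  • "nothing" and "different" returns nothing at all

Can anyone help me, give me pointers, or improve the code so that it works correctly with the bullet points above?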

8 answers:

Answer 0 (score: 1)

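def index(s, sub):
    start = 0
    end = 0
    # Only try start positions where sub still fits in s, so that
    # s[start + end] never goes out of range.
    while start + len(sub) <= len(s):
        if end == len(sub):
            return start
        if s[start + end] != sub[end]:
            start += 1
            end = 0
            continue
        end += 1
    return -1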

Output:

>>> index("dedicate", 'cat')
4
>>> index("this is an example", 'example')
11
>>> index('hello world', 'word')
-1

Answer 1 (score: 0)

To fix your problem, add the following:

main = "this is an example"
sub = "example"
count = 0
done = False
for i in range (0,len(main)):
   match = True
   if sub[0]==main[i]:
     j=0
     for j in range(0,len(sub)):
         if sub[j]!=main[i+j]:
             match = False
             print "No substring"
             break
         else:
             count=count+1
             if match == True and count == len(sub):
                 print "Substring"
                 print "Position start:",i
                 done = True
                 break
   if done == True:
     break

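Note the end of the code: once the match is found you are done, so set a flag variable (done) and break out of the inner loop. Then break out of the outer loop as well.

However, you do still need to handle the case where sub may run past the end of main, for example: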

main = "this is an example"
sub = "examples"

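In that case you need to check whether the j iterator goes out of range. I'll leave that for you to figure out, since it wasn't part of the original question.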
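
One way to do that range check is to stop the outer loop at the last position where sub still fits, so main[i + j] can never be out of range. The following is a minimal sketch, not the asker's or the answerer's exact code: the function name find_substring is made up for illustration, and it returns the start position (or -1) instead of printing.

```python
def find_substring(main, sub):
    """Return the index of the first occurrence of sub in main, or -1."""
    # Hypothetical helper name; not part of the original question or answer.
    # Only start positions 0 .. len(main) - len(sub) can hold a full match,
    # so main[i + j] below never goes out of range.
    for i in range(len(main) - len(sub) + 1):
        match = True
        for j in range(len(sub)):
            if main[i + j] != sub[j]:
                match = False
                break
        if match:
            return i
    return -1


print(find_substring("dedicated", "cat"))                # 4
print(find_substring("this is an example", "example"))   # 11
print(find_substring("this is an example", "examples"))  # -1
print(find_substring("nothing", "different"))            # -1
```

Because the match flag is reset for every start position, no running count is needed, and a sub that is longer than main simply yields an empty outer loop.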

Answer 2 (score: 0)

s1="gabcdfahibdgsabc hi kilg hi"
s2="hi"
count=0
l2=len(s2)
for i in range(len(s1)):
    if s1[i]==s2[0]:   
        end=i+l2
        sub1=s1[i:end]
        if s2==sub1:
            count+=1
print (count)

Answer 3 (score: 0)

def find_sub_str(sample, sub_str):
    count = 0
    for index in range(len(sample)):
        nxt_len = index + len(sub_str)
        # Compare the slice starting at index with the wanted substring.
        if sample[index:nxt_len] == sub_str:
            count += 1
            print("Sub string present {} position start at index {}"
                  .format(sample[index:nxt_len], index))
    print("no of times substring present: ", count)

find_sub_str("dedicate", "cat")
Sub string present cat position start at index 4
no of times substring present:  1

find_sub_str("nothing", "different")
no of times substring present:  0

find_sub_str("this is an example", "example")
Sub string present example position start at index 11
no of times substring present:  1

Answer 4 (score: 0)

Consider the following input:

sub_str = "hij"
input_strs = "abcdefghij"

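The logic here is:

Check whether the string is a substring by comparing it against each consecutive group of characters in the main string, from index 0 to the end of the string.

The iterations look like this: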
Iteration 1:  abc
Iteration 2:  bcd
Iteration 3:  cde
Iteration 4:  def
Iteration 5:  efg
Iteration 6:  fgh
Iteration 7:  ghi
Iteration 8:  hij

When the main string has length 10 and the sub string has length 3, at most 8 iterations are needed.

Complexity

Worst case complexity = LenOfMainString - LenOfSubString + 1
Best case complexity = 0 when LenOfSubString is greater than LenOfMainString
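
The window enumeration and the iteration count above can be reproduced with a short sketch. The helper name windows is an assumption made for illustration and does not appear in the answer.

```python
def windows(main_str, sub_len):
    """Return every consecutive group of sub_len characters in main_str."""
    # Hypothetical helper, for illustration only.
    # There are len(main_str) - sub_len + 1 such groups; when sub_len is
    # larger than len(main_str) the range is empty, so the list is empty.
    return [main_str[i:i + sub_len]
            for i in range(len(main_str) - sub_len + 1)]


groups = windows("abcdefghij", 3)
for number, group in enumerate(groups, start=1):
    print("Iteration {}: {}".format(number, group))
print("Worst case iterations:", len(groups))  # 10 - 3 + 1 = 8
```

Note that each iteration still compares up to sub_len characters, so the figures above count loop iterations rather than individual character comparisons.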

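Note: this code checks whether the given string is present in the main string, that is, whether it is a substring. It does not return the index; it prints the index if there is a match and -1 otherwise.

Code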

def is_sub_string(main_str, sub_str):
    """
    @Summary: Check string is sub string of main or not
    @Param main_str(String): Main string in which we have to check sub string is
     present or not.
    @Param sub_str(String): String which we want to check if present in main
     string or not.
    @Return (Boolean): True if present else False.
    """
    # Lengths of the main string and the sub string.
    # The loop runs at most input_str_len - sub_len + 1 times. For example,
    # with a 10-character main string and a 3-character sub string there are
    # at most 8 iterations, because a match cannot start at either of the
    # last two characters.
    sub_len = len(sub_str)
    input_str_len = len(main_str)
    index = 0
    is_sub_string = False
    while index<input_str_len-sub_len+1:
        # Check sub_str is equal to sequential group of same characters in main
        # string.
        if sub_str==main_str[index:index+sub_len]:
            is_sub_string = True
            break
        # Increase index count by one to move to next character. 
        index += 1
    print("Total Iteration:", index + 1 if is_sub_string else index, end="\t")
    print("Is Substring:", is_sub_string, end="\t")
    print("Index:",  index if is_sub_string else -1)
    return is_sub_string

Output

Case 01: when the string occurs at the start of the main string.

status = is_sub_string("abcdefghij", "abc")
>> Total Iteration: 1      Is Substring: True      Index: 0

Case 02: when the string occurs at the end of the main string.

status = is_sub_string("abcdefghij", "hij")
>> Total Iteration: 8      Is Substring: True      Index: 7

Case 03: when the string is not present in the main string.

status = is_sub_string("abcdefghij", "hix")
>>Total Iteration: 8      Is Substring: False     Index: -1

Case 04: when the string is longer than the main string.

status = is_sub_string("abcdefghij", "abcdefghijabcdefghij")
>>Total Iteration: 0      Is Substring: False     Index: -1

OR

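If we search from both the start and the end of the main string at the same time, the number of loop iterations can be cut in half. Each iteration then performs two comparisons, one from the front and one from the back.

Complexity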

Worst case complexity = (LenOfMainString - LenOfSubString + 1)/2
Best case complexity = 0 when LenOfSubString is greater than LenOfMainString

Code

def is_sub_string(main_str, sub_str):
    """
    @Summary: Check string is sub string of main or not
    @Param main_str(String): Main string in which we have to check sub string is
     present or not.
    @Param sub_str(String): String which we want to check if present in main
     string or not.
    @Return (Boolean): True if present else False.
    """
    # Lengths of the main string and the sub string.
    # The loop runs at most (input_str_len - sub_len + 1) / 2 times.
    sub_len = len(sub_str)
    input_str_len = len(main_str)
    index = 0
    is_sub_string = False
    find_index = -1
    while index<(input_str_len-sub_len+1)/2:
        # Check sub_str is equal to sequential group of same characters in main
        # string.
        if sub_str==main_str[index:index+sub_len]:
            is_sub_string = True
            find_index = index
            break
        print((index+sub_len)*-1, input_str_len-index, end="\t")
        print(main_str[(index+sub_len)*-1:input_str_len-index], main_str[index:index+sub_len])
        if sub_str==main_str[(index+sub_len)*-1:input_str_len-index]:
            is_sub_string = True
            find_index = (index+sub_len-input_str_len) * (-1)
            break
        # Increase index count by one to move to next characters. 
        index += 1
    print("Total Iteration:", index + 1 if is_sub_string else index, end="\t")
    print("Is Substring:", is_sub_string, end="\t")
    print("Index:",  find_index)
    return is_sub_string

Answer 5 (score: 0)

t = pd.DataFrame(np.random.choice([2,3], (100000, 10))).add_prefix('v_')
print (t)
In [30]: %timeit pd.Series(t.where(t.eq(3)).stack().droplevel(0).index)
84.7 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [31]: %timeit pd.Series(t.where(t.eq(3)).stack().reset_index(0, drop=True).index)
84.1 ms ± 459 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Answer 6 (score: 0)

I think the cleanest way is the following:

string = "samitsaxena"
sub_string = "sa"

sample_list =[]

for i in range(0, len(string)-len(sub_string)+1):
   sample_list.append(string[i:i+len(sub_string)])

print(sample_list)
print(sample_list.count(sub_string))

The output is as follows:

['sa', 'am', 'mi', 'it', 'ts', 'sa', 'ax', 'xe', 'en', 'na']
2

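Note the sample_list output.

The logic is that we build, from the main string, every substring whose length equals the length of the given substring.

We do this because we want to match those substrings against the given substring.

You can change the values of string and sub_string in the code to try different combinations, which will also help you learn how the code works.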
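
Building on the same idea, a small variation can record where each match starts instead of only counting the matches. This is a sketch under the assumption that overlapping matches should be reported too; the function name find_all is hypothetical and not part of the answer.

```python
def find_all(string, sub_string):
    """Return the start index of every occurrence of sub_string in string."""
    # Hypothetical helper, for illustration only.
    positions = []
    width = len(sub_string)
    for i in range(len(string) - width + 1):
        # Compare the window starting at i with the wanted substring.
        if string[i:i + width] == sub_string:
            positions.append(i)
    return positions


print(find_all("samitsaxena", "sa"))  # [0, 5]
print(find_all("aaaa", "aa"))         # [0, 1, 2]
```

The length of the returned list equals the count printed by the code above, and the first element (if any) is the position of the first match.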

Answer 7 (score: 0)

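def count_substring(string, sub_string):
    # Lower-case both strings so the search is case-insensitive.
    string = string.lower()
    sub_string = sub_string.lower()

    # Despite its name, this returns the index of the first match, or -1.
    for index, letter in enumerate(string):
        # Only compare a full slice where the first character matches.
        if letter == sub_string[0]:
            temp_string = string[index:index+len(sub_string)]
            if temp_string == sub_string:
                return index
    return -1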