Question

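I'm teaching myself basic programming, and one simple project is finding the indexes of every occurrence of a substring in a string. For example, in the string "abcdefdef" with the substring "def", I want the output to be 3 and 6. I've written some code, but I'm not getting the answers I want. Here's what I wrote: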

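Note: I know there are probably simpler ways to get the result using the language's built-in features/packages, such as regular expressions. I also know my approach is probably not an optimal algorithm. Nevertheless, at this point I'm only looking for advice on fixing the logic below, not on using more idiomatic approaches.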

import string

def MIT(String, substring): # "String" is the main string I'm searching within
    String_list = list(String)
    substring_list = list(substring)
    i = 0
    j = 0
    counter = 0
    results = []
    while i < (len(String)-1):
        if [j] == [i]:
            j = j + 1
            i = i + 1
            counter  = counter + 1
            if counter == len(substring):
                results.append([i - len(substring)+1])
                counter = 0
                j = 0
                i = i+1
        else:
            counter = 0
            j = 0
            i = i+1
    print results
    return

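My reasoning goes like this. I convert String and substring to lists, which lets me index every letter in the string. I set i and j to 0; these are my starting indexes into String and substring, respectively. I also have a new variable, counter, which I set to 0. Basically, I use counter to count how many times the letter at position [i] equals the element at position [j]. If counter equals the length of the substring, then I know [i - len(substring) + 1] is where my substring starts, so I add it to a list called results. Then I reset counter and j and keep searching for more substrings.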

I know the code is awkward, but I thought it would still get the answer. Instead, I get:

>>> MIT("abcdefghi", "def")
[[3]]
>>> MIT("abcdefghi", "efg")
[[3]]
>>> MIT("abcdefghi", "b")
[[1]]
>>> MIT("abcdefghi", "k")
[[1]]

Any ideas?

Answer 1

The regular expression module (re) is better suited to this task.

A good reference: http://docs.python.org/howto/regex.html

Also: http://docs.python.org/library/re.html
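For illustration, a minimal sketch of the `re` approach using `re.finditer` (the function name `find_indexes_re` and its `overlapping` flag are hypothetical, not from this answer). `re.escape` keeps characters such as `.` in the substring from being treated as regex syntax, and wrapping the pattern in a zero-width lookahead lets matches overlap:

```python
import re


def find_indexes_re(text, substring, overlapping=False):
    """Return the start index of each occurrence of substring in text."""
    # Treat the substring literally, even if it contains regex metacharacters.
    pattern = re.escape(substring)
    if overlapping:
        # A lookahead matches without consuming characters, so the scan
        # can report a match starting at every position.
        pattern = "(?=" + pattern + ")"
    return [m.start() for m in re.finditer(pattern, text)]


print(find_indexes_re("abcdefdef", "def"))             # [3, 6]
print(find_indexes_re("aaa", "aa"))                    # [0]
print(find_indexes_re("aaa", "aa", overlapping=True))  # [0, 1]
```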

Edit: a more "manual" way might be to use slicing:

s = len(String)
l = len(substring)
results = []
for i in range(s-l+1):
    if String[i:i+l] == substring:
        results.append(i)  # record the start index of each match

Answer 2

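The main problems are:

To compare characters, use: if String[i] == substring[j]
When a match is found, you increment i twice; remove the second increment (the start index of the match then becomes i - len(substring), without the +1).
The loop condition should be while i < len(String):

Of course, it still won't find overlapping matches (e.g. MIT("aaa", "aa")).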
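A minimal sketch of the question's loop with these fixes applied, assuming a non-empty substring. It uses j in place of counter (the two are always equal) and returns the list instead of printing it. It also makes one change not listed above: on a mismatch it moves i back to one past where the partial match began, since otherwise a call such as MIT("aab", "ab") misses the match at index 1. Like the original, it does not report overlapping matches.

```python
def MIT(String, substring):
    """Return start indexes of non-overlapping occurrences of substring."""
    results = []
    i = 0  # position in String
    j = 0  # position in substring (= length of the current partial match)
    while i < len(String):
        if String[i] == substring[j]:
            i += 1
            j += 1
            if j == len(substring):
                # i is now one past the end of the match
                results.append(i - len(substring))
                j = 0
        else:
            # Restart one character after where this partial match began
            i = i - j + 1
            j = 0
    return results


print(MIT("abcdefdef", "def"))  # [3, 6]
print(MIT("abcdefghi", "k"))    # []
```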

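There are also some minor "issues": it isn't really Pythonic, there's no need to build the lists, increments read more clearly as i += 1, useful functions should return their values rather than print them, and so on.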

If you need correct and fast code, look at the classic algorithms textbook: http://www.amazon.com/Introduction-Algorithms-Thomas-H-Cormen/dp/0262033844. It has a whole chapter on string matching.
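As one example of the kind of algorithm such a chapter covers, here is a minimal sketch of Knuth-Morris-Pratt (KMP) matching (the function name `kmp_find_all` is hypothetical). It precomputes, for each prefix of the pattern, the length of the longest proper prefix that is also a suffix, so that on a mismatch the scan never moves backwards in the text, giving O(n + m) time. Unlike the asker's loop, it reports overlapping matches.

```python
def kmp_find_all(text, pattern):
    """Return start indexes of all (possibly overlapping) matches of pattern."""
    m = len(pattern)
    if m == 0:
        return []
    # failure[q] = length of the longest proper prefix of pattern[:q + 1]
    # that is also a suffix of it
    failure = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[q] != pattern[k]:
            k = failure[k - 1]
        if pattern[q] == pattern[k]:
            k += 1
        failure[q] = k

    results = []
    q = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while q > 0 and ch != pattern[q]:
            q = failure[q - 1]  # fall back without rereading the text
        if ch == pattern[q]:
            q += 1
        if q == m:
            results.append(i - m + 1)
            q = failure[q - 1]  # keep going so overlapping matches are found
    return results


print(kmp_find_all("abcdefdef", "def"))  # [3, 6]
print(kmp_find_all("aaaa", "aa"))        # [0, 1, 2]
```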

If you want a Pythonic solution without implementing the whole thing yourself, check the other answers.

Answer 3

First, I've added some comments to your code to give you some hints:

import string

def MIT(String, substring): 
    String_list = list(String)  # this doesn't need to be done; you can index strings
    substring_list = list(substring)
    i = 0
    j = 0
    counter = 0
    results = []
    while i < (len(String)-1):   # this never checks the last character; use i < len(String)
        if [j] == [i]:   # here you're comparing two one-item lists; you must compare substring[j] with String[i]
            j = j + 1
            i = i + 1
            counter  = counter + 1
            if counter == len(substring):
                results.append([i - len(substring)+1]) # remove the brackets; append doesn't require them. Once the extra increment below is gone, the start is i - len(substring)
                counter = 0
                j = 0
                i = i+1 # remove this 
        else:
            counter = 0
            j = 0
            i = i+1
    print(results)
    return

Without using built-in libraries and so on, here's how I would do it:

def MIT(fullstring, substring):
    results = []
    sub_len = len(substring)
    for i in range(len(fullstring)):  # range yields the values 0 to len(fullstring) - 1 (a list in Python 2)
        if fullstring[i:i+sub_len] == substring: # this is slice notation; it means take characters i up to (but not including) i + the length of the substring
            results.append(i)
    return results

Answer 4

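I'm not sure whether you want to learn some good string-search algorithms or just the straightforward way to do it in Python. If it's the latter, then str.find is your friend. Something like: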

def find_all_indexes(needle, haystack):
    """Find the index for the beginning of each occurrence of ``needle`` in ``haystack``. Overlaps are allowed."""
    indexes = []
    last_index = haystack.find(needle)
    while -1 != last_index:
        indexes.append(last_index)
        last_index = haystack.find(needle, last_index + 1)
    return indexes


if __name__ == '__main__':
    print(find_all_indexes('is', 'This is my string.'))

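This is a very naive approach, but it should be easy to understand.

If you're looking for something that relies less on the standard library (and that will actually teach you a fairly common algorithm used in library implementations), you could try implementing the Boyer-Moore string search algorithm.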
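Full Boyer-Moore combines a bad-character rule with a good-suffix rule. As a smaller starting point, here is a minimal sketch of the simplified Boyer-Moore-Horspool variant, which keeps only a bad-character shift table (the function name `horspool_find_all` is hypothetical). It compares right to left and, after each attempt, slides the pattern according to the text character under the pattern's last position:

```python
def horspool_find_all(text, pattern):
    """Return start indexes of all (possibly overlapping) matches of pattern."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: for each character in pattern[:-1], the distance from its
    # last occurrence to the end of the pattern. Other characters shift by m.
    shift = {}
    for k in range(m - 1):
        shift[pattern[k]] = m - 1 - k

    results = []
    i = 0  # alignment of the pattern's first character within text
    while i <= n - m:
        # Compare right to left
        j = m - 1
        while j >= 0 and text[i + j] == pattern[j]:
            j -= 1
        if j < 0:
            results.append(i)
        # Slide based on the text character aligned with the pattern's end
        i += shift.get(text[i + m - 1], m)
    return results


print(horspool_find_all("abcdefdef", "def"))  # [3, 6]
print(horspool_find_all("aaaa", "aa"))        # [0, 1, 2]
```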

Answer 5

To find where a substring occurs in a string, this algorithm will do it:

def posnof_substring(string, sub_string):
    l = len(sub_string)
    positions = []
    for i in range(len(string) - l + 1):
        if string[i:i+l] == sub_string:
            positions.append(i)  # 0-based start index of this occurrence
    return positions

I checked this algorithm myself and it works!

Answer 6

Based on @Hank Gay's answer. Uses regular expressions and adds an option to search for whole words.
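A minimal sketch of that idea (the function name `find_indexes` and its `whole_word` parameter are hypothetical). It wraps the escaped substring in `\b` word-boundary assertions, which behave as expected only when the substring starts and ends with word characters (letters, digits or underscore):

```python
import re


def find_indexes(text, substring, whole_word=False):
    """Return start indexes of non-overlapping occurrences of substring."""
    pattern = re.escape(substring)
    if whole_word:
        # \b asserts a boundary between a word character and a non-word
        # character (or the start/end of the string).
        pattern = r"\b" + pattern + r"\b"
    return [m.start() for m in re.finditer(pattern, text)]


print(find_indexes("This is my string.", "is"))                   # [2, 5]
print(find_indexes("This is my string.", "is", whole_word=True))  # [5]
```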

