Question

I have a string:

b = 'Can you can a can as a canner can can a can?'

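I need to find every start and end position of the substring "can" in string b, regardless of case. I can do this with a regular expression, but I want the shortest possible code for the same operation without using regex (that is, without importing re). Here is my regex-based code: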

import re
b_find = [(i.start() , i.end()) for i in re.finditer(r"can",b.lower())]

I would like a solution that does not use regex (possibly a list comprehension). Is there a way to do this?

Answer 1

Yes, although it is neither particularly elegant nor very efficient. Still, here it is:

b_find = [(i, i+3) for i in range(len(b)-2) if b[i:i+3].lower() == 'can']

It produces the same result as the regex-based code, namely:

[(0, 3), (8, 11), (14, 17), (23, 26), (30, 33), (34, 37), (40, 43)]
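
The same comprehension works for any substring if the hard-coded 3 is replaced by the substring's length. A minimal sketch of that idea; the helper name find_all is made up for illustration and is not part of the original answer:

```python
def find_all(text, sub):
    """Return (start, end) pairs for every case-insensitive match of sub in text."""
    text, sub = text.lower(), sub.lower()
    n = len(sub)
    # Slide a window of length n over the text and keep the positions that match.
    return [(i, i + n) for i in range(len(text) - n + 1) if text[i:i + n] == sub]


b = 'Can you can a can as a canner can can a can?'
print(find_all(b, 'can'))
```

Unlike re.finditer, this window scan also reports overlapping matches (for example 'aa' inside 'aaa'). For 'can' that makes no difference, because the word cannot overlap with itself.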

Answer 2

Written as a function, this should serve your purpose:

>>> def split_indices(s, sep):
...     current = 0
...     sep_len = len(sep)
...     sections = s.lower().split(sep)
...     for section in sections[:-1]:  # skip trailing entry
...         current += len(section)
...         yield (current, current+sep_len)
...         current += sep_len

The function is a generator, so to get the results as a list you must either rewrite it to return a list or unpack the results into one:

>>> b = 'Can you can a can as a canner can can a can?'
>>> [*split_indices(b, 'can')]
[(0, 3), (8, 11), (14, 17), (23, 26), (30, 33), (34, 37), (40, 43)]
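
For the other option mentioned above, rewriting the function so that it returns a list directly, a possible sketch follows. The name split_indices_list is hypothetical, and lowercasing sep as well is an added assumption so that a mixed-case separator still matches:

```python
def split_indices_list(s, sep):
    """List-returning variant of the split-based search (non-overlapping matches)."""
    sep = sep.lower()  # assumption: the separator should be matched case-insensitively too
    sep_len = len(sep)
    result = []
    current = 0
    # Every piece except the last one is followed by one occurrence of sep.
    for section in s.lower().split(sep)[:-1]:
        current += len(section)
        result.append((current, current + sep_len))
        current += sep_len
    return result


b = 'Can you can a can as a canner can can a can?'
print(split_indices_list(b, 'can'))
```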

Answer 3

A simpler variation (it collects only the start index of each match; the end is the start plus len('can')):

block = 'Can you can a can as a canner can can a can?'.lower()
index = -1
indexes = []
try:
    while True:
        # str.index raises ValueError once there are no more matches
        index = block.index('can', index + 1)
        indexes.append(index)
except ValueError:
    pass
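
Since the question asks for both start and end positions, here is a hedged sketch of the same scan that yields pairs. It swaps str.index for str.find, which returns -1 instead of raising, so no try/except is needed. The function name find_spans is an assumption, not part of the original answer:

```python
def find_spans(text, sub):
    """Return (start, end) pairs for every case-insensitive match of sub in text."""
    text, sub = text.lower(), sub.lower()
    spans = []
    start = text.find(sub)
    while start != -1:
        spans.append((start, start + len(sub)))
        # Resume one character later, so overlapping matches are found too.
        start = text.find(sub, start + 1)
    return spans


b = 'Can you can a can as a canner can can a can?'
print(find_spans(b, 'can'))
```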

Answer 4

This is a super-simple linear finite automaton. It would get more complicated for a word like "cacan", but for "can" it is really easy:

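def nextCan( text, state ):
    # state = number of leading characters of 'can' matched so far (0, 1 or 2)
    for i in range(len(text)):
        ch = text[i].lower()  # compare case-insensitively
        if 0 == state:
            if ch == 'c':
                state = 1
            else:
                state = 0
        elif 1 == state:
            if ch == 'a':
                state = 2
            elif ch != 'c':  # a repeated 'c' may still start a match
                state = 0
        elif 2 == state:
            if ch == 'n':
                yield (i-2,i+1)
            # a 'c' here may start the next match
            state = 1 if ch == 'c' else 0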

b_find = [ x for x in nextCan( b, 0 ) ]
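
For a word such as "cacan", which overlaps with itself, the automaton needs fallback transitions when a partial match fails. One standard way to derive them is the failure table from the Knuth-Morris-Pratt algorithm. This is a swapped-in technique that the answer does not describe, and the name find_kmp is hypothetical. A sketch under those assumptions:

```python
def find_kmp(text, pattern):
    """Yield (start, end) for each case-insensitive match, overlapping ones included."""
    text, pattern = text.lower(), pattern.lower()
    if not pattern:
        return
    # fail[j] = length of the longest proper prefix of pattern[:j+1]
    # that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for j in range(1, len(pattern)):
        while k and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    state = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # On a mismatch, fall back to the longest shorter partial match.
        while state and ch != pattern[state]:
            state = fail[state - 1]
        if ch == pattern[state]:
            state += 1
        if state == len(pattern):
            yield (i - state + 1, i + 1)
            state = fail[state - 1]


b = 'Can you can a can as a canner can can a can?'
print(list(find_kmp(b, 'can')))
print(list(find_kmp('cacacan', 'cacan')))
```

The state variable plays the same role as in the hand-written version: it counts how many characters of the pattern have matched so far.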

Python: Find all occurrences of a substring in a string without using regex

4 answers: