Question

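From time to time I need to remove or replace a substring of a long string. To do that, I would specify a start pattern and an end pattern that mark where the substring begins and ends: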

long_string = "lorem ipsum..white chevy..blah,blah...lot of text..beer bottle....and so to the end"
removed_substr_start = "white chevy"
removed_substr_end = "beer bott"

# pseudocode for the kind of method I am looking for:
STRresult = long_string.replace( [from]removed_substr_start [to]removed_substr_end, "")

Answer 1

You can use a regex:

>>> import re
>>> strs = "lorem ipsum..white chevy..blah,blah...lot of text..beer bottle....and so to the end"
>>> sub_start = "white chevy"
>>> sub_end = "beer bott"
>>> re.sub(r'{}.*?{}'.format(re.escape(sub_start),re.escape(sub_end)),'',strs)
'lorem ipsum..le....and so to the end'

If you only want to remove the text between "white chevy" and "beer bott", and keep those two markers themselves:

>>> re.sub(r'({})(.*?)({})'.format(re.escape(sub_start),
...                                re.escape(sub_end)), r'\1\3', strs)
'lorem ipsum..white chevybeer bottle....and so to the end'
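
The two calls above can be folded into one helper. The following is a minimal sketch, not part of the original answer: the name `replace_span` and its `keep_markers` parameter are made up for illustration, `re.DOTALL` is added on the assumption that the text between the markers may contain newlines, and `count=1` limits the change to the first span (drop it to replace every span, as `re.sub` does by default).

```python
import re

def replace_span(text, start, end, replacement="", keep_markers=False):
    """Replace the first start...end span in text (hypothetical helper)."""
    # re.escape makes the markers match literally; .*? stops at the first end marker
    pattern = re.compile(
        "({})(.*?)({})".format(re.escape(start), re.escape(end)),
        re.DOTALL,  # assumption: the span may run across line breaks
    )

    def substitute(match):
        # A function replacement avoids backslash handling in the replacement text
        if keep_markers:
            return match.group(1) + replacement + match.group(3)
        return replacement

    return pattern.sub(substitute, text, count=1)


strs = "lorem ipsum..white chevy..blah,blah...lot of text..beer bottle....and so to the end"
print(replace_span(strs, "white chevy", "beer bott"))
# lorem ipsum..le....and so to the end
print(replace_span(strs, "white chevy", "beer bott", keep_markers=True))
# lorem ipsum..white chevybeer bottle....and so to the end
```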

Answer 2

I guess you want something like this, without regular expressions:

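def replace_between(text, begin, end, alternative=''):
    # Raises IndexError if begin is not found in text
    middle = text.split(begin, 1)[1].split(end, 1)[0]
    # Replace only the span between the markers, not every copy of middle
    return text.replace(begin + middle + end, begin + alternative + end, 1)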

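This is untested, and you should guard the first line against exceptions (when the beginning or the end cannot be found), but the idea is there :)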
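
One way to add the protection mentioned above is `str.partition`, which returns an empty separator instead of raising when the marker is absent. This is only a sketch: the name `replace_between_safe` is hypothetical, and returning the text unchanged when a marker is missing is a design choice, not something the answer specifies.

```python
def replace_between_safe(text, begin, end, alternative=""):
    """Replace the text between the first begin and the next end marker."""
    head, found_begin, rest = text.partition(begin)
    if not found_begin:
        return text  # begin marker missing: nothing to replace
    _middle, found_end, tail = rest.partition(end)
    if not found_end:
        return text  # end marker missing after begin: leave text as is
    # Both markers are kept; only what sat between them is swapped out
    return head + begin + alternative + end + tail


s = "lorem ipsum..white chevy..blah,blah...lot of text..beer bottle....and so to the end"
print(replace_between_safe(s, "white chevy", "beer bott", " and a "))
# lorem ipsum..white chevy and a beer bottle....and so to the end
```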

Answer 3

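Use find() to get the start index, then find() again from that position to get the end index (or rfind() if you want the last occurrence of the end pattern), and remove the part in between:

# str.find returns -1 when the pattern is missing, so check before slicing
lindex = long_string.find(removed_substr_start)
rindex = long_string.find(removed_substr_end, lindex)
# rindex is where the end pattern starts, so the end pattern itself is kept
result = long_string[:lindex] + long_string[rindex:]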

See: http://docs.python.org/2/library/string.html#string.find
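
The slicing above can be wrapped in a function that checks for `-1` and also drops the end pattern, which is what the question's pseudocode appears to ask for. A sketch only; `remove_span` is a hypothetical name, and leaving the text untouched when a pattern is missing is an assumption.

```python
def remove_span(text, start, end):
    """Remove everything from start through end, both markers included."""
    lindex = text.find(start)
    if lindex == -1:
        return text  # start pattern not found
    # Search for the end pattern only after the start pattern
    rindex = text.find(end, lindex + len(start))
    if rindex == -1:
        return text  # end pattern not found after start
    return text[:lindex] + text[rindex + len(end):]


long_string = "lorem ipsum..white chevy..blah,blah...lot of text..beer bottle....and so to the end"
print(remove_span(long_string, "white chevy", "beer bott"))
# lorem ipsum..le....and so to the end
```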

Answer 4

import re
regexp = "white chevy.*?beer bott"
long_string = "lorem ipsum..white chevy..blah,blah...lot of text..beer bottle....and so to the end"
re.sub(regexp, "", long_string)

This gives:

'lorem ipsum..le....and so to the end'
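
The pattern above works because neither marker contains regex metacharacters. When the markers come from user input or contain characters such as `[`, `(` or `.`, they can be passed through `re.escape` first, as in Answer 1. A small illustration with made-up markers:

```python
import re

text = "keep [start] drop this (end) keep"
start, end = "[start]", "(end)"

# Unescaped, "[start]" is a character class and "(end)" is a group,
# so the pattern matches something other than the literal markers.
unescaped = re.sub(start + ".*?" + end, "", text)

# Escaped, both markers are matched literally.
escaped = re.sub(re.escape(start) + ".*?" + re.escape(end), "", text)

print(repr(escaped))  # 'keep  keep'
```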

Answer 5

After trying several approaches, I found this to be the best solution without regular expressions:

def getString(text, _from, _to):
    # Return the text between _from and _to (assumes both are present)
    end_from = text.find(_from) + len(_from)
    return text[end_from:text.find(_to, end_from)]
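
Note that the function above returns the text between the two markers rather than removing it. As a rough sketch of how it could serve the removal the question asks about, a copy of the function is included here so the snippet runs on its own; both markers are assumed to be present.

```python
def getString(text, _from, _to):
    end_from = text.find(_from) + len(_from)
    return text[end_from:text.find(_to, end_from)]


long_string = "lorem ipsum..white chevy..blah,blah...lot of text..beer bottle....and so to the end"
middle = getString(long_string, "white chevy", "beer bott")
print(repr(middle))
# '..blah,blah...lot of text..'

# Drop that text while keeping both markers (first occurrence only)
result = long_string.replace("white chevy" + middle, "white chevy", 1)
print(result)
# lorem ipsum..white chevybeer bottle....and so to the end
```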

How do I remove or replace a substring in Python that is determined by a start point and an end point?
