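From time to time I need to delete or replace a substring of a long string. To do that, I want to specify a start pattern and an end pattern that mark where the substring begins and ends: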
long_string = "lorem ipsum..white chevy..blah,blah...lot of text..beer bottle....and so to the end"
removed_substr_start = "white chevy"
removed_substr_end = "beer bott"
# pseudocode: the kind of method I am looking for
STRresult = long_string.replace( [from]removed_substr_start [to]removed_substr_end, "")
Answer 0 (score: 3)
You can use a regex:
>>> import re
>>> strs = "lorem ipsum..white chevy..blah,blah...lot of text..beer bottle....and so to the end"
>>> sub_start = "white chevy"
>>> sub_end = "beer bott"
>>> re.sub(r'{}.*?{}'.format(re.escape(sub_start),re.escape(sub_end)),'',strs)
'lorem ipsum..le....and so to the end'
If you only want to remove the text between "white chevy" and "beer bott" while keeping the markers themselves:
>>> re.sub(r'({})(.*?)({})'.format(re.escape(sub_start),
...        re.escape(sub_end)), r'\1\3', strs)
'lorem ipsum..white chevybeer bottle....and so to the end'
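The two calls above can be folded into one small helper. This is a minimal sketch, not part of the original answer: the name `remove_between` and the `keep_markers` parameter are hypothetical, and `re.DOTALL` is added on the assumption that the text may contain newlines, which `.` does not match by default.

```python
import re

def remove_between(text, start, end, keep_markers=False):
    """Remove the first span running from `start` through `end` (hypothetical helper)."""
    # re.escape makes the markers match literally, even if they contain
    # regex metacharacters such as '.' or '('.
    pattern = r'({})(.*?)({})'.format(re.escape(start), re.escape(end))
    if keep_markers:
        # Put the two markers back and drop only the text between them.
        repl = lambda m: m.group(1) + m.group(3)
    else:
        repl = ''
    # count=1 limits the change to the first match; re.DOTALL lets '.' span newlines.
    return re.sub(pattern, repl, text, count=1, flags=re.DOTALL)

strs = "lorem ipsum..white chevy..blah,blah...lot of text..beer bottle....and so to the end"
print(remove_between(strs, "white chevy", "beer bott"))
print(remove_between(strs, "white chevy", "beer bott", keep_markers=True))
```

If either marker is missing, the pattern does not match and the text comes back unchanged.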
Answer 1 (score: 2)
I guess you want something like this, without regex:
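def replace_between(text, begin, end, alternative=''):
    # Take what follows the first `begin`, then cut it off at the next `end`.
    middle = text.split(begin, 1)[1].split(end, 1)[0]
    # Note: str.replace substitutes every occurrence of `middle` in `text`.
    return text.replace(middle, alternative)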
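This is untested, and you should guard the first line against failure: it raises IndexError when begin is not found, and when end is not found it silently treats everything after begin as the middle. But the idea is there :)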
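One way to add that protection is `str.partition`, which reports whether the marker was found instead of raising. The variant below is a sketch under one assumption that the answer does not state: when either marker is missing, the text is returned unchanged. It also rebuilds the string from its pieces, so only the span between the markers is touched.

```python
def replace_between(text, begin, end, alternative=''):
    """Replace only the text between the first `begin` and the next `end`."""
    # str.partition returns (head, sep, tail); sep is '' when the marker is absent.
    head, found_begin, rest = text.partition(begin)
    if not found_begin:
        return text  # assumption: leave the text unchanged if `begin` is missing
    middle, found_end, tail = rest.partition(end)
    if not found_end:
        return text  # assumption: same policy when `end` is missing
    # Rebuild the string so that only this one span is swapped out.
    return head + begin + alternative + end + tail

long_string = "lorem ipsum..white chevy..blah,blah...lot of text..beer bottle....and so to the end"
print(replace_between(long_string, "white chevy", "beer bott"))
```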
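Answer 2 (score: 2)
Use find() to get the start index, then find() again, searching from that index, to get the end index (rfind() would give the last occurrence instead). In Python 3 these are methods of the string itself. Then cut out the part between the two indices with:
lindex = long_string.find(removed_substr_start)
rindex = long_string.find(removed_substr_end, lindex)
# the start marker is removed; the end marker is kept
result = long_string[:lindex] + long_string[rindex:]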
See: http://docs.python.org/2/library/string.html#string.find
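Two details of the index approach are easy to miss: `find` returns -1 when the substring is absent, and the slicing above leaves the end marker in place. A minimal sketch that handles both follows. The helper name `remove_span` is hypothetical, and returning the text unchanged on a miss is an assumption.

```python
def remove_span(text, start, end):
    """Cut out `start`, `end`, and everything between them (hypothetical helper)."""
    lindex = text.find(start)
    if lindex == -1:
        return text  # str.find returns -1 when the substring is absent
    # Begin the second search after the start marker so the two cannot overlap.
    rindex = text.find(end, lindex + len(start))
    if rindex == -1:
        return text
    # Skip past the end marker so that it is removed as well.
    return text[:lindex] + text[rindex + len(end):]

long_string = "lorem ipsum..white chevy..blah,blah...lot of text..beer bottle....and so to the end"
print(remove_span(long_string, "white chevy", "beer bott"))
```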
Answer 3 (score: 1)
import re
regexp = "white chevy.*?beer bott"
long_string = "lorem ipsum..white chevy..blah,blah...lot of text..beer bottle....and so to the end"
re.sub(regexp, "", long_string)
This gives:
'lorem ipsum..le....and so to the end'
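The `?` after `.*` matters when the end pattern occurs more than once. As a small illustration (the sample string is made up for this purpose), the lazy form stops at the first end pattern, while the greedy form runs to the last one:

```python
import re

# Made-up sample in which the end pattern appears twice.
sample = "white chevy 1 beer bott 2 beer bott 3"

# Lazy: '.*?' matches as little as possible, so it stops at the first "beer bott".
lazy = re.sub(r"white chevy.*?beer bott", "", sample)
# Greedy: '.*' matches as much as possible, so it runs to the last "beer bott".
greedy = re.sub(r"white chevy.*beer bott", "", sample)

print(repr(lazy))    # ' 2 beer bott 3'
print(repr(greedy))  # ' 3'
```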
Answer 4 (score: 1)
After trying several approaches, I found this to be the best solution that does not use regex:
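def getString(text, _from, _to):
    # Index just past the first occurrence of `_from`.
    end_from = text.find(_from) + len(_from)
    # Slice up to the next occurrence of `_to` after that point.
    return text[end_from:text.find(_to, end_from)]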
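Note that this function returns the text between the markers rather than removing it, and that a missing marker goes unnoticed: `find` returns -1, so the slice quietly produces the wrong substring. A sketch of a stricter variant uses `str.index`, which raises `ValueError` instead. The name `get_between` is hypothetical.

```python
def get_between(text, start, end):
    """Return the text between `start` and the next `end` (hypothetical helper)."""
    # str.index behaves like str.find but raises ValueError instead of returning -1.
    begin = text.index(start) + len(start)
    return text[begin:text.index(end, begin)]

long_string = "lorem ipsum..white chevy..blah,blah...lot of text..beer bottle....and so to the end"
print(get_between(long_string, "white chevy", "beer bott"))
```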