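I want to cut out the matching strings.
I considered getting the indices with "[m.start() for m in re.finditer('')]".
But I think there is a better way than this.
For example, I want to cut out the strings between "header" and "footer".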
str = "header1svdijfooter1ccsdheader2cdijhfooter2"
headers = ["one": "header1", "two": "header2"]
footers = ["one": "footer1", "two": "footer2"]
#I want to get ["header1svdijfooter1", "header2cdijhfooter2"]
Please advise.
Answer 0 (score: 1)
import re

def returnmatches(text, headers, footers):
    """headers is a list of headers
    footers is a list of footers
    text is the text to search"""
    for header, footer in zip(headers, footers):
        pattern = r"{}\w+?{}".format(header, footer)
        try:
            # re.search returns None when nothing matches, so .group() raises AttributeError
            yield re.search(pattern, text).group()
        except AttributeError:
            # handle no match
            pass
Or:
text = "header1svdijfooter1ccsdheader2cdijhfooter2"
headers = ["header1", "header2"]
footers = ["footer1", "footer2"]
import re
matches = [re.search(r"{}\w+?{}".format(header,footer),text).group() for header,footer in zip(headers,footers) if re.search(r"{}\w+?{}".format(header,footer),text)]
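A minimal sketch of the same idea that runs each search only once and passes the delimiters through re.escape, so a header or footer containing regex metacharacters is still matched literally. The function name extract_sections is hypothetical and not part of the original answer; pairs with no match are simply skipped.

```python
import re

def extract_sections(text, headers, footers):
    """Return the first header...footer span for each pair, skipping pairs with no match."""
    results = []
    for header, footer in zip(headers, footers):
        # re.escape keeps characters such as "." or "(" in a delimiter literal.
        pattern = r"{}\w+?{}".format(re.escape(header), re.escape(footer))
        match = re.search(pattern, text)
        if match is not None:
            results.append(match.group())
    return results

text = "header1svdijfooter1ccsdheader2cdijhfooter2"
print(extract_sections(text, ["header1", "header2"], ["footer1", "footer2"]))
# prints ['header1svdijfooter1', 'header2cdijhfooter2']
```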
Answer 1 (score: 1)
import re
# as a general rule you shouldn't call variables str in python as it's a builtin function name.
str = "header1svdijfooter1ccsdheader2cdijhfooter2"
# this is how you declare dicts.. but if you're only going to have "one"
# and "two" for the keys why not use a list? (you need the {} for dicts).
#headers = {"one": "header1", "two": "header2"}
#footers = {"one": "footer1", "two": "footer2"}
delimiters = [("header1", "footer1"), ("header2", "footer2")]
results = []
for header, footer in delimiters:
    regex = re.compile("({header}.*?{footer})".format(header=header, footer=footer))
    matches = regex.search(str)
    if matches is not None:
        for group in matches.groups():
            results.append(group)
print(results)
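Since the question mentions re.finditer and indices, here is a rough sketch of a single-pass variant: all header/footer pairs are joined into one alternation and finditer reports where each section starts and ends. The names combined and spans are made up for this illustration. Note that, unlike the loop over pairs, this returns every occurrence in the order it appears in the text.

```python
import re

text = "header1svdijfooter1ccsdheader2cdijhfooter2"
delimiters = [("header1", "footer1"), ("header2", "footer2")]

# One alternative per (header, footer) pair,
# e.g. "header1.*?footer1|header2.*?footer2".
combined = re.compile(
    "|".join(
        "{}.*?{}".format(re.escape(header), re.escape(footer))
        for header, footer in delimiters
    )
)

# finditer scans left to right and yields non-overlapping matches.
spans = [(m.start(), m.end(), m.group()) for m in combined.finditer(text)]
print(spans)
# prints [(0, 19, 'header1svdijfooter1'), (23, 42, 'header2cdijhfooter2')]
```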
Answer 2 (score: 0)
The computation can be done in one line with a list comprehension:
import re
s = "header1svdijfooter1ccsdheader2cdijhfooter2"
headers = {"one": "header1", "two": "header2"}
footers = {"one": "footer1", "two": "footer2"}
out = [re.search('({}.*?{})'.format(headers[k], footers[k]), s).group(0) for k in sorted(headers.keys())]
The above assumes, as in the example, that there is only one match group.
Or, if one prefers a loop:
import re
s = "header1svdijfooter1ccsdheader2cdijhfooter2"
headers = {"one": "header1", "two": "header2"}
footers = {"one": "footer1", "two": "footer2"}
out = []
for k in sorted(headers.keys()):
    out.extend(re.search('({}.*?{})'.format(headers[k], footers[k]), s).groups())
print(out)
The above produces the output:
['header1svdijfooter1', 'header2cdijhfooter2']
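Both versions call .group(0) or .groups() directly on the result of re.search, so they raise AttributeError if some header/footer pair is absent from the string. A hedged sketch of one way to tolerate that, keeping the dict keys and storing None for a missing pair (the "three" entries are made up here to show the no-match case):

```python
import re

s = "header1svdijfooter1ccsdheader2cdijhfooter2"
# "three" is a hypothetical pair that does not occur in s.
headers = {"one": "header1", "two": "header2", "three": "header3"}
footers = {"one": "footer1", "two": "footer2", "three": "footer3"}

out = {}
for k in sorted(headers):
    m = re.search("{}.*?{}".format(re.escape(headers[k]), re.escape(footers[k])), s)
    # Store None rather than calling .group() on a failed search.
    out[k] = m.group(0) if m else None
print(out)
```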
Answer 3 (score: 0)
Without re:
text = "header1svdijfooter1ccsdheader2cdijhfooter2"
result = []
capture = False
currentCapture = ""
for i in range(len(text)):
    if text[i:].startswith("header1") or text[i:].startswith("header2"):
        # a header starts here: begin a new capture
        currentCapture = ""
        capture = True
    elif text[:i].endswith("footer1") or text[:i].endswith("footer2"):
        # a footer ended just before this position: store the capture
        capture = False
        result.append(currentCapture)
        currentCapture = ""
    if capture:
        currentCapture = currentCapture + text[i]
# the text may end right after a footer, leaving one capture unsaved
if currentCapture:
    result.append(currentCapture)
print(result)
Output:
['header1svdijfooter1', 'header2cdijhfooter2']
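Another way to avoid re, sketched here with str.find instead of a character-by-character scan. The helper name cut_sections is hypothetical; it assumes each header is paired with one footer and only the first occurrence of each pair is wanted.

```python
def cut_sections(text, delimiters):
    """Return header...footer substrings found with str.find, in delimiter order."""
    results = []
    for header, footer in delimiters:
        start = text.find(header)
        if start == -1:
            continue  # header not present
        # Look for the footer only after the end of the header.
        end = text.find(footer, start + len(header))
        if end == -1:
            continue  # footer not present after the header
        results.append(text[start:end + len(footer)])
    return results

text = "header1svdijfooter1ccsdheader2cdijhfooter2"
print(cut_sections(text, [("header1", "footer1"), ("header2", "footer2")]))
# prints ['header1svdijfooter1', 'header2cdijhfooter2']
```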