Question

I need help with a specific problem that I can't seem to find an answer to on this site. My result looks like this:

result = "ooooooooooooooooooooooMMMMMMooooooooooooooooooMMMMMMooooooooooMMMMMMMMoo"

It's a transmembrane prediction. For this string I have another string of the same length, but it is an amino-acid code, for example:

amino_acid_code = "MSDENKSTPIVKASDITDKLKEDILTISKDALDKNTWHVIVGKNFGSYVTHEKGHFVYFYIGPLAFLVFKTA"

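I want to do some research on the last "M" region. This region can vary in length, as can the run of "o" after it. So in this case I need to extract "PLAFLVFK" from the amino-acid string, because it corresponds to the last "M" region.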

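I already have something along these lines, but I can't figure out how to get the start position. I also suspect a simpler (or computationally more efficient) solution is possible.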

end = result.rfind('M')
start = ?
region_I_need = amino_acid_code[start:end]

Thanks in advance
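The `rfind` attempt in the question can be completed without regular expressions. Note that `rfind('M')` returns the index of the last `M` itself, so the slice end must be one past it; the start is then found by stripping the trailing run of `M` from the prefix. A minimal sketch, assuming both strings have the same length (one prediction symbol per residue); the function name `last_m_region` is hypothetical:

```python
def last_m_region(result: str, sequence: str) -> str:
    """Return the part of `sequence` aligned with the last run of 'M' in `result`.

    Assumes `result` and `sequence` have the same length.
    Returns an empty string if `result` contains no 'M'.
    """
    last_m = result.rfind('M')
    if last_m == -1:
        return ""
    end = last_m + 1  # slice ends are exclusive, so go one past the last 'M'
    # Removing the trailing 'M' run from the prefix leaves exactly the
    # characters before the run, so its length is the run's start index.
    start = len(result[:end].rstrip('M'))
    return sequence[start:end]


result = "ooooooooooooooooooooooMMMMMMooooooooooooooooooMMMMMMooooooooooMMMMMMMMoo"
amino_acid_code = "MSDENKSTPIVKASDITDKLKEDILTISKDALDKNTWHVIVGKNFGSYVTHEKGHFVYFYIGPLAFLVFKTA"
print(last_m_region(result, amino_acid_code))
```

This scans the string only from the right end for `rfind`, plus one copy of the prefix for `rstrip`, so it stays linear in the string length.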

Answer 1

You can use the last match returned by re.finditer() to find where the last M region occurs, like this:

import re

result = "ooooooooooooooooooooooMMMMMMooooooooooooooooooMMMMMMooooooooooMMMMMMMMoo"
amino_acid_code = "MSDENKSTPIVKASDITDKLKEDILTISKDALDKNTWHVIVGKNFGSYVTHEKGHFVYFYIGPLAFLVFKTA"

# Find the last occurrence of an M region (list() collects every match)
try:
    last_match = list(re.finditer("M+", result))[-1]
except IndexError:
    # result contains no M region at all
    last_match = None

# Print the corresponding amino-acid region
if last_match:
    print(amino_acid_code[last_match.start():last_match.end()])
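Building a list of every match just to take the last one keeps all matches in memory. A minimal variant, assuming the same two strings, passes the `re.finditer()` iterator through a `collections.deque` with `maxlen=1`, which retains only the final match; the helper name `last_match_span` is hypothetical:

```python
import re
from collections import deque
from typing import Optional, Tuple


def last_match_span(pattern: str, text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the last non-overlapping match of `pattern`, or None."""
    tail = deque(re.finditer(pattern, text), maxlen=1)  # older matches are discarded
    return tail[0].span() if tail else None


result = "ooooooooooooooooooooooMMMMMMooooooooooooooooooMMMMMMooooooooooMMMMMMMMoo"
amino_acid_code = "MSDENKSTPIVKASDITDKLKEDILTISKDALDKNTWHVIVGKNFGSYVTHEKGHFVYFYIGPLAFLVFKTA"

span = last_match_span("M+", result)
if span is not None:
    start, end = span
    print(amino_acid_code[start:end])
```

It still scans the whole string once, but memory use no longer grows with the number of M regions.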

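An alternative that may perform better, because re.search() stops at the first match instead of collecting all of them, is to search the reversed string: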

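# The first M run in the reversed string is the last M run in the original
last_match = re.search("M+", result[::-1])
if last_match:
    # A reversed span [s, e) maps back to [len(result) - e, len(result) - s)
    print(amino_acid_code[len(result) - last_match.end():len(result) - last_match.start()])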
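Another option, not part of the original answer, is to let the pattern itself select the last run: the lookahead `(?=[^M]*\Z)` only succeeds when no further `M` occurs before the end of the string, so only the final run can match and no reversal is needed. A sketch, assuming the same two strings; the constant name `LAST_M_RUN` is hypothetical:

```python
import re

# M+ matches a run of M; the lookahead requires that only non-M characters
# remain until the absolute end of the string (\Z), so only the last run qualifies.
LAST_M_RUN = re.compile(r"M+(?=[^M]*\Z)")

result = "ooooooooooooooooooooooMMMMMMooooooooooooooooooMMMMMMooooooooooMMMMMMMMoo"
amino_acid_code = "MSDENKSTPIVKASDITDKLKEDILTISKDALDKNTWHVIVGKNFGSYVTHEKGHFVYFYIGPLAFLVFKTA"

match = LAST_M_RUN.search(result)
if match:
    print(amino_acid_code[match.start():match.end()])
```

Because `search()` returns the leftmost successful match and the greedy `M+` starts at the first `M` of the final run, the whole run is returned rather than a suffix of it.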

How to find the position of the last occurrence of a pattern in a string, and use those positions to extract a substring from another string
