I'm trying to search for an exact pattern match in a degenerate, or very slightly "fuzzy", string.
pattern = 'VGSGSGSGSAS'  # can be 10-50 characters long
string = "VGSGSGGSGSGSGSGSERGSAS" or "VGSGSGSGSGSGAGERSAS"  # it's actually a 400-character-long string
string[11] = S|A
re.search(pattern, string) #does not work
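So string[11] is either S or A. In this example I'm searching for the pattern in 2 defined strings, but I don't want to create 2 separate strings, because the actual 400-character string has multiple (at least 4) positions with up to 3 different options each. That means I would be building and searching 24 different strings, and that is just for one sequence. Some of my sequences would turn into 64 different strings. Once a match is found, I'd like to know its start and end positions, as well as which string[11] character (S or A) it actually matched. Any ideas on how I can do this kind of pattern matching? Thanks!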
Answer 0: (score: 0)
You don't need a regular expression; a simple str.find() is enough to find it:
pattern = "KVTMQNL"  # substring to find
your_strings = ["VGSEKVTMQNLNDRLAS", "VGSEKVTMQKLNDRLAS"]  # just a list of strings
for source in your_strings:  # go through each string in our list
    print("String: {}\nEleventh character: {}".format(source, source[11]))
    if source[11] in ("N", "K"):  # check if it has `N` or `K` on its 11th position
        index = source.find(pattern)  # search for the pattern
        if index > -1:  # if the pattern is found anywhere in the string...
            print("Pattern start: {}\nPattern end: {}".format(index, index + len(pattern)))
        else:
            print("No match!")
    print("---")
This prints:
String: VGSEKVTMQNLNDRLAS
Eleventh character: N
Pattern start: 4
Pattern end: 11
---
String: VGSEKVTMQKLNDRLAS
Eleventh character: N
No match!
---
I'm not sure this is exactly what you're looking for, though; your question is a bit confusing.
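If the ambiguity lives in the long string rather than in the pattern, one way to avoid expanding it into 24 or 64 variants is to slide the exact pattern along the string and accept any allowed character at the degenerate positions. The following is only a rough sketch under assumptions: the function name `find_in_degenerate` and the `options` dictionary (index mapped to its allowed characters) are made up for illustration and do not come from the question.

```python
def find_in_degenerate(pattern, template, options):
    """Find the first place where an exact pattern fits a degenerate string.

    template -- the base string
    options  -- hypothetical dict: 0-based index -> string of allowed characters
    Returns (start, end, choices) or None; choices records which character
    was actually used at each degenerate position inside the match.
    """
    for start in range(len(template) - len(pattern) + 1):
        choices = {}
        for offset, ch in enumerate(pattern):
            pos = start + offset
            # Degenerate positions accept several characters, others only one.
            allowed = options.get(pos, template[pos])
            if ch not in allowed:
                break  # mismatch, try the next start position
            if pos in options:
                choices[pos] = ch
        else:
            # The inner loop finished without a mismatch: full match.
            return start, start + len(pattern), choices
    return None


# Index 9 may be either N or K (assumed example data).
result = find_in_degenerate("KVTMQKL", "VGSEKVTMQNLNDRLAS", {9: "NK"})
print(result)
```

The returned tuple gives the start position, the end position, and the character chosen at each ambiguous index, which covers the three things the question asks for. The cost is one pass over the string per call, regardless of how many degenerate positions there are.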
Answer 1: (score: 0)
Actually, if I understand your question correctly, you can use re.search() over a list of strings, as in the following example:
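import re

strings = ["VGSEKVTMQNLNDRLAS", "GSEKVTMQNLNDRLAS", "DRLASKVTMQKL", "GSEKVKVTMQPLDRLAS"]
# [NK] matches either N or K; inside a character class, `|` would be a literal character
pattern = r'KVTMQ[NK]L'
for k in strings:
    s = re.search(pattern, k)
    print("Search in: {}".format(k), end=' ')
    if s:
        # s.group() is the matched text, s.span() is its (start, end) position
        print("Found: {} in position: {}".format(s.group(), s.span()))
    else:
        print("Not found")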
Output:
Search in: VGSEKVTMQNLNDRLAS Found: KVTMQNL in position: (4, 11)
Search in: GSEKVTMQNLNDRLAS Found: KVTMQNL in position: (3, 10)
Search in: DRLASKVTMQKL Found: KVTMQKL in position: (5, 12)
Search in: GSEKVKVTMQPLDRLAS Not found
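To also report which alternative was matched at the variable position, the character class can be wrapped in a capturing group. This is a minimal sketch, assuming a single degenerate position; the group name `variant` and the helper `locate` are illustrative names only.

```python
import re

# One degenerate position (N or K), captured in a named group so that
# the character actually matched can be read back from the match object.
pattern = re.compile(r"KVTMQ(?P<variant>[NK])L")


def locate(text):
    """Return (start, end, matched_character) or None if there is no match."""
    m = pattern.search(text)
    if m is None:
        return None
    return m.start(), m.end(), m.group("variant")


for text in ["VGSEKVTMQNLNDRLAS", "DRLASKVTMQKL", "GSEKVKVTMQPLDRLAS"]:
    print(text, locate(text))
```

With several degenerate positions, each one could get its own named group, so a single search still yields the span plus every choice that was made.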