Question

I'm trying to search for an exact pattern match in a degenerate, or very slightly "fuzzy", string.

pattern = 'VGSGSGSGSAS'  # the pattern can be 10-50 characters long
string = "VGSGSGGSGSGSGSGSERGSAS"  # or "VGSGSGSGSGSGAGERSAS"; the real string is ~400 characters long
# string[11] can be either S or A
re.search(pattern, string)  # does not work

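So string[11] is either S or A. In this example I'm searching for the pattern in two explicitly defined strings, but I don't want to create separate strings, because the real 400-character string has several (at least 4) such positions, each with up to 3 different options. That would mean building and searching 24 different strings, and that's just for one sequence; some of my sequences would expand into 64 different strings. Once a match is found, I want to know where it starts and ends, and which character (S or A) it actually matched at string[11]. Any ideas on how to do this kind of pattern matching? Thanks!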
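One way to avoid expanding the string into 24 or 64 variants is to store the degenerate string as a list with one entry per position, each entry holding the characters allowed there, and slide the exact pattern along it. This is a minimal sketch under that assumption: `find_in_degenerate` and the example sequence are hypothetical, and it uses a plain sliding-window comparison rather than a regex.

```python
def find_in_degenerate(pattern, seq):
    """Yield (start, end, chosen) for every offset where `pattern` fits `seq`.

    Hypothetical helper. `seq` is a list with one string per position holding
    the characters allowed there, e.g. "S" or "SA". `chosen` maps each
    ambiguous position (more than one option) covered by the match to the
    character the pattern used there. `end` is exclusive.
    """
    m = len(pattern)
    for start in range(len(seq) - m + 1):
        window = seq[start:start + m]
        # Every pattern character must be one of the options at its position.
        if all(ch in allowed for ch, allowed in zip(pattern, window)):
            chosen = {
                start + i: ch
                for i, (ch, allowed) in enumerate(zip(pattern, window))
                if len(allowed) > 1
            }
            yield start, start + m, chosen


# Hypothetical example: index 11 may be S or A.
seq = list("VGSGSGGSGSGSGSGSERGSAS")
seq[11] = "SA"

for start, end, chosen in find_in_degenerate("GAGSGSE", seq):
    print(start, end, chosen)  # 10 17 {11: 'A'}
```

Each offset costs at most len(pattern) character checks, so a 400-character sequence with a 50-character pattern needs on the order of 20,000 checks no matter how many ambiguous positions it has.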

Answer 1

You don't need a regular expression; a simple str.find() is enough to find it:

pattern = "KVTMQNL"  # substring to find
your_strings = ["VGSEKVTMQNLNDRLAS", "VGSEKVTMQKLNDRLAS"]  # just a list of strings
for source in your_strings:  # go through each string in our list
    print("String: {}\nEleventh character: {}".format(source, source[11]))
    if source[11] in ("N", "K"):  # check whether index 11 (the 12th character) is `N` or `K`
        index = source.find(pattern)  # search for the pattern
        if index > -1:  # if the pattern is found anywhere in the string...
            print("Pattern start: {}\nPattern end: {}".format(index, index + len(pattern)))
        else:
            print("No match!")
    print("---")

This prints:

String: VGSEKVTMQNLNDRLAS
Eleventh character: N
Pattern start: 4
Pattern end: 11
---
String: VGSEKVTMQKLNDRLAS
Eleventh character: N
No match!
---

I'm not sure this is exactly what you're looking for; your question is a bit confusing.
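Note that str.find() only returns the first occurrence. If the pattern may occur more than once in a long sequence, a small loop that restarts the search just past each hit collects all of them. `find_all` is a hypothetical helper name; restarting at `index + 1` rather than after the whole match means overlapping occurrences are reported too.

```python
def find_all(source, pattern):
    """Return the start index of every (possibly overlapping) occurrence."""
    starts = []
    index = source.find(pattern)
    while index != -1:
        starts.append(index)
        # Restart one character later so overlapping matches are also found.
        index = source.find(pattern, index + 1)
    return starts


print(find_all("VGSEKVTMQNLNDRLAS", "KVTMQNL"))  # [4]
print(find_all("GSGSGS", "GSGS"))  # [0, 2]
```

The end of each match is then start + len(pattern), as in the loop above.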

Answer 2

If I understand your question correctly, you can run re.search() over a list of strings, as in the example below:

import re

strings = ["VGSEKVTMQNLNDRLAS", "GSEKVTMQNLNDRLAS", "DRLASKVTMQKL", "GSEKVKVTMQPLDRLAS"]
pattern = r'KVTMQ[NK]L'  # [NK] matches N or K; inside [...] a '|' would be matched literally

for k in strings:
    s = re.search(pattern, k)
    print("Search in: {}  ".format(k), end = ' ')
    if s:
        print("Found: {} in position: {}".format(s.group(), s.span()))
    else:
        print("Not found")

Output:

Search in: VGSEKVTMQNLNDRLAS   Found: KVTMQNL in position: (4, 11)
Search in: GSEKVTMQNLNDRLAS   Found: KVTMQNL in position: (3, 10)
Search in: DRLASKVTMQKL   Found: KVTMQKL in position: (5, 12)
Search in: GSEKVKVTMQPLDRLAS   Not found
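To also report which residue matched at the ambiguous position, wrap that position in a capturing group; re.finditer() then yields every non-overlapping match in one long string together with its span. This is a sketch; the sequence is a made-up example built from pieces of the strings above.

```python
import re

# The capturing group ([NK]) records which residue occurred at the
# ambiguous position.
pattern = re.compile(r"KVTMQ([NK])L")

sequence = "VGSEKVTMQNLNDRLASKVTMQKL"  # made-up example sequence
for m in pattern.finditer(sequence):
    # m.span() is (start, end) with end exclusive; m.group(1) is N or K.
    print(m.group(), m.span(), m.group(1))
# KVTMQNL (4, 11) N
# KVTMQKL (17, 24) K
```

With several ambiguous positions, use one group per position, e.g. r"KV([ST])MQ([NK])L", and read them all back with m.groups().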

Searching for degenerate strings in Python
