I have a string of length n. For example:
s = "gcgcagagacgcaagcctaRgggSgggggttgggggggcgtgt"
I want to take a substring:
s1 = s[0:20]
Result:
s1 = "gcgcagagacgcaagcctaR"
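Then I check whether it contains any character other than "a", "c", "g" or "t". That is the case for s1, because it ends with "R". The next step is to replace "R" with "A" and "G" (or "S" with "C" and "G"), i.e. to create two new strings: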
"gcgcagagacgcaagcctaA"
"gcgcagagacgcaagcctaG"
Then I take the next substring of s:
s1 = s[1:21]
I repeat this until I reach the end of the original string. In this example, the last substring would be:
s1 = s[23:43]
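If a substring contains two special characters, 4 new strings are generated; with three, 8, and so on. If it contains no special characters, the substring is printed as is and the window moves forward. The real data contains more kinds of special characters than this example, but the principle is the same.
What I have so far: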
def generate_substrings(sequence, start, end):
    codes = set("MRWSYKVHDB")
    s = sequence[start:end]
    start += 1
    end += 1
    if end > len(sequence):
        return
    elif not any((nt in codes) for nt in s):
        print(s)
    else:
        for i, nt in enumerate(s):
            if nt not in "acgt":
                if nt == "R":
                    s = s.replace("R", "A")
                    print(s)
                    return generate_substrings(sequence, start, end)
                    s = s.replace("R", "G")
                    return generate_substrings(sequence, start, end)
                elif nt == "S":
                    s = s.replace("S", "C")
                    return generate_substrings(sequence, start, end)
                    s = s.replace("S", "G")
                    return generate_substrings(sequence, start, end)

generate_substrings("gcgcagagacgcaagcctaRgggSgggggttgggggggcgtgt", 0, 20)
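I know this script does not do what I need, but it is what I have so far. I would really appreciate it if someone could help me extend (or rewrite) it.
Answer 0 (score: 2)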
def generate_substrings(sequence):
    # Slide a 20-character window over the sequence, one position at a time.
    for i in range(len(sequence) - 19):
        currentSequence = sequence[i:i+20]
        recursiveReplaceLetter(currentSequence)

def recursiveReplaceLetter(s):
    # Find the first special character, branch on both replacements, and
    # recurse; a string with no special characters left is printed.
    isOk = True
    for i in range(len(s)):
        if (s[i] == "R"):
            isOk = False
            # s[:i+1] contains exactly one "R" (the first), so only it is replaced.
            newSequence1 = s
            newSequence1 = newSequence1[:i+1].replace("R", "A") + newSequence1[i+1:]
            recursiveReplaceLetter(newSequence1)
            newSequence2 = s
            newSequence2 = newSequence2[:i+1].replace("R", "G") + newSequence2[i+1:]
            recursiveReplaceLetter(newSequence2)
            break
        elif (s[i] == "S"):
            isOk = False
            newSequence1 = s
            newSequence1 = newSequence1[:i+1].replace("S", "C") + newSequence1[i+1:]
            recursiveReplaceLetter(newSequence1)
            newSequence2 = s
            newSequence2 = newSequence2[:i+1].replace("S", "G") + newSequence2[i+1:]
            recursiveReplaceLetter(newSequence2)
            break
    if (isOk):
        print(s)

sequence = "gRRcagagacgcaagcctaRgggSgggggttgggggggcgtgt"
generate_substrings(sequence)
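
The recursive version above handles only "R" and "S". Since the question mentions more special characters, here is a minimal non-recursive sketch of the same idea using `itertools.product`. It is an alternative, not the code from this answer. The `IUPAC` table, `expand` and `sliding_expansions` are names introduced for illustration, and the table assumes the special characters follow the standard IUPAC nucleotide ambiguity codes, limited to the letters in the question's `codes` set.

```python
from itertools import product

# Assumption: standard IUPAC ambiguity codes, restricted to the letters in
# the question's `codes` set. Replacements are uppercase, as in the question.
IUPAC = {
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
}

def expand(window):
    """Yield every string obtained by resolving each ambiguity code in window."""
    # Each position contributes its possible bases; plain bases map to themselves.
    options = [IUPAC.get(ch, ch) for ch in window]
    for combo in product(*options):
        yield "".join(combo)

def sliding_expansions(sequence, width=20):
    """Yield the expanded strings for every window of `width` characters."""
    for start in range(len(sequence) - width + 1):
        yield from expand(sequence[start:start + width])

if __name__ == "__main__":
    for s in sliding_expansions("gcgcagagacgcaagcctaRgggSgggggttgggggggcgtgt"):
        print(s)
```

A window with no special characters yields exactly itself, and a window with k two-way codes yields 2^k strings, which matches the counts described in the question. Using generators avoids building the whole result list in memory when a window contains many ambiguous positions.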