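I am working on a project for a bioinformatics course. For this project we are given several DNA strings and an integer k. The task is to find a k-mer motif that minimizes the sum of the Hamming distances between the motif and each DNA string.
I wrote a function MedianString(seqLines,k) for this task. The function first assigns initial values to several variables and then runs a for loop. When I call the function, Python seems to skip every line before the loop and go straight to the loop body. I searched online for possible causes and found some similar discussions, but none of them seems to fit my case. I'm quite lost...
Lines that Python skips: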
    print('BBBBBBBBBBBBBB')
    distance = 100000000000000000000000000000000
    print('distance=')
    print(distance)
    Median = []
    pattern = []
    sumHD = 0
Lines that do run:
    for i in range (0, 4**k-1):
        pattern = NumberToPattern(i,k)
        print('this is pattern')
        print (pattern)
        sumHD = sumOfMinHD(pattern,seqLines)
        print('sumHD=',type(sumHD))
        print(sumHD)
        print('distance=',type(distance))
        print(distance)
        if distance > sumHD:
            print('distance>sumOfMinHD')
            distance = sumOfMinHD(pattern,seqLines)
            Median = pattern
            print('distance=')
            print(distance)
        else:
            print('distance is <=sumOfMinHD')
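One way to check whether the lines before the loop are really skipped, or whether their output has only scrolled out of the console because the loop prints so much, is to capture everything the function prints and look at the first few lines. The sketch below is a minimal illustration, not the project code: `noisy_function` and `capture_output` are hypothetical names, and `noisy_function` is only a stand-in for MedianString that prints a marker line followed by a large amount of loop output.

```python
import contextlib
import io


def noisy_function(iterations):
    """Hypothetical stand-in for MedianString: a marker line, then heavy loop output."""
    print('BBBBBBBBBBBBBB')
    distance = float('inf')
    print('distance=')
    print(distance)
    for i in range(iterations):
        print('this is pattern')
        print(i)
    return distance


def capture_output(func, *args):
    """Run func(*args) and return everything it printed as a list of lines."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue().splitlines()


lines = capture_output(noisy_function, 10000)
# Show only the beginning of the output, which a console may have scrolled away.
print(lines[:3])
print(len(lines), 'lines printed in total')
```

If the marker line appears at the start of the captured output, the lines before the loop did run and the problem lies only in how the console displays long output. Whether that is the cause here is an assumption; it depends on the console and its buffer size.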
Full code (the pseudocode is from Finding Hidden Messages in DNA (Bioinformatics I), University of California San Diego):
#Code Challenge: Implement MedianString.
#Input: An integer k, followed by a collection of strings Dna.
#Output: A k-mer Pattern that minimizes d(Pattern, Dna) among all possible choices of k-mers.
#(If there are multiple such strings Pattern, then you may return any one.)
#the concept of the code:
#MedianString(Dna, k)
#    distance ← ∞
#    for each k-mer Pattern from AA…AA to TT…TT
#        if distance > d(Pattern, Dna)
#            distance ← d(Pattern, Dna)
#            Median ← Pattern
#    return Median
with open(r'D:\Users\moonc\Desktop\python_exercises_for_bioimformatics_I\MedianStringSampleInput.txt','r') as seqFile :
    DataSet = seqFile.read().splitlines()
print ('this is Dataset')
print (DataSet)
seqLines = DataSet [1:]
print ('this is seqLines')
print (seqLines)
print (len(seqLines))
k = int(DataSet[0])
print ('this is k')
print (k)
#NumberToPattern
def NumberToPattern(number, k):
    pattern = []
    for i in range (0,k):
        if number // (4**(k-1-i)) == 0:
            pattern.append ("A")
        elif number // (4**(k-1-i)) == 1:
            pattern.append ("C")
        elif number // (4**(k-1-i)) == 2:
            pattern.append ("G")
        elif number // (4**(k-1-i)) == 3:
            pattern.append ("T")
        number = number % (4**(k-1-i))
    intToString = map(str, pattern)
    patternString = "".join(intToString)
    return patternString
#Hamming Distance Problem: Compute the Hamming distance between two strings.
# Input: Two strings of equal length.
# Output: The Hamming distance between these strings.
def HammingDistance(p,q):
    HD = 0
    for i in range(0,len(p)):
        if p[i] != q [i]:
            HD = HD+1
    return HD
#minHDandMotif(Pattern, Text) is the minimum Hamming distance between Pattern and any k-mer in Text
def minHDandMotif(Pattern,String):
    HD = float('inf')
    motif = []
    for i in range (0,len(String)-len(Pattern)+1):
        if HammingDistance(Pattern,String[i:i+len(Pattern)]) < HD:
            HD = HammingDistance(Pattern,String[i:i+len(Pattern)])
            motif = String[i:i+len(Pattern)]
            # print (motif)
    print ('this is minHD and Motif')
    print ([HD,motif])
    return [HD,motif]
#sumOfMinHD(Pattern, Dna) as the sum of distances between Pattern and all strings in Dna
#Dna is a collection of strings of the same length
def sumOfMinHD(Pattern,seqLines):
    sumHD = 0
    print(seqLines)
    for i in range (0,len(seqLines)-1):
        minHD = minHDandMotif(Pattern,seqLines[i])[0]
        sumHD = sumHD + minHD
    print ('this is sum Of MinHD')
    print (sumHD)
    return sumHD
######################
def MedianString(seqLines,k):
    print('BBBBBBBBBBBBBB')
    distance = 100000000000000000000000000000000
    print('distance=')
    print(distance)
    Median = []
    pattern = []
    sumHD = 0
    for i in range (0, 4**k-1):
        pattern = NumberToPattern(i,k)
        print('this is pattern')
        print (pattern)
        sumHD = sumOfMinHD(pattern,seqLines)
        print('sumHD=',type(sumHD))
        print(sumHD)
        print('distance=',type(distance))
        print(distance)
        if distance > sumHD:
            print('distance>sumOfMinHD')
            distance = sumOfMinHD(pattern,seqLines)
            Median = pattern
            print('distance=')
            print(distance)
        else:
            print('distance is <=sumOfMinHD')
    return Median
####################
Median = MedianString(seqLines,k)
print ('this is MedianString')
print (Median)
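For comparison, here is a compact sketch of the same pseudocode that visits every k-mer and every DNA string. It is not the course's reference solution: it uses `itertools.product` in place of NumberToPattern, and the names `hamming_distance`, `min_distance`, `total_distance` and `median_string` are chosen for this example only. Because `range(0, n)` already stops at n-1, the bounds `4**k-1` and `len(seqLines)-1` in the code above appear to leave out the last pattern and the last string; this sketch avoids index ranges for those loops.

```python
from itertools import product


def hamming_distance(p, q):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(p, q))


def min_distance(pattern, text):
    """Minimum Hamming distance between pattern and any k-mer of text."""
    k = len(pattern)
    return min(hamming_distance(pattern, text[i:i + k])
               for i in range(len(text) - k + 1))


def total_distance(pattern, dna):
    """Sum of min_distance over every string in dna."""
    return sum(min_distance(pattern, text) for text in dna)


def median_string(dna, k):
    """Return a k-mer minimizing total_distance, together with that distance."""
    best_pattern, best_distance = None, float('inf')
    for letters in product('ACGT', repeat=k):  # AA...A through TT...T
        pattern = ''.join(letters)
        d = total_distance(pattern, dna)
        if d < best_distance:
            best_pattern, best_distance = pattern, d
    return best_pattern, best_distance


# Strings taken from the console output in this post.
dna = ['AAATTGACGCAT', 'GACGACCACGTT', 'CGTCAGCGCCTG',
       'GCTGAGCACCGG', 'AGTACGGGACAG']
print(median_string(dna, 3))
```

With all five strings scored, 'ACC' has a total distance of 3 while 'GAC' has a total distance of 2, so the result can differ from a run in which only the first four strings are scored.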
Console output for the last iteration of the loop in MedianString(seqLines,k):
this is pattern
TTG
['AAATTGACGCAT', 'GACGACCACGTT', 'CGTCAGCGCCTG', 'GCTGAGCACCGG', 'AGTACGGGACAG']
this is minHD and Motif
[0, 'TTG']
this is minHD and Motif
[2, 'ACG']
this is minHD and Motif
[1, 'CTG']
this is minHD and Motif
[1, 'CTG']
this is sum Of MinHD
4
sumHD= <class 'int'>
4
distance= <class 'int'>
2
distance is <=sumOfMinHD
this is MedianString
ACC
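The output above shows TTG as the last pattern and only four 'this is minHD and Motif' lines, although the list contains five strings. A small check of the range bounds, assuming k = 3 and five strings as in this run, suggests why: `range(0, n)` already excludes n, so subtracting 1 drops one more item. The helper `number_to_pattern` is a shortened rewrite of NumberToPattern written for this example.

```python
def number_to_pattern(number, k):
    """Convert an index in [0, 4**k) to its k-mer, reading base-4 digits as A, C, G, T."""
    letters = 'ACGT'
    pattern = []
    for i in range(k):
        power = 4 ** (k - 1 - i)
        pattern.append(letters[number // power])
        number %= power
    return ''.join(pattern)


k = 3
num_strings = 5

buggy_indices = range(0, 4**k - 1)   # last index is 4**k - 2
full_indices = range(0, 4**k)        # last index is 4**k - 1

print(number_to_pattern(buggy_indices[-1], k))  # last pattern visited with the "- 1"
print(number_to_pattern(full_indices[-1], k))   # last pattern visited without it
print(len(range(0, num_strings - 1)), 'of', num_strings, 'strings scored')
```

This only concerns which patterns and strings are visited; it does not by itself explain why the prints before the loop seem to be missing.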