DNA sequencing with Python

Time: 2015-10-01 18:10:38

Tags: python substring longest-substring

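Using loops, how can I write a Python function that finds the longest protein chain, regardless of order? When the valid characters are mixed with other elements, the function should return a substring that consists only of the characters 'A', 'C', 'G' and 'T'. For example, for the sequence 'ACCGXXCXXGTTACTGGGCXTTGT' it returns 'GTTACTGGGC'.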

1 answer:

Answer 0: (score: 1)

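If the data is given as a string, you can simply split it on the character 'X', which gives you a list of the pieces between the 'X' characters.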

startstring = 'ACCGXXCXXGTTACTGGGCXTTGT'
array = startstring.split('X')

Looping over that list while checking the length of each element then gives the correct result:

# Initialize placeholders for comparison
temp_max_string = ''
temp_max_length = 0

# Loop over each string in the list
for i in array:
    # Check if the current substring is longer than the longest found so far
    if len(i) > temp_max_length:
        # Replace the placeholders if it is longer
        temp_max_length = len(i)
        temp_max_string = i

print(temp_max_string) # or 'print temp_max_string' if you are using python2.
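
Splitting on 'X' works for the example, but the question asks for runs made up only of 'A', 'C', 'G' and 'T'. The following is a minimal sketch of a loop that treats any other character as a separator. The function name `longest_dna_run` and its `alphabet` parameter are hypothetical, and it assumes that on a tie the first run found should be returned.

```python
def longest_dna_run(sequence, alphabet="ACGT"):
    """Return the longest run of characters drawn only from `alphabet`.

    Hypothetical helper, not part of the original answer. On a tie,
    the first run encountered is kept.
    """
    best = ""     # longest valid run seen so far
    current = ""  # valid run currently being built
    for char in sequence:
        if char in alphabet:
            # Extend the current run and record it if it is a new maximum
            current += char
            if len(current) > len(best):
                best = current
        else:
            # Any other character ends the current run
            current = ""
    return best


print(longest_dna_run("ACCGXXCXXGTTACTGGGCXTTGT"))  # GTTACTGGGC
```

Unlike the split on 'X', this also handles input such as 'AANCC', where the separator is some other character.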

You can also use Python's built-in functions to get the result more concisely:

Sort by descending length (list.sort()):

startstring = 'ACCGXXCXXGTTACTGGGCXTTGT'
array = startstring.split('X')
array.sort(key=len, reverse=True)
print(array[0]) # prints the longest substring, since we sorted by descending length
print(len(array[0])) # Would give you the length of the longest substring

Get only the longest substring (max()):

startstring = 'ACCGXXCXXGTTACTGGGCXTTGT'
array = startstring.split('X')
longest = max(array, key=len)
print(longest) # gives the longest substring
print(len(longest)) # gives you the length of the longest substring
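
If the separators are not always 'X', one possible variation is to collect the valid runs with a regular expression and then apply max() as above. This is a sketch rather than part of the original answer: the function name `longest_dna_run_regex` is hypothetical, and the `default` argument of max() assumes Python 3.4 or later.

```python
import re


def longest_dna_run_regex(sequence):
    """Return the longest run of A/C/G/T characters (hypothetical helper)."""
    # findall returns every non-overlapping run of one or more valid characters
    runs = re.findall(r"[ACGT]+", sequence)
    # default="" avoids a ValueError when no valid run exists
    return max(runs, key=len, default="")


longest = longest_dna_run_regex("ACCGXXCXXGTTACTGGGCXTTGT")
print(longest)       # GTTACTGGGC
print(len(longest))  # 10
```

On a tie, max() returns the first of the longest runs, which matches the behaviour of the explicit loop.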