Determining whether a sequence is a valid DNA sequence

Asked: 2015-02-01 23:25:49

Tags: python if-statement python-3.x for-loop dna-sequence

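I am trying to write a program that reads a sequence into a string variable named sequence and determines whether sequence contains a valid DNA sequence. I want to use a single for loop and an if-elif-else statement to decide whether the sequence is valid DNA.
This is what I have written so far: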

sequence = input("Please enter a sequence: ").upper()
valid_dna = "ACGT"
sequence = sequence.replace(" ", "")

common=0
for eachletter in sequence:
    if eachletter in valid_dna:
        common +=1

print("This is a valid dna sequence")

elif sequence != valid_dna:
    print("This is not a valid DNA sequence")

else:
    print()

I don't know what to put after the elif, because whatever I add after it raises a SyntaxError.

I originally had

sequence = input().upper()
sequence= input("Please enter a sequence:  ")

which did not work well together. Thanks to VHarisop for pointing that out!

Update: this is what I have now, and it does work!

sequence = input().upper()
valid_dna = "ACGT"
sequence = sequence.replace(" ", "")

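count = 1  # assume the sequence is valid until an invalid letter is found
for i in sequence:
    if i not in valid_dna:
        count = 0  # a single invalid letter makes the whole sequence invalid
        break
if count == 1:
    print("This is a valid DNA sequence.")
else:
    print("This is an invalid DNA sequence")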

3 Answers:

Answer 0 (score: 4)

I would just use all with a generator expression:

>>> valid = 'ACTG'

>>> s1 = 'ATAGCGGCAT'
>>> all(i in valid for i in s1)
True

>>> s2 = 'ABCDEFHI'
>>> all(i in valid for i in s2)
False

If you have to use a for loop and an if statement because the assignment requires it, you can apply a similar idea:

def validSequence(s):
    valid = 'ACTG'
    for letter in s:
        if letter not in valid:
            return False
    return True

>>> validSequence('ATAGCGGCAT')
True
>>> validSequence('ABCDEFHIJK')
False
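
As a rough sketch of how the function above could be combined with the normalization from the question (uppercasing and removing spaces): the helper name describe_sequence and the sample inputs are assumptions made for illustration, not part of the original code.

```python
def validSequence(s):
    # Same check as above: stop at the first letter that is not a DNA base.
    valid = 'ACTG'
    for letter in s:
        if letter not in valid:
            return False
    return True


def describe_sequence(raw):
    # Hypothetical helper: normalize the raw text the way the question does,
    # then turn the boolean result into a message.
    cleaned = raw.upper().replace(" ", "")
    if validSequence(cleaned):
        return "This is a valid DNA sequence"
    else:
        return "This is not a valid DNA sequence"


print(describe_sequence("atag cggc at"))  # This is a valid DNA sequence
print(describe_sequence("ABCDEFHI"))      # This is not a valid DNA sequence
```

Returning early from the loop means the scan stops at the first invalid letter instead of reading the whole string.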

Answer 1 (score: 1)

First of all, you have:

sequence = input().upper()
# irrelevant code
sequence= input("Please enter a sequence:  ")

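This asks for input twice: whatever you type the first time is converted to uppercase, but it is then overwritten by the second input, which is left unchanged. That obviously leads to incorrect behavior. I suggest keeping only: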

sequence = input('Please enter a sequence: ').upper()

Then use a generator expression to check validity.

In fact, there is no need to keep a separate string of invalid characters. Just do:

valid_dna = 'ACGT'
sequence = input('Please enter a sequence: ').upper()

# will print True if every character in the sequence belongs to valid_dna
print(all(i in valid_dna for i in sequence))

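Here, the generator expression (i in valid_dna for i in sequence) yields True for each character of sequence that belongs to valid_dna and False for each character that does not. The built-in function all() returns True only if every value produced by the expression is True.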
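
To make this concrete, here is a minimal sketch with a made-up input string. It lists the booleans the expression produces and contrasts all(), which requires every value to be True, with any(), which needs only one.

```python
valid_dna = "ACGT"
sequence = "ACXT"  # hypothetical sample input with one invalid character

# Materialize the values so we can see what all() would consume.
flags = [ch in valid_dna for ch in sequence]

print(flags)       # [True, True, False, True]
print(all(flags))  # False: 'X' is not in valid_dna
print(any(flags))  # True: at least one character is in valid_dna
```

For validation, all() is the right choice; any() would accept a sequence as soon as it contains a single valid base.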

If you want a proper message, you can simply check the value of the expression and print accordingly:

condition = all(i in valid_dna for i in sequence)
print('Valid sequence' if condition else 'Invalid sequence')

Answer 2 (score: -1)

def is_valid_sequence(dna):
    char_invalid = ''
    for char in dna:
        if char not in 'ATCG':
            char_invalid = char_invalid + char
    return not char_invalid  # an empty string means no invalid character was found
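
Since the loop above already collects the offending characters, one possible variation is to return them so the caller can report what was wrong. This is only a sketch; the function name find_invalid_chars and the sample strings are assumptions for illustration.

```python
def find_invalid_chars(dna):
    # Hypothetical variant of the function above: return the invalid
    # characters themselves instead of only a boolean.
    char_invalid = ''
    for char in dna:
        if char not in 'ATCG':
            char_invalid += char
    return char_invalid


bad = find_invalid_chars('ATXCGZ')
print(bad)                             # XZ
print(not bad)                         # False: the sequence is invalid
print(not find_invalid_chars('ATCG'))  # True: nothing invalid was found
```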