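I'm trying to write a regular expression that detects any combination of the characters 'ACTG' and accepts it as valid. Anything else, including a mix of 'ACTG' and other characters, should be invalid.
Eventually I'll take it out of the while loop; the loop is only there for testing. Right now I believe it reports the chain as valid as long as it starts with A, C, T, or G.
Is there a function in re that is better suited to this than match?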
    import re

    while (True):
        DnaString = str(input('enter your polynucleotide chain code hooblah'))
        if (re.match('([ACTG]+[ACTG]*)', DnaString, flags=0)):
            #if re.search('^ACTG', DnaString) != -1:
            print ("valid chain.")
        else: #(re.search('^[ACTG]+[ACTG]*$', DnaString) == -1):
            print("invalid chain, please check your input.")
        if (DnaString.find("end") != -1):
            print("ohokaybye.")
            break
Answer 0 (score: 2)
Why not

    if all(c in 'ACGT' for c in DnaString):
        pass  # Do success
    else:
        pass  # Do failure
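One thing to keep in mind with this approach: all() returns True for an empty iterable, so an empty input would count as valid. A minimal sketch that also rejects empty input, assuming that is the behavior you want (the function name is_valid_chain is made up for illustration):

```python
def is_valid_chain(chain: str) -> bool:
    """Return True if chain is non-empty and contains only A, C, G, T."""
    # bool(chain) is False for "", which all() on its own would accept.
    return bool(chain) and all(c in "ACGT" for c in chain)


print(is_valid_chain("ACTGGT"))  # True
print(is_valid_chain("ACTGX"))   # False
print(is_valid_chain(""))        # False
```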
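Answer 1 (score: 2)
Your problem is that re.match only requires the ACTG characters to match at the start of the string; nothing in your pattern says that no other characters are allowed after them.
If you change the regular expression to ^[ACTG]+$, it will work as expected. The ^ and $ characters are anchors that mark the start and end of the line, respectively.
So the regular expression above matches a string made up of one or more of the four characters and allows no other characters before or after them.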
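To answer the question about a better-suited function: since Python 3.4, re.fullmatch requires the whole string to match, so no anchors are needed. A small sketch comparing the original pattern, the anchored pattern, and fullmatch (is_valid_dna is a hypothetical helper name, not something from the question):

```python
import re


def is_valid_dna(chain: str) -> bool:
    """Return True only if chain consists entirely of A, C, T, G."""
    return re.fullmatch(r"[ACTG]+", chain) is not None


# The original pattern only needs a valid prefix, so this still matches.
print(bool(re.match(r"([ACTG]+[ACTG]*)", "ACTGxyz")))  # True

# Anchoring both ends rejects the trailing characters.
print(bool(re.match(r"^[ACTG]+$", "ACTGxyz")))  # False

# $ also matches just before a trailing newline; fullmatch does not.
print(bool(re.match(r"^[ACTG]+$", "ACTG\n")))  # True
print(is_valid_dna("ACTG\n"))  # False
print(is_valid_dna("GATTACA"))  # True
```

Strings returned by input() carry no trailing newline, so the difference mostly matters when the data comes from a file or another source.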
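Answer 2 (score: 1)
If you are allowing the match to repeat acceptable characters internally, then this is probably what you want:

    '[A|C|T|G]{4}'

Note that inside a character class | is a literal character, so this pattern also accepts |, and {4} matches exactly four characters; [ACTG]{4} is the stricter equivalent.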
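For comparison, a short sketch of how this pattern behaves in Python. Inside square brackets | is an ordinary character rather than alternation, and {4} asks for exactly four repetitions, so the pattern is not equivalent to ^[ACTG]+$:

```python
import re

pattern = re.compile(r"[A|C|T|G]{4}")

print(bool(pattern.fullmatch("ACTG")))   # True: four allowed characters
print(bool(pattern.fullmatch("A|CG")))   # True: the pipe is part of the class
print(bool(pattern.fullmatch("ACTGA")))  # False: five characters, not four
print(bool(pattern.match("ACTGxyz")))    # True: match only checks the prefix
```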
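Answer 3 (score: 0)
Use the in operator to build a generator of True or False values and pass it to all. all returns True if every value is True, and False otherwise. Here the generator is (base in bases for base in sequence).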

    bases = 'acgt'
    sequence = (input('Input DNA sequence: ')).lower()
    if all(base in bases for base in sequence):
        print('Input is correct')
    else:
        print('Only allowed bases are A, T, C, G')

    Input DNA sequence: atcgggggcccccttttaaaa
    Input is correct

    Input DNA sequence: atcgggggcccccttttaaaaf
    Only allowed bases are A, T, C, G
The same check wrapped in a function:

    def check_sequence(sequence: str) -> None:
        sequence = sequence.lower()
        bases = 'acgt'
        if all(base in bases for base in sequence):
            print('Input is correct')
        else:
            print('Only allowed characters are A, T, C, G')

    my_sequence = 'gcaatgcAttGtgaaagagccGcTaCaacctaaacGctgcacgtcacctagagtgtCttgcgggTgaggccctctcgAacagattacagtaccgttatc'
    check_sequence(my_sequence)
    # Output: Input is correct
Use zip to combine the iterables, so each base is paired with the result of its check:

    def check_sequence(sequence: str) -> list:
        sequence = list(sequence.lower())
        base_pairs = 'acgt'
        # One True/False flag per character
        matches = [base in base_pairs for base in sequence]
        sequence_check = list(zip(sequence, matches))
        if all(matches):
            print('Input is correct')
        else:
            print('Only allowed characters are A, T, C, G')
        return sequence_check

    my_sequence = 'GcaatGcatfftgtgaaagAg'
    verified_sequence = check_sequence(my_sequence)
    print(verified_sequence)
    # Output:
    [('g', True),
     ('c', True),
     ('a', True),
     ('a', True),
     ('t', True),
     ('g', True),
     ('c', True),
     ('a', True),
     ('t', True),
     ('f', False),
     ('f', False),
     ('t', True),
     ('g', True),
     ('t', True),
     ('g', True),
     ('a', True),
     ('a', True),
     ('a', True),
     ('g', True),
     ('a', True),
     ('g', True)]
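
If only the offending characters matter, a variation on the idea above is to collect their positions with enumerate instead of pairing every base with a flag. This is a sketch rather than part of the original answer, and invalid_positions is a made-up name:

```python
def invalid_positions(sequence: str, bases: str = "acgt") -> list:
    """Return (index, character) pairs for characters that are not valid bases."""
    return [
        (index, char)
        for index, char in enumerate(sequence)
        if char.lower() not in bases
    ]


print(invalid_positions("GcaatGcatfftgtgaaagAg"))  # [(9, 'f'), (10, 'f')]
print(invalid_positions("acgt"))                   # []
```

An empty result means the sequence is valid, so the function can double as the check itself.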