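I have a DNA sequence like this:
seq='ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'
In Python, I want to print each run of a consecutively repeated nucleotide, but only if it repeats more than twice in a row.
For this sequence, the output should be: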
TTTTT
AAA
CCCCCC
GGGG
Answer 0 (score: 3)
You may want to look at itertools.groupby.
Example usage:
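import itertools

seq = 'ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'
for _, group in itertools.groupby(seq):
    group = ''.join(group)  # each group is an iterator over one run; join it into a string
    if len(group) > 2:      # keep only runs longer than two
        print(group)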
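If you need the runs as a list rather than printed output, the same groupby idea can be wrapped in a small function. This is only a sketch: `find_runs` and its `min_len` parameter are hypothetical names introduced here, not part of itertools.

```python
from itertools import groupby

def find_runs(seq, min_len=3):
    """Return every run of one repeated character that is at least min_len long."""
    runs = []
    for _, group in groupby(seq):
        run = ''.join(group)  # consume the group iterator into a string
        if len(run) >= min_len:
            runs.append(run)
    return runs

seq = 'ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'
for run in find_runs(seq):
    print(run)
```

The default `min_len=3` corresponds to "more than twice" in the question; pass a different value to change the threshold.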
Answer 1 (score: 1)
You can easily find the repeats with a regular expression that uses a backreference, together with the findall method:
import re

seq = 'ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'
# Match a letter followed by two or more copies of itself (a run of three or more)
hits = re.findall(r'(([A-Z])\2\2+)', seq)
# With two groups, findall returns (full run, single letter) tuples; keep the full run
print([hit[0] for hit in hits])
['TTTTT', 'AAA', 'CCCCCC', 'GGGG']
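If you also want to know where each run starts, re.finditer returns match objects instead of tuples. The following is a sketch under two assumptions of mine, not the answer's: the alphabet is restricted to A, C, G and T, and `runs_with_positions` is a hypothetical helper name.

```python
import re

# One nucleotide followed by at least two more copies of itself.
RUN_PATTERN = re.compile(r'([ACGT])\1{2,}')

def runs_with_positions(seq):
    """Return (start index, run) pairs for every run of three or more."""
    return [(m.start(), m.group(0)) for m in RUN_PATTERN.finditer(seq)]

seq = 'ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'
for start, run in runs_with_positions(seq):
    print(start, run)
```

Because the whole match is read with `m.group(0)`, the outer capturing group from the findall version is no longer needed.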
Answer 2 (score: 0)
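seq = 'ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'
while seq:
    value = seq[0]
    repeats = 1
    # Count how many times the first character repeats, stopping at the end of the string
    while repeats < len(seq) and seq[repeats] == value:
        repeats += 1
    if repeats > 2:  # "more than twice", as the question asks
        print(value * repeats)
    seq = seq[repeats:]  # drop the run that was just examined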
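The manual scan can also be written as a single pass over indices, which avoids re-slicing the string and cannot index past its end when the sequence finishes with a run. A minimal sketch; `scan_runs` and `min_len` are hypothetical names of my own.

```python
def scan_runs(seq, min_len=3):
    """Yield runs of one repeated character, scanning by index instead of slicing."""
    start = 0
    for i in range(1, len(seq) + 1):
        # A run ends at the end of the string or where the character changes.
        if i == len(seq) or seq[i] != seq[start]:
            if i - start >= min_len:
                yield seq[start:i]
            start = i

for run in scan_runs('ATCGTTTTTCGAAACTGCCCCCCACTGGGGA'):
    print(run)
```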