DNA sequence manipulation

Posted: 2016-04-25 18:27:19

Tags: python sequences

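So I'm very new to programming, and I'm not great at any programming language. I bought a book on programming for biologists and I'm fumbling my way through a few things. What I want to do is read sequences from a file, then find and extract a variable region from them. My code is as follows: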


#!/usr/bin/python
#for extracting GAA sequences
import os
import sys
import re
#opens sequence file and defines it as reps
reps = open('142sequences.txt')
#defining what to read
line = reps.readlines()
#defines what we are looking for in rep lines
for line in reps:
    sear = re.search(r"C[A]{2,}G[ATCG]{17, 2700}AAT[A]{2,4}G[A]{2,}", reps)
    if sear:
        repeats = sear.group()
        print(repeats)
    else:
        print('Not Recognized')

I'm not getting any output. Please help.

1 answer:

Answer 0 (score: 1)

You need to search each line individually, not the whole file:

import re

with open('142sequences.txt') as reps:
    # iterate over each line in the file
    for line in reps:
        # pass each line (a string) to re.search;
        # {17,2700} must not contain a space, or re reads the braces as literal text
        sear = re.search(r"C[A]{2,}G[ATCG]{17,2700}AAT[A]{2,4}G[A]{2,}", line)
        if sear:
            repeats = sear.group()
            print(repeats)
        else:
            print('Not Recognized')
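Separately from the file-handling problem, the quantifier `{17, 2700}` in the question's pattern contains a space. Python's `re` only treats braces as a repeat count when they hold nothing but digits and at most one comma; otherwise the `{` is matched literally, so that pattern can only match text that actually contains the characters `{17, 2700}`. The sketch below checks this on a made-up sequence (the flanks and the 20-base middle are invented for illustration, not taken from 142sequences.txt). It also shows one way to get only the variable region: wrap the middle part in a capture group and read it with `group(1)`. The capture-group variant is a suggestion, not part of the original answer.

```python
import re

# Made-up test sequence (assumption, for illustration only):
# left flank "CAAG", a 20-base middle, right flank "AATAAGAA"
middle = "ACGT" * 5  # 20 bases, inside the 17..2700 range
seq = "CAAG" + middle + "AATAAGAA"

# With a space inside the braces, re treats "{17, 2700}" as literal characters
broken = r"C[A]{2,}G[ATCG]{17, 2700}AAT[A]{2,4}G[A]{2,}"
print(re.search(broken, seq))  # None

# Without the space, {17,2700} is a real repeat count
fixed = r"C[A]{2,}G[ATCG]{17,2700}AAT[A]{2,4}G[A]{2,}"
print(re.search(fixed, seq).group())  # the whole match, flanks included

# A capture group around the middle returns only the variable region
captured = r"C[A]{2,}G([ATCG]{17,2700})AAT[A]{2,4}G[A]{2,}"
print(re.search(captured, seq).group(1))  # ACGTACGTACGTACGTACGT
```

Note that `[ATCG]{17,2700}` is greedy: if a line contains the closing flank more than once, the match runs to the last occurrence that still fits within 2700 bases. The lazy form `[ATCG]{17,2700}?` would stop at the first one instead.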

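Calling readlines reads all of the lines into a list, so your for loop never actually runs: the initial readlines call has already consumed the file iterator. And if the loop had run, it would raise an error, because re.search must be given a string, while your code passes it the file object reps.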
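To see both effects without needing 142sequences.txt, the sketch below uses `io.StringIO` as a stand-in for the opened file (an assumption made so the example is runnable; the two lines of content are made up). After `readlines()` the file position is at the end, so a following `for` loop over the same object runs zero times, and passing the file object itself to `re.search` raises a `TypeError`.

```python
import io
import re

# io.StringIO stands in for open('142sequences.txt'); the content is made up
reps = io.StringIO("CAAGACGT\nTTTT\n")

lines = reps.readlines()  # reads every remaining line into a list
print(lines)              # ['CAAGACGT\n', 'TTTT\n']

# The position is now at the end of the data, so this loop body never runs
loop_count = 0
for line in reps:
    loop_count += 1
print(loop_count)         # 0

# re.search needs a string (or bytes); a file-like object raises TypeError
try:
    re.search(r"CA{2,}G", reps)
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# Iterating over the list (or a freshly opened file) gives one string per line
for line in lines:
    print(repr(line), bool(re.search(r"CA{2,}G", line)))
```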