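I am trying to use a Python script to generate the reverse complement of DNA sequences stored in several file types. This is what I have written so far: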
import gzip
import re
############## Reverse Complement Function #################################
def rev_comp(dna):
    dna_upper = dna.upper() #Ensures all input is capitalized
    dna_rev = dna_upper[::-1] #Reverses the string
    conversion = {'A':'T','C':'G','G':'C','T':'A','Y':'R','R':'Y',\
                  'S':'S','W':'W','K':'M','M':'K','B':'V','V':'B',\
                  'D':'H','H':'D','N':'N','-':'-'}
    rev_comp = ''
    rc = open("Rev_Comp.fasta", 'w')
    for i in dna_rev:
        rev_comp += conversion[i]
    rc.write(str(rev_comp))
    print("Reverse complement file Rev_Comp.fasta written to directory")

x = input("Enter filename (with extension) of the DNA sequence: ")
if x.endswith(".gz"): #Condition for gzip files
    with gzip.open(x, 'rb') as f:
        file_content = f.read()
        new_file = open("unzipped.fasta", 'w')
        new_file.write(str(file_content))
    print("unzipped.fasta written to directory")
xread = x.readlines()
fast = ''
if x.endswith(".fasta"): #condition for fasta files
    for i in xread:
        if not i.startswith('>'):
            fast = fast + i.strip('\n')
if x.endswith(".fastq"): #condition for fastq files
    for i in range(1,len(xread),4):
        fast = fast + xread[i].strip('\n')
rev_comp(x)
What I end up with is
AttributeError: 'str' object has no attribute 'readlines'
when I try to run the script with a .fastq file. What is going wrong here? I expect the script to write Rev_Comp.fasta, but it does not.
Answer 0 (score: 2)
x is not a file handle, only a file name. You need to do
with open(x) as xhandle:
    xread = xhandle.readlines()
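The overall logic would probably be better if you did not read all the lines into memory. Also, the .gz case ends up in a somewhat undefined state: do you need to set x to the name of the uncompressed data at the end of the gz handling, or should the code that follows it go into an else: branch?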
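Both suggestions can be sketched together: pick the opener from the file extension so that the .gz case falls through to the same parsing code, and iterate over the file lazily instead of calling readlines. The helper names open_sequence_file and iter_sequences are hypothetical. The sketch assumes the same simple layouts as the question: FASTA headers start with '>', and FASTQ records are exactly four lines long.

```python
import gzip


def open_sequence_file(path):
    """Open a plain or gzip-compressed file in text mode (hypothetical helper)."""
    if path.endswith(".gz"):
        # 'rt' makes gzip decode bytes to str, so parsing matches the plain case.
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_sequences(path):
    """Yield sequence lines one at a time instead of loading the whole file."""
    # Drop a trailing .gz so "reads.fastq.gz" is treated like "reads.fastq".
    name = path[:-3] if path.endswith(".gz") else path
    with open_sequence_file(path) as handle:
        if name.endswith(".fastq"):
            # Assumes strict four-line records; the sequence is line 2 of each.
            for index, line in enumerate(handle):
                if index % 4 == 1:
                    yield line.strip()
        else:
            # Assumes FASTA: every line that is not a '>' header is sequence.
            for line in handle:
                if not line.startswith(">"):
                    yield line.strip()
```

With this, something like fast = "".join(iter_sequences(x)) would replace the readlines call and both endswith branches, and no intermediate unzipped.fasta file is needed.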
Answer 1 (score: 1)
x is the input from the user, which is a string. You need to open the file before you can call readlines on it.
Based on your existing code:
x = input("Enter filename (with extension) of the DNA sequence: ") # x stores a string
file_x = open(x, 'r') # You must open a file ...
xread = file_x.readlines() # and call readlines on the file instance.
# Although it is not strictly necessary, remember to close the file when you're done; it is good practice.
file_x.close()
# Alternatively, a with block closes the file automatically:
with open(x) as file_x:
    xread = file_x.readlines()
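To see the fix in the context of the .fastq case from the question, here is a minimal sketch. The function name read_fastq_sequence is hypothetical, and it assumes strict four-line FASTQ records, as the question's loop does.

```python
def read_fastq_sequence(path):
    """Concatenate the sequence lines of a FASTQ file (hypothetical helper)."""
    with open(path) as file_x:      # open the file named by the string
        xread = file_x.readlines()  # readlines is called on the file object
    # Assumes four-line records: the sequence is the second line of each.
    return "".join(xread[i].strip("\n") for i in range(1, len(xread), 4))
```

Note that the last line of the question's script passes x, the file name, to rev_comp. Passing the collected sequence instead, for example rev_comp(read_fastq_sequence(x)), is presumably what was intended.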