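I am trying to write some data to an Excel spreadsheet using the csv module. I am writing a motif finder that reads its input from a FASTA file and outputs to Excel, but I am having a hard time writing the data in the right format.
My desired result in Excel looks like this: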
SeqName M1 Hits M2 Hits
Seq1 MN[A-Z] 3 V[A-Z]R[ML] 2
Seq2 MN[A-Z] 0 V[A-Z]R[ML] 5
Seq3 MN[A-Z] 1 V[A-Z]R[ML] 0
I already generate the correct results; I just don't know how to put them into the right format.
Here is my code so far.
import re
from Bio import SeqIO
import csv
import collections

def SearchMotif(f1, motif, f2="motifs.xls"):
    with open(f1, 'r') as fin, open(f2, 'wb') as fout:
        # This makes SeqName static and everything else mutable thus, when more than 1 motifs are searched,
        # they can be correctly placed into excel.
        writer = csv.writer(fout, delimiter='\t')
        motif_fieldnames = ['SeqName']
        writer_dict = csv.DictWriter(fout, delimiter='\t', fieldnames=motif_fieldnames)
        for i in range(0, len(motif), 1):
            motif_fieldnames.append('M%d' % (i + 1))
            motif_fieldnames.append('Hits')
        writer_dict.writeheader()

        # Reading input fasta file for processing.
        fasta_name = []
        for seq_record in SeqIO.parse(f1, 'fasta'):
            sequence = repr(seq_record.seq)  # re-module only takes string
            fasta_name.append(seq_record.name)
            print sequence  # **********
            for j in motif:
                motif_name = j
                print motif_name  # **********
                number_count = len(re.findall(j, sequence))
                print number_count  # **********
                writer.writerow([motif_name])
        for i in fasta_name:
            writer.writerow([i])  # [] makes it fit into one column instead of characters taking each columns
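The print statements marked with asterisks (**********) produce the output below, where each number is the hit count and the different sequences are Seq1, Seq2, and so on.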
Seq('QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQ...LTS', SingleLetterAlphabet())
PA[A-Z]
0
Y[A-Z]L[A-Z]
0
Seq('SFNVATLPAESSSTDLDTTVLLPDEPAEVSDLERIETEWTNMKILELPFAPQMK...VSS', SingleLetterAlphabet())
PA[A-Z]
2
Y[A-Z]L[A-Z]
0
Seq('PAESIYFKIEKTYNLT', SingleLetterAlphabet())
PA[A-Z]
1
Y[A-Z]L[A-Z]
1
Answer 0 (score: 0)
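You can write the data into a pandas DataFrame and then export it to CSV with the DataFrame's to_csv method. There is also a to_excel method. Pandas does not cope well with several columns sharing one name, such as the repeated "Hits" column, because label-based assignment can no longer single out one of them. You can work around this by giving the columns unique names, putting the column names you actually want in the first row, and exporting with the header=False option.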
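To make the header=False workaround concrete, here is a minimal sketch. The rows are copied from the expected table in the question; the variable names (display_header, header_row, out) are mine, and it assumes a reasonably recent pandas.

```python
import pandas as pd

# Rows taken from the expected output in the question; the internal column
# names are made unique ("Hits1", "Hits2") so each one can be addressed.
df = pd.DataFrame(
    [["Seq1", "MN[A-Z]", 3, "V[A-Z]R[ML]", 2],
     ["Seq2", "MN[A-Z]", 0, "V[A-Z]R[ML]", 5]],
    columns=["SeqName", "M1", "Hits1", "M2", "Hits2"],
)

# The labels the spreadsheet should show, with "Hits" repeated.
display_header = ["SeqName", "M1", "Hits", "M2", "Hits"]
header_row = pd.DataFrame([display_header], columns=df.columns)

# Put the display labels in the first row, then suppress the real header.
out = pd.concat([header_row, df], ignore_index=True)
text = out.to_csv(sep="\t", index=False, header=False)
print(text)
```

Called without a path, to_csv returns the text instead of writing a file; pass a filename as the first argument to write to disk.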
Add "import pandas as pd" at the top, then replace your code from "fasta_name = []" onward with:
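# Unique internal names: M1/Hits1, M2/Hits2, ...
column_names = ['SeqName']
for i, m in enumerate(motif, start=1):
    column_names += ['M%d' % i, 'Hits%d' % i]
df = pd.DataFrame(columns=column_names)
for row, seq_record in enumerate(SeqIO.parse(f1, 'fasta')):
    # Search the full sequence, not the record name; str() gives the whole
    # sequence, whereas repr() abbreviates long sequences with '...'.
    sequence = str(seq_record.seq)
    df.loc[row, 'SeqName'] = seq_record.name
    for i, j in enumerate(motif, start=1):
        df.loc[row, 'M%d' % i] = j
        df.loc[row, 'Hits%d' % i] = len(re.findall(j, sequence))
# fout already holds the header written by writer_dict.writeheader(),
# so only the data rows are appended here (hence header=False).
df.to_csv(fout, sep='\t', index=False, header=False)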
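If you would rather stay with the csv module, the same layout can be produced by building one complete row per sequence before writing it. The following is only a sketch under a few assumptions: write_motif_table is a hypothetical helper name, it takes plain (name, sequence) pairs so that it runs without Biopython, and it is written for Python 3. With Biopython you could presumably feed it ((rec.name, str(rec.seq)) for rec in SeqIO.parse(path, 'fasta')).

```python
import csv
import io
import re


def write_motif_table(records, motifs, out):
    """Write one tab-separated row per sequence: name, then motif/hit-count pairs.

    `records` is an iterable of (name, sequence_string) pairs and `out` is any
    text-mode file-like object.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")

    # Header: SeqName, then an "M<n>" and a "Hits" column for every motif.
    header = ["SeqName"]
    for n in range(1, len(motifs) + 1):
        header += ["M%d" % n, "Hits"]
    writer.writerow(header)

    for name, sequence in records:
        row = [name]
        for motif in motifs:
            # re.findall counts non-overlapping matches only.
            row += [motif, len(re.findall(motif, sequence))]
        writer.writerow(row)


# Seq1 is the third sequence from the question's output; Seq2 is a made-up
# toy sequence used purely for illustration.
toy_records = [("Seq1", "PAESIYFKIEKTYNLT"), ("Seq2", "MKPAAPAG")]
buffer = io.StringIO()
write_motif_table(toy_records, ["PA[A-Z]", "Y[A-Z]L[A-Z]"], buffer)
print(buffer.getvalue())
```

To write a real file, pass something like open("motifs.tsv", "w", newline="") instead of the StringIO buffer. Since the row is assembled in memory first, the repeated "Hits" header is not a problem here.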