Question

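I am trying to write some data to an Excel spreadsheet using the csv module. I am writing a motif finder that reads its input from a FASTA file and writes the output to Excel, but I am having trouble writing the data in the right format.

My desired result in Excel looks like this: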

SeqName     M1      Hits    M2          Hits
Seq1        MN[A-Z] 3       V[A-Z]R[ML] 2
Seq2        MN[A-Z] 0       V[A-Z]R[ML] 5
Seq3        MN[A-Z] 1       V[A-Z]R[ML] 0

I have already generated the correct results; I just don't know how to arrange them in the right format.

Here is my code so far.

import re
from Bio import SeqIO
import csv
import collections

def SearchMotif(f1, motif, f2="motifs.xls"):
    with open(f1, 'r') as fin, open(f2,'wb') as fout:
        # This makes SeqName static and everything else mutable thus, when more than 1 motifs are searched,
        # they can be correctly placed into excel.
        writer = csv.writer(fout, delimiter = '\t')
        motif_fieldnames = ['SeqName']
        writer_dict = csv.DictWriter(fout,delimiter = '\t' ,fieldnames=motif_fieldnames)
        for i in range(0,len(motif),1):
            motif_fieldnames.append('M%d' %(i+1))
            motif_fieldnames.append('Hits')
        writer_dict.writeheader()

# Reading input fasta file for processing.
    fasta_name = []
    for seq_record in SeqIO.parse(f1,'fasta'):
        sequence = repr(seq_record.seq) # re-module only takes string
        fasta_name.append(seq_record.name)
        print sequence            # **********
        for j in motif:
            motif_name = j
            print motif_name       # **********
            number_count = len(re.findall(j,sequence))
            print number_count     # **********
            writer.writerow([motif_name])


    for i in fasta_name:
        writer.writerow([i]) # [] makes it fit into one column instead of characters taking each columns

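The print statements marked with asterisks (**********) produce the output below, where each number is the hit count for the motif printed above it and each Seq(...) line is a different sequence (seq1, seq2, and so on).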

Seq('QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQ...LTS', SingleLetterAlphabet())
PA[A-Z]
0
Y[A-Z]L[A-Z]
0
Seq('SFNVATLPAESSSTDLDTTVLLPDEPAEVSDLERIETEWTNMKILELPFAPQMK...VSS', SingleLetterAlphabet())
PA[A-Z]
2
Y[A-Z]L[A-Z]
0
Seq('PAESIYFKIEKTYNLT', SingleLetterAlphabet())
PA[A-Z]
1
Y[A-Z]L[A-Z]
1
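As a rough illustration of how these hit counts arise, the sketch below counts motif matches with re.findall on a plain string, written for Python 3. The helper name count_hits is hypothetical; the sequence is the third (short) one from the output above. Note that re.findall counts non-overlapping matches only, and that repr() of a Biopython Seq wraps the text in Seq(...) and truncates long sequences with "...", so str(seq_record.seq) is the safer input for the regex.

```python
import re

def count_hits(motif, sequence):
    """Count non-overlapping matches of a regex motif in a sequence string."""
    return len(re.findall(motif, sequence))

# The third sequence from the printed output, as a plain string.
sequence = "PAESIYFKIEKTYNLT"
motifs = ["PA[A-Z]", "Y[A-Z]L[A-Z]"]

# Map each motif to its hit count for this one sequence.
hits = {motif: count_hits(motif, sequence) for motif in motifs}
print(hits)
```

Both motifs match once here, which agrees with the last block of the output above.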

Answer 1

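You can write the data to a pandas DataFrame and then export it to CSV with the DataFrame's to_csv method; there is also a to_excel method. Pandas does not cope well with several columns sharing one name, such as the repeated "Hits" column, because label-based assignment becomes ambiguous. You can work around this by giving the columns unique names, placing the desired column names in the first row, and passing header=False when exporting.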
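The header=False workaround might look something like the sketch below. The data values and the names Hits1 and Hits2 are made up for illustration, and the output goes to an in-memory buffer rather than a real file.

```python
import io
import pandas as pd

# Hypothetical results with unique internal column names.
df = pd.DataFrame(
    [["Seq1", "MN[A-Z]", 3, "V[A-Z]R[ML]", 2],
     ["Seq2", "MN[A-Z]", 0, "V[A-Z]R[ML]", 5]],
    columns=["SeqName", "M1", "Hits1", "M2", "Hits2"],
)

# The header the spreadsheet should display, with "Hits" repeated.
display_header = ["SeqName", "M1", "Hits", "M2", "Hits"]
header_row = pd.DataFrame([display_header], columns=df.columns)

# Put the display header in the first row and suppress pandas' own header.
out = pd.concat([header_row, df], ignore_index=True)
buffer = io.StringIO()
out.to_csv(buffer, sep="\t", index=False, header=False)
text = buffer.getvalue()
print(text)
```

A shorter alternative is to pass the list directly, as in df.to_csv(path, sep="\t", index=False, header=display_header), since to_csv accepts a list of strings as aliases for the column names.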

Add "import pandas as pd", then replace your code from "fasta_name = []" onward with:

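column_names = ['SeqName']
for i, m in enumerate(motif, start=1):
    # Number motifs from 1 (M1, M2, ...) and give each Hits column a unique name.
    column_names += ['M' + str(i), 'Hits' + str(i)]

rows = []
for seq_record in SeqIO.parse(f1, 'fasta'):
    # Search the full sequence, not the record name; str() avoids the
    # truncated Seq('...') text that repr() produces for long sequences.
    sequence = str(seq_record.seq)
    row = {'SeqName': seq_record.name}
    for i, j in enumerate(motif, start=1):
        row['M' + str(i)] = j
        row['Hits' + str(i)] = len(re.findall(j, sequence))
    rows.append(row)

df = pd.DataFrame(rows, columns=column_names)

# Write a tab-separated file to f2; without a path, to_csv only returns a string.
df.to_csv(f2, sep='\t', index=False)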
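If you would rather stay with the csv module, as in the question, a minimal Python 3 sketch of the desired layout could look like this. The function name write_motif_table, the (name, sequence) input shape, and the sample sequences are all assumptions made for illustration. With Biopython, the records could presumably be supplied as ((r.name, str(r.seq)) for r in SeqIO.parse(f1, 'fasta')).

```python
import csv
import io
import re

def write_motif_table(records, motifs, out):
    """Write one row per sequence: its name, then (motif, hit count) pairs.

    records: iterable of (name, sequence_string) pairs (hypothetical shape).
    out: a text-mode file object, ideally opened with newline=''.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")

    # Build the header once; the csv module does not mind repeated labels.
    header = ["SeqName"]
    for i in range(1, len(motifs) + 1):
        header += ["M%d" % i, "Hits"]
    writer.writerow(header)

    # Collect every cell for a sequence first, then write a single row.
    for name, sequence in records:
        row = [name]
        for motif in motifs:
            row += [motif, len(re.findall(motif, sequence))]
        writer.writerow(row)

# Made-up sequences, written to an in-memory buffer instead of a file.
records = [("Seq1", "MNAMNBMNC"), ("Seq2", "VARM")]
buffer = io.StringIO()
write_motif_table(records, ["MN[A-Z]", "V[A-Z]R[ML]"], buffer)
table = buffer.getvalue()
print(table)
```

The key difference from the code in the question is that each sequence's cells are gathered into one list and written with a single writerow call, instead of calling writerow once per motif. Keep in mind that a tab-delimited text file saved with an .xls extension is not a true Excel workbook, although Excel will usually open it.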
