I want to build a pandas DataFrame with two columns: read_id and score.
I am using the following code:
reads_array = []
for x in Bio.SeqIO.parse("inp.fasta", "fasta"):
    reads_array.append(x)
columns = ["read_id", "score"]
df = pd.DataFrame(columns=columns)
df = df.fillna(0)
for x in reads_array:
    alignments = pairwise2.align.globalms("ACTTGAT", str(x.seq), 2, -1, -.5, -.1)
    sorted_alignments = sorted(alignments, key=operator.itemgetter(2), reverse=True)
    read_id = x.name
    score = sorted_alignments[0][2]
    df['read_id'] = read_id
    df['score'] = score
But this does not work. Can you suggest a way to generate the DataFrame df?
Answer 0 (score: 0)
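df['read_id'] and df['score'] are Series, that is, whole columns. So if you want to iterate over reads_array, compute a value for each read, and store it in the matching row of df, try the following: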
for i, x in enumerate(reads_array):
    ...
    # .ix was removed in pandas 1.0; .loc sets one cell by row and column label
    df.loc[i, 'read_id'] = read_id
    df.loc[i, 'score'] = score
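To make the column-versus-cell distinction concrete, here is a minimal sketch. It assumes a small frame with three preallocated rows and made-up values (none of this comes from the question's data), and it only illustrates how the two kinds of assignment behave.

```python
import pandas as pd

# Hypothetical frame with three preallocated rows (illustrative values only).
df = pd.DataFrame({"read_id": ["", "", ""], "score": [0.0, 0.0, 0.0]})

# Selecting a column by name returns a pandas Series.
is_series = isinstance(df["score"], pd.Series)

# Assigning a scalar to a column sets every row of that column.
df["score"] = 5.0
after_column_assignment = df["score"].tolist()

# .loc with a row label and a column label sets exactly one cell.
df.loc[1, "score"] = 9.0
df.loc[1, "read_id"] = "read_b"
after_cell_assignment = df["score"].tolist()
```

After the column assignment every score is the same value; after the .loc assignment only row 1 differs.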
Answer 1 (score: 0)
At the top, make sure you have
import numpy as np
Then replace the code you shared with
reads_array = []
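for x in Bio.SeqIO.parse("inp.fastq", "fastq"):
    reads_array.append(x)
# Preallocate one row per read so that each cell can be filled in below
df = pd.DataFrame(np.zeros((len(reads_array), 2)), columns=["read_id", "score"])
# read_id will hold strings, so store that column as object rather than float
df["read_id"] = df["read_id"].astype(object)
for index, x in enumerate(reads_array):
    alignments = pairwise2.align.globalms("ACTTGAT", str(x.seq), 2, -1, -.5, -.1)
    # Sort by alignment score (tuple position 2), best first
    sorted_alignments = sorted(alignments, key=operator.itemgetter(2), reverse=True)
    read_id = x.name
    score = sorted_alignments[0][2]
    # .loc[row, column] writes a single cell
    df.loc[index, 'read_id'] = read_id
    df.loc[index, 'score'] = score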
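The original code has two main problems:
1) Your DataFrame has 0 rows, so there are no cells to assign into.
2) df['column_name'] refers to the entire column, not a single cell, so df['column_name'] = value sets every cell in that column to that value.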
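Both problems can also be sidestepped by collecting one record per read and building the frame once at the end. The sketch below is an alternative to the preallocation approach, not something taken from the answers above. The helper build_score_frame and the example pairs are hypothetical; in the real loop each pair would presumably be (x.name, sorted_alignments[0][2]). The first few lines also reproduce problem 1 on an empty frame.

```python
import pandas as pd

# Problem 1 in isolation: a frame with 0 rows stays empty after a
# scalar column assignment, because there are no rows to fill.
empty = pd.DataFrame(columns=["read_id", "score"])
empty["read_id"] = "read_a"
rows_in_empty = len(empty)


def build_score_frame(pairs):
    """Build a two-column DataFrame from (read_id, score) pairs.

    Hypothetical helper for illustration; not part of pandas or Biopython.
    """
    rows = [{"read_id": read_id, "score": score} for read_id, score in pairs]
    return pd.DataFrame(rows, columns=["read_id", "score"])


# Made-up values standing in for real alignment results.
example_pairs = [("read_a", 11.5), ("read_b", 7.0), ("read_c", 14.0)]
df = build_score_frame(example_pairs)
```

Building the frame in one call avoids writing cells one at a time and lets pandas infer a string column for read_id and a numeric column for score.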