I have a pandas DataFrame that I want to iterate over. For example, a simplified version of my DataFrame could be:
chr start end Gene Value MoreData
chr1 123 123 HAPPY 41.1 3.4
chr1 125 129 HAPPY 45.9 4.5
chr1 140 145 HAPPY 39.3 4.1
chr1 342 355 SAD 34.2 9.0
chr1 360 361 SAD 44.3 8.1
chr1 390 399 SAD 29.0 7.2
chr1 400 411 SAD 35.6 6.5
chr1 462 470 LEG 20.0 2.7
I want to iterate over each unique gene and create a new file named after it:
for Gene in df:  ## this is where I need the most help
    OutFileName = Gene + ".pdf"
For the example above, I should get three iterations, producing 3 outfiles and 3 dataframes:
HAPPY.pdf
chr1 123 123 HAPPY 41.1 3.4
chr1 125 129 HAPPY 45.9 4.5
chr1 140 145 HAPPY 39.3 4.1
SAD.pdf
chr1 342 355 SAD 34.2 9.0
chr1 360 361 SAD 44.3 8.1
chr1 390 399 SAD 29.0 7.2
chr1 400 411 SAD 35.6 6.5
LEG.pdf
chr1 462 470 LEG 20.0 2.7
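The contents of each resulting per-gene dataframe will be sent to another function, which performs the analysis and returns the content to be written to the file.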
Answer 0 (score: 17)
You can get the unique values by calling unique, iterate over them, build the filename, and write each subset out to csv:
In [78]:
genes = df['Gene'].unique()
for gene in genes:
    outfilename = gene + '.pdf'
    print(outfilename)
    # note: to_csv writes CSV text, even though the filename ends in .pdf
    df[df['Gene'] == gene].to_csv(outfilename)
HAPPY.pdf
SAD.pdf
LEG.pdf
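For anyone who wants to run the unique-and-filter approach end to end, here is a minimal self-contained sketch. It rebuilds a cut-down version of the sample data (only the Gene and Value columns) and collects each gene's rows into a dict instead of writing files. The name frames is just an illustrative choice, not part of the original answer.

```python
import pandas as pd

# Cut-down version of the sample data from the question
df = pd.DataFrame({
    'Gene': ['HAPPY', 'HAPPY', 'HAPPY', 'SAD', 'SAD', 'SAD', 'SAD', 'LEG'],
    'Value': [41.1, 45.9, 39.3, 34.2, 44.3, 29.0, 35.6, 20.0],
})

# unique() returns the distinct genes in order of first appearance
frames = {}
for gene in df['Gene'].unique():
    # the boolean mask keeps only the rows belonging to this gene
    frames[gene] = df[df['Gene'] == gene]

for gene, gene_df in frames.items():
    print(gene, len(gene_df))
```

Each pass through the loop scans the whole column again to build the mask, which is fine for a handful of genes but is one reason grouping is usually preferred for many distinct values.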
A more pandas-thonic method is to group by gene and then iterate over the groups:
In [93]:
gp = df.groupby('Gene')
# gp.groups is a dict mapping each 'Gene' value to the index labels of its rows
for gene, idx in gp.groups.items():
    print(df.loc[idx])
chr start end Gene Value MoreData
0 chr1 123 123 HAPPY 41.1 3.4
1 chr1 125 129 HAPPY 45.9 4.5
2 chr1 140 145 HAPPY 39.3 4.1
chr start end Gene Value MoreData
3 chr1 342 355 SAD 34.2 9.0
4 chr1 360 361 SAD 44.3 8.1
5 chr1 390 399 SAD 29.0 7.2
6 chr1 400 411 SAD 35.6 6.5
chr start end Gene Value MoreData
7 chr1 462 470 LEG 20 2.7
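One thing to be aware of when relying on group order: groupby sorts the group keys by default, so the groups may not come back in the order the genes appear in the data. A small sketch, assuming a cut-down version of the sample frame, showing the difference that sort=False makes:

```python
import pandas as pd

# Cut-down version of the sample data from the question
df = pd.DataFrame({
    'Gene': ['HAPPY', 'HAPPY', 'HAPPY', 'SAD', 'SAD', 'SAD', 'SAD', 'LEG'],
    'Value': [41.1, 45.9, 39.3, 34.2, 44.3, 29.0, 35.6, 20.0],
})

# Default behaviour: group keys are sorted
sorted_keys = [gene for gene, _ in df.groupby('Gene')]

# sort=False keeps the keys in order of first appearance
appearance_keys = [gene for gene, _ in df.groupby('Gene', sort=False)]

print(sorted_keys)
print(appearance_keys)
```

If the output files have to be produced in the same order as the input rows, sort=False is the simpler option.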
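Answer 1 (score: 0)
Use .groupby() when iterating over the unique values of a column, or when performing aggregations or other functions inside the loop.
pandas.DataFrame.groupby returns the dataframe component associated with each unique value in the groupby column.
.groups.items() is not used in the code below, which makes it easy to use group as the filename.
f-strings are used to create a unique filename for each group.
import pandas as pd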
# create the dataframe
data = {'chr': ['chr1', 'chr1', 'chr1', 'chr1', 'chr1', 'chr1', 'chr1', 'chr1'], 'start': [123, 125, 140, 342, 360, 390, 400, 462], 'end': [123, 129, 145, 355, 361, 399, 411, 470], 'Gene': ['HAPPY', 'HAPPY', 'HAPPY', 'SAD', 'SAD', 'SAD', 'SAD', 'LEG'], 'Value': [41.1, 45.9, 39.3, 34.2, 44.3, 29.0, 35.6, 20.0], 'MoreData': [3.4, 4.5, 4.1, 9.0, 8.1, 7.2, 6.5, 2.7]}
df = pd.DataFrame(data)
# groupby the desired column and iterate through the groupby object
for group, dataframe in df.groupby('Gene'):
    # save the dataframe for each group to a csv
    dataframe.to_csv(f'{group}.csv', index=False)
HAPPY.csv
chr,start,end,Gene,Value,MoreData
chr1,123,123,HAPPY,41.1,3.4
chr1,125,129,HAPPY,45.9,4.5
chr1,140,145,HAPPY,39.3,4.1
SAD.csv
chr,start,end,Gene,Value,MoreData
chr1,342,355,SAD,34.2,9.0
chr1,360,361,SAD,44.3,8.1
chr1,390,399,SAD,29.0,7.2
chr1,400,411,SAD,35.6,6.5
LEG.csv
chr,start,end,Gene,Value,MoreData
chr1,462,470,LEG,20.0,2.7
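The question also mentions handing each per-gene dataframe to a separate function that returns the content to write. A rough sketch of that flow is below. The analyze function is a hypothetical stand-in for the real analysis (it only returns a one-line text summary), and the output goes to a temporary directory as .txt files rather than real PDFs, so treat both as assumptions.

```python
import tempfile
from pathlib import Path

import pandas as pd


def analyze(gene_df: pd.DataFrame) -> str:
    """Hypothetical placeholder for the real analysis; returns text to write."""
    return f"rows={len(gene_df)} mean_value={gene_df['Value'].mean():.2f}\n"


# Cut-down version of the sample data from the question
df = pd.DataFrame({
    'Gene': ['HAPPY', 'HAPPY', 'HAPPY', 'SAD', 'SAD', 'SAD', 'SAD', 'LEG'],
    'Value': [41.1, 45.9, 39.3, 34.2, 44.3, 29.0, 35.6, 20.0],
})

out_dir = Path(tempfile.mkdtemp())  # assumed output location
written = {}

# Each iteration yields the gene name and the sub-dataframe for that gene
for gene, gene_df in df.groupby('Gene', sort=False):
    out_path = out_dir / f'{gene}.txt'
    out_path.write_text(analyze(gene_df))
    written[gene] = out_path

for gene, path in written.items():
    print(gene, path.name)
```

Producing an actual PDF would need a plotting or report library inside analyze; the loop itself would stay the same.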