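I have a large dataframe generated from several Excel files. I want to slice it by row and create separate dataframes based on the value of the "Sample Name" column. The dataframe I want to slice looks like this: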
   Well Position  Sample Name  Target Name  CT
0             A1        human      52928.0  40
1             A2        mouse      52928.0  32
2             A3          rat      52928.0  40
3             A4        human      52928.0  40
4             A5        human      52928.0  35
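The source Excel files may or may not contain data for all three species. For example, a file may contain only human samples, only mouse samples, or only rat samples.
The result I want is: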
human_df
   Well Position  Sample Name  Target Name  CT
0             A1        human      52928.0  40
1             A4        human      52928.0  40
2             A5        human      52928.0  35
rat_df
   Well Position  Sample Name  Target Name  CT
0             A3          rat      52928.0  40
mouse_df
   Well Position  Sample Name  Target Name  CT
0             A2        mouse      52928.0  32
My attempt at this is:
for i, row in data.iterrows():
    if row['Sample Name'] in data.iterrows() == 'mouse' or 'Mouse' or 'MOUSE':
        species = 'mouse'
        # make a new df_mouse
        df_mouse = data[(data['Sample Name'] == species)]
    if row['Sample Name'] in data.iterrows() == 'human' or 'Human' or 'HUMAN':
        species = 'human'
        df_human = data[(data['Sample Name'] == species)]
        print("Human Dataframe = ", df_human)
    if row['Sample Name'] in data.iterrows() == 'rat' or 'Rat' or 'RAT':
        species = 'rat'
        df_rat = data[(data['Sample Name'] == species)]
        print("Rat Dataframe = ", df_rat)
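This works to some extent, but it fails when one of the three species is not present in the original Excel file. Thanks in advance for your help.
Answer 0 (score: 1)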
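One thing worth noting about the attempt, independent of pandas: Python evaluates x == 'mouse' or 'Mouse' or 'MOUSE' as (x == 'mouse') or 'Mouse' or 'MOUSE', and a non-empty string is truthy, so each of those if conditions is always true. The sketch below demonstrates this and shows a vectorized, case-insensitive mask as an alternative to iterrows. The small frame and its mixed-case names ('Mouse', 'HUMAN') are made up for illustration.

```python
import pandas as pd

# Illustrative frame; the mixed-case names are hypothetical
data = pd.DataFrame({
    'Well Position': ['A1', 'A2', 'A4'],
    'Sample Name': ['human', 'Mouse', 'HUMAN'],
    'Target Name': [52928.0, 52928.0, 52928.0],
    'CT': [40, 32, 40],
})

# `name == 'mouse'` is False, so `or` moves on and returns the truthy 'Mouse'
name = 'human'
always_true = name == 'mouse' or 'Mouse' or 'MOUSE'
print(bool(always_true))  # True, even though name is 'human'

# Case-insensitive row selection without iterating: lowercase the column first
mouse_mask = data['Sample Name'].str.lower() == 'mouse'
df_mouse = data[mouse_mask]
print(df_mouse)
```

Because the mask is built from the whole column at once, there is no need for a loop, and a species that does not occur simply yields an empty frame.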
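Use DataFrame.groupby on the column Sample Name, and use a dict comprehension to store each grouped df in the dictionary dct. You can then refer to a stored dataframe simply with dct['name_of_df'] (here df is the frame called data in the question):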
dct = {f'{k}_df': g.reset_index(drop=True) for k, g in df.groupby('Sample Name')}
Result:
# dct['human_df']
   Well Position  Sample Name  Target Name  CT
0             A1        human      52928.0  40
1             A4        human      52928.0  40
2             A5        human      52928.0  35
# dct['rat_df']
   Well Position  Sample Name  Target Name  CT
0             A3          rat      52928.0  40
# dct['mouse_df']
   Well Position  Sample Name  Target Name  CT
0             A2        mouse      52928.0  32
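The groupby dictionary only contains keys for species that actually occur, so for a human-only file dct['rat_df'] raises KeyError. Since the question mentions both missing species and several spellings ('mouse', 'Mouse', 'MOUSE'), here is a sketch that builds on the answer. It groups on the lowercased names, then adds an empty frame with the same columns for any expected species that did not occur. The helper name split_by_species and its default species tuple are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample frame copied from the question
data = pd.DataFrame({
    'Well Position': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'Sample Name': ['human', 'mouse', 'rat', 'human', 'human'],
    'Target Name': [52928.0] * 5,
    'CT': [40, 32, 40, 40, 35],
})


def split_by_species(frame, species=('human', 'mouse', 'rat')):
    """Hypothetical helper: one frame per species, keyed '<species>_df'."""
    # Lowercase the names so 'Mouse'/'MOUSE' fall into the 'mouse' group
    names = frame['Sample Name'].str.lower()
    # Same dict comprehension as the answer, grouping on the normalised names
    dct = {f'{k}_df': g.reset_index(drop=True) for k, g in frame.groupby(names)}
    # Expected species that never occurred get an empty frame with the same
    # columns, so lookups such as dct['rat_df'] do not raise KeyError
    for s in species:
        dct.setdefault(f'{s}_df', frame.iloc[0:0].reset_index(drop=True))
    return dct


full = split_by_species(data)
human_only = split_by_species(data[data['Sample Name'] == 'human'])
print(full['rat_df'])
print(human_only['rat_df'].empty)  # True
```

With this approach, downstream code can check .empty instead of guarding every lookup. If you would rather notice a missing species explicitly, using dct.get('rat_df') on the plain groupby dictionary (which returns None for an absent key) is a simpler alternative.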