How to slice a pandas DataFrame with df.iterrows and conditional if statements to create new DataFrames

Asked: 2020-07-15 17:41:21

Tags: python pandas dataframe conditional-statements

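I have a large DataFrame built from several Excel files. I want to slice it by row and create a separate DataFrame for each value in the "Sample Name" column. The DataFrame I want to slice looks like this: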

    Well Position Sample Name  Target Name  CT
0              A1       human      52928.0  40
1              A2       mouse      52928.0  32
2              A3         rat      52928.0  40
3              A4       human      52928.0  40
4              A5       human      52928.0  35

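A source Excel file may or may not contain data for all three species. For example, a file might contain only human samples, only mouse samples, or only rat samples.

The result I want is: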

human_df 
    Well Position Sample Name  Target Name  CT
0              A1       human      52928.0  40
1              A4       human      52928.0  40
2              A5       human      52928.0  35

rat_df
    Well Position Sample Name  Target Name  CT
0              A3         rat      52928.0  40

mouse_df
    Well Position Sample Name  Target Name  CT
0              A2       mouse      52928.0  32

My attempt at this is:

for i,row in data.iterrows():
            if row['Sample Name'] in data.iterrows() == 'mouse' or 'Mouse' or 'MOUSE':
                species = 'mouse'
                #make a new df_mouse
                df_mouse = data[(data['Sample Name'] == species)] 

            if row['Sample Name'] in data.iterrows() == 'human' or 'Human' or 'HUMAN':
                species = 'human'
                df_human = data[(data['Sample Name'] == species)]
                print("Human Dataframe = ", df_human)

            if row['Sample Name'] in data.iterrows() == 'rat' or 'Rat' or 'RAT':
                species = 'rat'
                df_rat = data[(data['Sample Name'] == species)]
                print("Rat Dataframe = ", df_rat)

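This works to some extent, but it fails when one of the three species is missing from the original Excel file. Thanks in advance for your help.

1 Answer:

Answer 0: (score: 1)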

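Use DataFrame.groupby on the Sample Name column with a dict comprehension to store each grouped DataFrame in a dictionary dct. Because groupby only yields the groups that are actually present, this also works when a file lacks one of the species. You can then retrieve a stored DataFrame with dct['name_of_df']: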

dct = {f'{k}_df': g.reset_index(drop=True) for k, g in df.groupby('Sample Name')}

Result:

# dct['human_df']
  Well Position Sample Name  Target Name  CT
0            A1       human      52928.0  40
1            A4       human      52928.0  40
2            A5       human      52928.0  35

# dct['rat_df']
    Well Position Sample Name  Target Name  CT
0              A3         rat      52928.0  40

# dct['mouse_df']
    Well Position Sample Name  Target Name  CT
0              A2       mouse      52928.0  32
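
The question also mentions mixed capitalization ('Mouse', 'MOUSE') and files that lack a species. The following is a minimal sketch of one way to handle both, not a definitive implementation. It assumes the sample values differ only by case and surrounding whitespace. The sample data is adapted from the question: the capitalization is varied and the rat row is dropped on purpose, and the name species_key is only illustrative.

```python
import pandas as pd

# Sample data adapted from the question. Assumptions for illustration:
# mixed capitalization in "Sample Name" and no rat rows at all.
df = pd.DataFrame({
    "Well Position": ["A1", "A2", "A4", "A5"],
    "Sample Name": ["human", "Mouse", "HUMAN", "human"],
    "Target Name": [52928.0, 52928.0, 52928.0, 52928.0],
    "CT": [40, 32, 40, 35],
})

# Group on a normalized copy of the column so that "Mouse" and "MOUSE"
# fall into the same group; the original column values are left unchanged.
species_key = df["Sample Name"].str.strip().str.lower()
dct = {f"{k}_df": g.reset_index(drop=True) for k, g in df.groupby(species_key)}

# dict.get returns None instead of raising KeyError for an absent species.
human_df = dct.get("human_df")
rat_df = dct.get("rat_df")

print(sorted(dct))
print(human_df)
print(rat_df is None)
```

Looking up a species with dct.get rather than dct[...] lets the calling code check for None and skip species that a given file does not contain.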