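Split a DataFrame into multiple DataFrames after a specific value

Time: 2016-01-14 10:08:45

Tags: python pandas

I am trying to split a DataFrame in the following format into multiple DataFrames based on a specific value.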

Column0              Column1     Column2  Column3
Question             Answer      Reason   30
It is received?      XXX         YYY      27
Deducted             FDF         RES      64
Transferred?         WWW         RRR      64
Transport Services   Passgener   Carrier  30
Distance             KKK         WDF      27
Return               PPP         LMN      64

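In the DataFrame above, I want to split off the rows starting from Event2, that is, from the row where Code = 30 (a specific colour code or header code), into a separate DataFrame, and put the rows above it into another DataFrame. There may be more than two events, and they should be split the same way.

I have tried a few pieces of code, but most of them are meant for filtering.

The expected output is:

Dataframe1: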

Question             Answer      Reason   30
It is received?      XXX         YYY      27
Deducted             FDF         RES      64
Transferred?         WWW         RRR      64

Dataframe2:

Transport Services   Passgener   Carrier  30
Distance             KKK         WDF      27
Return               PPP         LMN      64

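Please help, as I am new to Python.

1 answer:

Answer 0: (score: 0)

You can use groupby on a temporary helper column, derived from Code, to separate out the different DataFrames and add them to a dictionary. I assume your data actually looks like this, with the first row used as the header: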

             Question     Answer   Reason  Code
0     It is received?        XXX      YYY    30
1            Deducted        FDF      RES    64
2        Transferred?        WWW      RRR    64
3  Transport Services  Passgener  Carrier    30
4            Distance        KKK      WDF    27
5              Return        PPP      LMN    64
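To follow along, the assumed frame can be rebuilt by hand. This is only a sketch of the data as printed above; the variable name df is the one the rest of the answer uses, and the column names are taken from the assumed header row.

```python
import pandas as pd

# Rebuild the assumed sample data shown above (values copied from the table).
df = pd.DataFrame({
    "Question": ["It is received?", "Deducted", "Transferred?",
                 "Transport Services", "Distance", "Return"],
    "Answer": ["XXX", "FDF", "WWW", "Passgener", "KKK", "PPP"],
    "Reason": ["YYY", "RES", "RRR", "Carrier", "WDF", "LMN"],
    "Code": [30, 64, 64, 30, 27, 64],
})

print(df)
```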

If so, you can do:

# Keep Question only on header rows (Code == 30), then forward-fill it down each block.
df['tmp'] = df['Question'].where(df['Code'] == 30).ffill()

This gives:

             Question     Answer   Reason  Code                 tmp
0     It is received?        XXX      YYY    30     It is received?
1            Deducted        FDF      RES    64     It is received?
2        Transferred?        WWW      RRR    64     It is received?
3  Transport Services  Passgener  Carrier    30  Transport Services
4            Distance        KKK      WDF    27  Transport Services
5              Return        PPP      LMN    64  Transport Services

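From here, you can enumerate the groups of the tmp column and add the results to a dictionary with integer keys:

questions = {}
# sort=False keeps the groups in order of first appearance instead of alphabetical order of tmp.
for e, (event, data) in enumerate(df.groupby('tmp', sort=False)):
    questions[e] = data.drop('tmp', axis=1)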

questions[0]

          Question Answer Reason  Code
0  It is received?    XXX    YYY    30
1         Deducted    FDF    RES    64
2     Transferred?    WWW    RRR    64

questions[1]

             Question     Answer   Reason  Code
3  Transport Services  Passgener  Carrier    30
4            Distance        KKK      WDF    27
5              Return        PPP      LMN    64
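One caveat with grouping on the header text: two blocks whose header rows share the same Question would end up in the same group. A possible alternative, not part of the original answer, is to number the blocks with a running count of header rows. The sketch below assumes the same sample data and that Code == 30 marks the start of each block; the names block_id and frames are illustrative.

```python
import pandas as pd

# Same assumed sample data as above.
df = pd.DataFrame({
    "Question": ["It is received?", "Deducted", "Transferred?",
                 "Transport Services", "Distance", "Return"],
    "Answer": ["XXX", "FDF", "WWW", "Passgener", "KKK", "PPP"],
    "Reason": ["YYY", "RES", "RRR", "Carrier", "WDF", "LMN"],
    "Code": [30, 64, 64, 30, 27, 64],
})

# Each header row (Code == 30) starts a new block. The cumulative sum of the
# boolean mask gives every row the number of headers seen so far: 1, 1, 1, 2, 2, 2.
block_id = (df["Code"] == 30).cumsum().rename("block")

# Group on that Series (aligned by index) and collect one DataFrame per block.
# Rows before the first header, if there were any, would land in block 0.
frames = [part for _, part in df.groupby(block_id)]

for i, part in enumerate(frames):
    print(f"Block {i}:")
    print(part, end="\n\n")
```

Because the block number comes from row position rather than header text, this version does not need a helper column in df and keeps repeated headers apart.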