I'm trying to split a DataFrame in the following format into multiple DataFrames based on a specific value.
Column0             Column1    Column2  Column3
Question            Answer     Reason   30
It is received?     XXX        YYY      27
Deducted            FDF        RES      64
Transferred?        WWW        RRR      64
Transport Services  Passgener  Carrier  30
Distance            KKK        WDF      27
Return              PPP        LMN      64
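In the DataFrame above, I want to split the rows starting from Event2, i.e. from the row where Code = 30 (a specific color code or header code), into a separate DataFrame, and put the rest (the rows above it) into another DataFrame. There may be more than two events, and the same should apply to each of them.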
I have tried a few snippets, but most of them are for filtering.
The expected output is:
Dataframe1:
Question            Answer     Reason   30
It is received?     XXX        YYY      27
Deducted            FDF        RES      64
Transferred?        WWW        RRR      64
Dataframe2:
Transport Services  Passgener  Carrier  30
Distance            KKK        WDF      27
Return              PPP        LMN      64
Please help, as I'm new to Python.
Answer:
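You can use groupby on a temporary helper column, derived from Code, to separate out the different DataFrames and add them to a dictionary. I'm assuming your data actually looks like this: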
   Question            Answer     Reason   Code
0  It is received?     XXX        YYY      30
1  Deducted            FDF        RES      64
2  Transferred?        WWW        RRR      64
3  Transport Services  Passgener  Carrier  30
4  Distance            KKK        WDF      27
5  Return              PPP        LMN      64
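To try the steps that follow, the assumed frame can be built directly. This is only a reproducible setup: the values are copied from the table above, and the column names Question, Answer, Reason and Code are the ones the answer assumes.

```python
import pandas as pd

# Sample data as assumed in the answer: each row with Code == 30 starts a new section
df = pd.DataFrame({
    'Question': ['It is received?', 'Deducted', 'Transferred?',
                 'Transport Services', 'Distance', 'Return'],
    'Answer': ['XXX', 'FDF', 'WWW', 'Passgener', 'KKK', 'PPP'],
    'Reason': ['YYY', 'RES', 'RRR', 'Carrier', 'WDF', 'LMN'],
    'Code': [30, 64, 64, 30, 27, 64],
})
print(df)
```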
If so, you can do this:
# Keep Question only on header rows (Code == 30), then forward-fill it to the rows below
df['tmp'] = df['Question'].where(df['Code'] == 30).ffill()
which gives:
   Question            Answer     Reason   Code  tmp
0  It is received?     XXX        YYY      30    It is received?
1  Deducted            FDF        RES      64    It is received?
2  Transferred?        WWW        RRR      64    It is received?
3  Transport Services  Passgener  Carrier  30    Transport Services
4  Distance            KKK        WDF      27    Transport Services
5  Return              PPP        LMN      64    Transport Services
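The tmp column relies on every header question being unique: two sections that start with the same text would end up in the same group. A variant, swapped in here and not part of the original answer, numbers the sections with a running count of Code == 30 rows instead. The column name section and the dictionary sections are hypothetical names chosen for this sketch.

```python
import pandas as pd

# Same sample data as in the answer
df = pd.DataFrame({
    'Question': ['It is received?', 'Deducted', 'Transferred?',
                 'Transport Services', 'Distance', 'Return'],
    'Answer': ['XXX', 'FDF', 'WWW', 'Passgener', 'KKK', 'PPP'],
    'Reason': ['YYY', 'RES', 'RRR', 'Carrier', 'WDF', 'LMN'],
    'Code': [30, 64, 64, 30, 27, 64],
})

# True on each header row; the cumulative sum counts the headers seen so far,
# so every row gets the number of the section it belongs to (1, 1, 1, 2, 2, 2)
df['section'] = (df['Code'] == 30).cumsum()

# One DataFrame per section number, with the helper column dropped
sections = {key: group.drop(columns='section') for key, group in df.groupby('section')}
print(sections[2])
```

Rows that appear before the first Code == 30 row would get section number 0 and form their own group.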
From here, you can enumerate the groups of the tmp column and add the results to a dictionary with integer keys:
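questions = {}
# sort=False keeps the groups in the order they appear in df (groupby sorts keys by default)
for e, (event, data) in enumerate(df.groupby('tmp', sort=False)):
    questions[e] = data.drop('tmp', axis=1)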
questions[0]
   Question            Answer     Reason   Code
0  It is received?     XXX        YYY      30
1  Deducted            FDF        RES      64
2  Transferred?        WWW        RRR      64
questions[1]
   Question            Answer     Reason   Code
3  Transport Services  Passgener  Carrier  30
4  Distance            KKK        WDF      27
5  Return              PPP        LMN      64
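The same running-count idea also works on the layout in the question itself, where the header row "Question Answer Reason 30" is an ordinary data row rather than the column names. This is a sketch under the assumption that Column3 holds integers and that each section starts with a 30; it returns a list of DataFrames (the names section_id and frames are hypothetical) rather than the answer's dictionary.

```python
import pandas as pd

# The question's original layout: section headers are ordinary rows with Column3 == 30
df = pd.DataFrame(
    [
        ['Question', 'Answer', 'Reason', 30],
        ['It is received?', 'XXX', 'YYY', 27],
        ['Deducted', 'FDF', 'RES', 64],
        ['Transferred?', 'WWW', 'RRR', 64],
        ['Transport Services', 'Passgener', 'Carrier', 30],
        ['Distance', 'KKK', 'WDF', 27],
        ['Return', 'PPP', 'LMN', 64],
    ],
    columns=['Column0', 'Column1', 'Column2', 'Column3'],
)

# Running count of header rows: each row gets the number of the section it belongs to
section_id = (df['Column3'] == 30).cumsum()

# One DataFrame per section, in order of appearance; works for any number of sections
frames = [group for _, group in df.groupby(section_id, sort=False)]

for i, frame in enumerate(frames, start=1):
    print(f'Dataframe{i}:')
    print(frame, end='\n\n')
```

Each frame keeps its original row labels (0 to 3 and 4 to 6); call reset_index(drop=True) on a frame if you want it to start at 0.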