I want to reshape my DataFrame into a format I can use for some simple analysis. Currently, my DataFrame looks like this:
Carrier | Service | Weight | Area | Charge
A | GRND | 1 | 2 | $5.0
A | GRND | 2 | 2 | $6.0
A | GRND | 3 | 2 | $7.0
B | GRND | 1 | 2 | $5.5
B | GRND | 3 | 2 | $6.9
I want to convert it into the following format:
Service | Weight | Area | CarrierA_Charge | CarrierB_Charge
GRND | 1 | 2 | $5.0 | $5.5
GRND | 2 | 2 | $6.0 | NA
GRND | 3 | 2 | $7.0 | $6.9
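Ultimately, my goal is to add columns that give me the lowest charge, and the carrier offering it, for each unique combination of service, weight, and area, like this: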
Service | Weight | Area | CarrierA_Charge | CarrierB_Charge | min_charge | min_charge_carrier
GRND | 1 | 2 | $5.0 | $5.5 | $5.0 | A
GRND | 2 | 2 | $6.0 | NA | $6.0 | A
GRND | 3 | 2 | $7.0 | $6.9 | $6.9 | B
Is there a built-in pandas function I can use for this, or how would I write a function in Python to do it?
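For reproducibility, here is a minimal sketch that builds the sample input above. Converting `Charge` from strings like "$5.0" to floats is an assumption added here, not something the question does; it lets later `min`/`idxmin` steps compare amounts rather than text.

```python
import pandas as pd

# Sample input from the question; Charge arrives as strings such as "$5.0".
df = pd.DataFrame({
    "Carrier": ["A", "A", "A", "B", "B"],
    "Service": ["GRND"] * 5,
    "Weight": [1, 2, 3, 1, 3],
    "Area": [2, 2, 2, 2, 2],
    "Charge": ["$5.0", "$6.0", "$7.0", "$5.5", "$6.9"],
})

# Assumption: strip the leading "$" and convert to float.
# regex=False makes "$" a literal character rather than an end-of-string anchor.
df["Charge"] = df["Charge"].str.replace("$", "", regex=False).astype(float)

print(df)
```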
Answer 0 (score: 2)
IIUC:
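import pandas as pd

# Move the key columns into the index, then pivot the Carrier level into columns
d = df.set_index(['Service', 'Weight', 'Area', 'Carrier']).Charge.unstack()
# d.columns.name is 'Carrier', so column 'A' becomes 'CarrierA_Charge'
d.rename(columns=f'{d.columns.name}{{}}_Charge'.format) \
 .reset_index().rename_axis(None, axis=1)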
Service Weight Area CarrierA_Charge CarrierB_Charge
0 GRND 1 2 5.0 5.5
1 GRND 2 2 6.0 NaN
2 GRND 3 2 7.0 6.9
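One caveat with `set_index(...).unstack()`: it only works when each (Service, Weight, Area, Carrier) combination appears once. The sketch below uses hypothetical duplicate rows for carrier A to show the `ValueError` and a `pivot_table` fallback that aggregates duplicates with "min" (choosing "min" is an assumption about how duplicates should be resolved).

```python
import pandas as pd

# Hypothetical data: carrier A has two GRND/1/2 rows (e.g. two rate versions).
df = pd.DataFrame({
    "Carrier": ["A", "A", "B"],
    "Service": ["GRND", "GRND", "GRND"],
    "Weight": [1, 1, 1],
    "Area": [2, 2, 2],
    "Charge": [5.0, 4.8, 5.5],
})

unstack_failed = False
try:
    df.set_index(["Service", "Weight", "Area", "Carrier"]).Charge.unstack()
except ValueError as exc:
    # unstack cannot reshape when an index entry is duplicated
    unstack_failed = True
    print("unstack failed:", exc)

# pivot_table aggregates the duplicates instead; "min" keeps the cheaper rate.
wide = df.pivot_table(index=["Service", "Weight", "Area"],
                      columns="Carrier", values="Charge", aggfunc="min")
print(wide)
```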
With slightly different formatting and the additional columns:
import pandas as pd

d0 = df.set_index(['Service', 'Weight', 'Area', 'Carrier']).Charge.unstack()
# Lowest charge per row, and the carrier (column label) that offers it
d1 = pd.concat(dict(min_charge=d0.min(axis=1), min_charge_carrier=d0.idxmin(axis=1)), axis=1)
fmt = f'{d0.columns.name}{{}}_Charge'.format
d0.rename(columns=fmt).join(d1).reset_index().rename_axis(None, axis=1)
Service Weight Area CarrierA_Charge CarrierB_Charge min_charge min_charge_carrier
0 GRND 1 2 5.0 5.5 5.0 A
1 GRND 2 2 6.0 NaN 6.0 A
2 GRND 3 2 7.0 6.9 6.9 B
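A self-contained sketch of the same idea, assuming `Charge` is already numeric. It swaps the format-string rename for `add_prefix`/`add_suffix`, which is only a stylistic alternative. `min` and `idxmin` skip NaN by default (`skipna=True`), so the missing carrier-B rate at weight 2 does not block carrier A; on a tie, `idxmin` returns the first column label.

```python
import pandas as pd

# Sample data from the question; Charge is assumed to be float already.
df = pd.DataFrame({
    "Carrier": ["A", "A", "A", "B", "B"],
    "Service": ["GRND"] * 5,
    "Weight": [1, 2, 3, 1, 3],
    "Area": [2] * 5,
    "Charge": [5.0, 6.0, 7.0, 5.5, 6.9],
})

# One row per (Service, Weight, Area), one column per carrier.
wide = df.set_index(["Service", "Weight", "Area", "Carrier"]).Charge.unstack()

# min/idxmin ignore NaN, so rows with a missing carrier still get a result.
summary = pd.concat(
    {"min_charge": wide.min(axis=1), "min_charge_carrier": wide.idxmin(axis=1)},
    axis=1,
)

result = (
    wide.add_prefix("Carrier").add_suffix("_Charge")
    .join(summary)
    .reset_index()
    .rename_axis(None, axis=1)
)
print(result)
```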
Answer 1 (score: 1)
Pivot table approach:
import pandas as pd

# pivot table: one column per carrier; 'min' collapses any duplicate rows
pivot = df.pivot_table(columns='Carrier', index=['Service', 'Weight', 'Area'], values='Charge',
                       aggfunc='min').reset_index()
# rename columns here
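This answer leaves the renaming step open. One possible way to fill it in, sketched with numeric charges (an assumption) and a `rename` callable applied before `reset_index`, so the Service/Weight/Area columns are never touched:

```python
import pandas as pd

# Sample data from the question; Charge is assumed to be float already.
df = pd.DataFrame({
    "Carrier": ["A", "A", "A", "B", "B"],
    "Service": ["GRND"] * 5,
    "Weight": [1, 2, 3, 1, 3],
    "Area": [2] * 5,
    "Charge": [5.0, 6.0, 7.0, 5.5, 6.9],
})

pivot = df.pivot_table(columns="Carrier", index=["Service", "Weight", "Area"],
                       values="Charge", aggfunc="min")

# Turn carrier labels A, B into CarrierA_Charge, CarrierB_Charge,
# then move the keys back into ordinary columns.
pivot = (pivot.rename(columns=lambda c: f"Carrier{c}_Charge")
              .reset_index()
              .rename_axis(None, axis=1))
print(pivot)
```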
Answer 2 (score: 1)
To fully answer your question, including the extra columns:
First, we create the pivot table and rename the columns accordingly:
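import pandas as pd

# Each group holds a single charge string, so joining simply returns it
pivot = df.pivot_table(index=['Service', 'Weight', 'Area'],
                       columns='Carrier',
                       values='Charge',
                       aggfunc=lambda x: ' '.join(x))
# The columns are named 'Carrier', so 'A' becomes 'CarrierA_Charge'
pivot.columns = [pivot.columns.name + col + '_Charge' for col in pivot.columns]
pivot.reset_index(inplace=True)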
Service Weight Area CarrierA_Charge CarrierB_Charge
0 GRND 1 2 $5.0 $5.5
1 GRND 2 2 $6.0 NaN
2 GRND 3 2 $7.0 $6.9
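Before taking the minimum, the charges need to be numeric: comparing the strings compares characters, not amounts. A small illustration with hypothetical two-digit prices (not from the question's data):

```python
import pandas as pd

# Hypothetical charges: carrier A quotes a two-digit price.
charges = pd.Series(["$10.0", "$6.0"], index=["A", "B"])

# As strings, "1" sorts before "6", so "$10.0" looks like the minimum.
string_min = charges.min()

# Strip the literal "$" (regex=False) and convert to numbers first.
numeric = pd.to_numeric(charges.str.replace("$", "", regex=False))
numeric_min = numeric.min()

print(string_min, numeric_min)
```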
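import numpy as np

# Strip the literal '$' (regex=False) and convert the charges to floats
cols = ['CarrierA_Charge', 'CarrierB_Charge']
for col in cols:
    pivot[col] = pivot[col].str.replace('$', '', regex=False).astype(float)
pivot['min_charge'] = pivot[cols].min(axis=1)
# Only two carriers: label the row 'A' if A's charge equals the minimum, otherwise 'B'
pivot['min_charge_carrier'] = np.where(pivot['min_charge'].eq(pivot['CarrierA_Charge']),
                                       'A', 'B')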
Service Weight Area CarrierA_Charge CarrierB_Charge min_charge min_charge_carrier
0 GRND 1 2 5.0 5.5 5.0 A
1 GRND 2 2 6.0 NaN 6.0 A
2 GRND 3 2 7.0 6.9 6.9 B
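The `np.where` step only distinguishes two carriers. A sketch that generalizes it with `idxmin` to any number of `*_Charge` columns; the third carrier C and its rates are hypothetical, added only to show more than two carriers:

```python
import pandas as pd

# Hypothetical wide table with a third carrier C (not in the question's data).
pivot = pd.DataFrame({
    "Service": ["GRND", "GRND", "GRND"],
    "Weight": [1, 2, 3],
    "Area": [2, 2, 2],
    "CarrierA_Charge": [5.0, 6.0, 7.0],
    "CarrierB_Charge": [5.5, float("nan"), 6.9],
    "CarrierC_Charge": [4.9, 6.5, float("nan")],
})

# Select every carrier column instead of hard-coding A and B.
charge_cols = [c for c in pivot.columns if c.endswith("_Charge")]

pivot["min_charge"] = pivot[charge_cols].min(axis=1)
# idxmin returns the winning column name; strip it back to the carrier code.
pivot["min_charge_carrier"] = (
    pivot[charge_cols].idxmin(axis=1)
    .str.replace("Carrier", "", regex=False)
    .str.replace("_Charge", "", regex=False)
)
print(pivot)
```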