I have the following pandas DataFrame df:
SIGN TYPE TIME ADDITIONAL
ABC5245 10 2017-01-01 01:52:25.000 2017-01-01 01:39:04.000
ABC5245 20 2017-01-01 01:53:22.000 2017-01-01 02:39:04.000
DEF1111 20 2017-01-01 01:57:00.000 2017-01-01 03:39:04.000
DEF1111 10 2017-01-01 01:55:15.000 2017-01-01 01:39:04.000
AAA2222 10 2017-01-01 01:57:00.000 2017-01-01 01:39:04.000
I need to group the data by SIGN and, based on TYPE, create four new columns: TIME_10, TIME_20, ADDITIONAL_10 and ADDITIONAL_20.
This is the expected result:
SIGN TIME_10 TIME_20 ADDITIONAL_10 ADDITIONAL_20
ABC5245 2017-01-01 01:52:25.000 2017-01-01 01:53:22.000 2017-01-01 01:39:04.000 2017-01-01 02:39:04.000
DEF1111 2017-01-01 01:55:15.000 2017-01-01 01:57:00.000 2017-01-01 01:39:04.000 2017-01-01 03:39:04.000
AAA2222 2017-01-01 01:57:00.000 NaN 2017-01-01 01:39:04.000 NaN
Answer 0 (score: 4)
Reshape with set_index and unstack, then flatten the MultiIndex column headers:
df_out = df.set_index(['SIGN','TYPE']).unstack('TYPE')
df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
print(df_out)
Output:
TIME_10 TIME_20 \
SIGN
AAA2222 2017-01-01 01:57:00.000 NaN
ABC5245 2017-01-01 01:52:25.000 2017-01-01 01:53:22.000
DEF1111 2017-01-01 01:55:15.000 2017-01-01 01:57:00.000
ADDITIONAL_10 ADDITIONAL_20
SIGN
AAA2222 2017-01-01 01:39:04.000 NaN
ABC5245 2017-01-01 01:39:04.000 2017-01-01 02:39:04.000
DEF1111 2017-01-01 01:39:04.000 2017-01-01 03:39:04.000
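The output above is sorted by SIGN and keeps SIGN as the index, while the expected result in the question lists SIGN as an ordinary column in order of first appearance. A minimal sketch of one way to get that layout, assuming the same sample data (the reindex step and the variable name `first_seen` are additions for illustration, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "SIGN": ["ABC5245", "ABC5245", "DEF1111", "DEF1111", "AAA2222"],
        "TYPE": [10, 20, 20, 10, 10],
        "TIME": [
            "2017-01-01 01:52:25.000",
            "2017-01-01 01:53:22.000",
            "2017-01-01 01:57:00.000",
            "2017-01-01 01:55:15.000",
            "2017-01-01 01:57:00.000",
        ],
        "ADDITIONAL": [
            "2017-01-01 01:39:04.000",
            "2017-01-01 02:39:04.000",
            "2017-01-01 03:39:04.000",
            "2017-01-01 01:39:04.000",
            "2017-01-01 01:39:04.000",
        ],
    }
)

# Same reshape as in the answer: one row per SIGN, one column per (field, TYPE).
df_out = df.set_index(["SIGN", "TYPE"]).unstack("TYPE")
df_out.columns = [f"{col}_{typ}" for col, typ in df_out.columns]

# Assumption: the caller wants the original first-appearance order of SIGN.
first_seen = pd.Index(df["SIGN"].unique(), name="SIGN")
df_out = df_out.reindex(first_seen).reset_index()  # SIGN becomes a column again

print(df_out)
```

If the sorted order is acceptable, the reindex step can be dropped and a plain `reset_index()` is enough.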
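Answer 1 (score: 3)
You can use pivot to get the result. If you are fine with the columns being a MultiIndex, you do not need the second line.
Thanks to @ScottBoston for the tip on formatting the columns.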
df = df.pivot(index='SIGN', columns='TYPE', values=['TIME', 'ADDITIONAL'])  # keyword arguments; pandas 2.0+ no longer accepts these positionally
df.columns = df.columns.map('{0[0]}_{0[1]}'.format)
Edit
In context:
import pandas as pd
data = [
['ABC5245', 10, '2017-01-01 01:52:25.000', '2017-01-01 01:39:04.000'],
['ABC5245', 20, '2017-01-01 01:53:22.000', '2017-01-01 02:39:04.000'],
['DEF1111', 20, '2017-01-01 01:57:00.000', '2017-01-01 03:39:04.000'],
['DEF1111', 10, '2017-01-01 01:55:15.000', '2017-01-01 01:39:04.000'],
['AAA2222', 10, '2017-01-01 01:57:00.000', '2017-01-01 01:39:04.000'],
]
columns = ['SIGN', 'TYPE', 'TIME', 'ADDITIONAL']
df = pd.DataFrame(data=data, columns=columns)
print(df)
df = df.pivot(index='SIGN', columns='TYPE', values=['TIME', 'ADDITIONAL'])  # keyword arguments; pandas 2.0+ no longer accepts these positionally
df.columns = df.columns.map('{0[0]}_{0[1]}'.format)
print(df)
Output:
SIGN TYPE TIME ADDITIONAL
0 ABC5245 10 2017-01-01 01:52:25.000 2017-01-01 01:39:04.000
1 ABC5245 20 2017-01-01 01:53:22.000 2017-01-01 02:39:04.000
2 DEF1111 20 2017-01-01 01:57:00.000 2017-01-01 03:39:04.000
3 DEF1111 10 2017-01-01 01:55:15.000 2017-01-01 01:39:04.000
4 AAA2222 10 2017-01-01 01:57:00.000 2017-01-01 01:39:04.000
TIME_10 TIME_20 ADDITIONAL_10 ADDITIONAL_20
SIGN
AAA2222 2017-01-01 01:57:00.000 NaN 2017-01-01 01:39:04.000 NaN
ABC5245 2017-01-01 01:52:25.000 2017-01-01 01:53:22.000 2017-01-01 01:39:04.000 2017-01-01 02:39:04.000
DEF1111 2017-01-01 01:55:15.000 2017-01-01 01:57:00.000 2017-01-01 01:39:04.000 2017-01-01 03:39:04.000
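Both pivot and unstack require each (SIGN, TYPE) pair to occur at most once; with repeated pairs they raise a ValueError about duplicate entries. The sample data has no duplicates, so this does not come up in the question. A minimal sketch of one way to cope if it did, assuming that keeping the first row per pair is acceptable (that choice, and the extra duplicate row below, are hypothetical):

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["ABC5245", 10, "2017-01-01 01:52:25.000", "2017-01-01 01:39:04.000"],
        # Hypothetical duplicate (SIGN, TYPE) pair, not in the original data:
        ["ABC5245", 10, "2017-01-01 01:59:59.000", "2017-01-01 01:45:00.000"],
        ["ABC5245", 20, "2017-01-01 01:53:22.000", "2017-01-01 02:39:04.000"],
        ["AAA2222", 10, "2017-01-01 01:57:00.000", "2017-01-01 01:39:04.000"],
    ],
    columns=["SIGN", "TYPE", "TIME", "ADDITIONAL"],
)

# Collapse duplicates first (here: first non-null value per group), then reshape.
deduped = (
    df.groupby(["SIGN", "TYPE"])[["TIME", "ADDITIONAL"]]
    .first()
    .unstack("TYPE")
)
deduped.columns = [f"{col}_{typ}" for col, typ in deduped.columns]

print(deduped)
```

Swapping `.first()` for `.last()`, `.min()` or `.max()` changes which of the duplicate rows survives; which one is right depends on the data.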