I'm struggling to articulate my question, so I'll demonstrate with an example.
Suppose I have a DataFrame that looks like this:
>>> df = pd.DataFrame([{'person': 'bob', 'year': 2016, 'production': 30, 'efficiency': .10}, {'person': 'bob', 'year': 2017, 'production': 35, 'efficiency': .11}, {'person': 'bob', 'year': 2018, 'production': 15, 'efficiency': .05}])
>>> df
   efficiency person  production  year
0        0.10    bob          30  2016
1        0.11    bob          35  2017
2        0.05    bob          15  2018
I need to produce a report that has all of the information for each person on a single row. So I'd like to transform the above into:
   efficiency 2016 person  production 2016  efficiency 2017  production 2017  \
0              0.1    bob               30             0.11               35

   efficiency 2018  production 2018
0             0.05               15
This code does the transformation, but it is extremely inefficient:
def combine_years(df):
    final_df = None
    for name, stats in df.groupby('person'):
        agg_df = None
        for year in stats['year']:
            # column_renamer is a helper that is not shown in the question
            new_df = stats[stats.year == year].rename(columns=lambda colname: column_renamer(colname, year))
            new_df = new_df.drop('year', axis=1)
            if agg_df is None:
                agg_df = new_df
            else:
                agg_df = agg_df.merge(new_df, how='outer', on=['person'])
        if final_df is None:
            final_df = agg_df
        else:
            final_df = pd.concat([final_df, agg_df], axis=1)
    return final_df
A few questions:
Answer 0 (score: 1)
set_index
I'd want 'person' to end up in the index and leave the columns as a pandas.MultiIndex:
df.set_index(['person', 'year']).unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)
year         2016                  2017                  2018
       efficiency production efficiency production efficiency production
person
bob           0.1         30       0.11         35       0.05         15
pivot_table
df.pivot_table(index='person', columns='year').swaplevel(0, 1, axis=1).sort_index(axis=1)
year         2016                  2017                  2018
       efficiency production efficiency production efficiency production
person
bob           0.1         30       0.11         35       0.05         15
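
If the report really needs single-level column names like the ones in the question ('efficiency 2016', 'production 2016', ...), the MultiIndex can be flattened afterwards. The following is only a sketch: the variable names `wide` and `flat` are mine, and the "<stat> <year>" naming scheme is an assumption based on the desired output shown in the question.

```python
import pandas as pd

df = pd.DataFrame([
    {'person': 'bob', 'year': 2016, 'production': 30, 'efficiency': .10},
    {'person': 'bob', 'year': 2017, 'production': 35, 'efficiency': .11},
    {'person': 'bob', 'year': 2018, 'production': 15, 'efficiency': .05},
])

# One row per person; columns are a MultiIndex of (year, stat) pairs.
wide = (
    df.set_index(['person', 'year'])
      .unstack()
      .swaplevel(0, 1, axis=1)
      .sort_index(axis=1)
)

# Flatten each (year, stat) pair into a single "<stat> <year>" label
# (assumed naming scheme), then move 'person' back into a regular column.
flat = wide.copy()
flat.columns = [f"{stat} {year}" for year, stat in wide.columns]
flat = flat.reset_index()

print(flat)
```

Keeping the MultiIndex is usually more convenient for further processing; flattening like this is mainly useful at the very end, when writing the report out.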