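I have a pandas DataFrame that is the result of a pivot, so it has a MultiIndex on both rows and columns. I want to get a "normal" DataFrame out of this pivoted df so that I can run ordinary operations on the new df.
Here is an example. My pivoted DataFrame looks like this: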
             feature_value
feature_type            f1  f2  f3  f4  f5
time       name
2016-05-10 Clay          0   1  30   4  40
2016-05-10 John          0   4  10   4  66
2016-05-10 Mary          0   1  40   4  46
2016-05-10 Boby          2   0  30   4  59
2016-05-10 Lucy          5   8  20   4  41
And this is the new df I want:
time name f1 f2 f3 f4 f5
2016-05-10 Clay 0 1 30 4 40
2016-05-10 John 0 4 10 4 66
2016-05-10 Mary 0 1 40 4 46
2016-05-10 Boby 2 0 30 4 59
2016-05-10 Lucy 5 8 20 4 41
How can I do this?
pivoted_df.to_dict() gives:
{('feature_value', 'f1'): {(Timestamp('2016-05-10'), 'Clay'): 0, (Timestamp('2016-05-10'), 'John'): 0, (Timestamp('2016-05-10'), 'Mary'): 0, (Timestamp('2016-05-10'), 'Boby'): 2, (Timestamp('2016-05-10'), 'Lucy'): 5}, ('feature_value', 'f2'): {(Timestamp('2016-05-10'), 'Clay'): 1, (Timestamp('2016-05-10'), 'John'): 4, (Timestamp('2016-05-10'), 'Mary'): 1, (Timestamp('2016-05-10'), 'Boby'): 0, (Timestamp('2016-05-10'), 'Lucy'): 8}, ('feature_value', 'f3'): {(Timestamp('2016-05-10'), 'Clay'): 30, (Timestamp('2016-05-10'), 'John'): 10, (Timestamp('2016-05-10'), 'Mary'): 40, (Timestamp('2016-05-10'), 'Boby'): 30, (Timestamp('2016-05-10'), 'Lucy'): 20}, ('feature_value', 'f4'): {(Timestamp('2016-05-10'), 'Clay'): 4, (Timestamp('2016-05-10'), 'John'): 4, (Timestamp('2016-05-10'), 'Mary'): 4, (Timestamp('2016-05-10'), 'Boby'): 4, (Timestamp('2016-05-10'), 'Lucy'): 4}, ('feature_value', 'f5'): {(Timestamp('2016-05-10'), 'Clay'): 40, (Timestamp('2016-05-10'), 'John'): 66, (Timestamp('2016-05-10'), 'Mary'): 46, (Timestamp('2016-05-10'), 'Boby'): 59, (Timestamp('2016-05-10'), 'Lucy'): 41}}
Answer 0 (score: 4)
When calling pivot_table, be sure to specify the values parameter:
df.pivot_table(index=['time', 'name'], columns=['feature_type'],
               values='feature_value')
Without values='feature_value', you get a MultiIndex for the columns that (possibly) has an outer level such as 'feature_value'.
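As a minimal sketch of that difference, the snippet below builds a tiny hypothetical long-format frame (the two rows are made up for illustration, not taken from the question) and compares the number of column levels with and without values:

```python
import pandas as pd

# Hypothetical two-row long-format frame, for illustration only.
df = pd.DataFrame({
    'time': pd.to_datetime(['2016-05-10', '2016-05-10']),
    'name': ['Clay', 'John'],
    'feature_type': ['f1', 'f2'],
    'feature_value': [0, 4],
})

# No values argument: the column index keeps an outer 'feature_value' level.
without_values = df.pivot_table(index=['time', 'name'],
                                columns=['feature_type'])

# values given as a single column name: the column index is a flat Index.
with_values = df.pivot_table(index=['time', 'name'],
                             columns=['feature_type'],
                             values='feature_value')

print(without_values.columns.nlevels)
print(with_values.columns.nlevels)
```

Checking columns.nlevels is a quick way to tell which of the two shapes you have before deciding whether a level needs to be dropped.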
df.pivot_table(index=['time', 'name'], ...) also returns a DataFrame whose row index is a MultiIndex with levels time and name. To turn these index levels into regular columns, call reset_index():
result = df.pivot_table(index=['time', 'name'],
                        columns=['feature_type'],
                        values='feature_value').reset_index()
For example,
import numpy as np
import pandas as pd

np.random.seed(2016)
N = 10
df = pd.DataFrame(
    {'time': np.random.choice(pd.date_range('2016-05-10', '2016-05-12'), size=N),
     'name': np.random.choice(['Clay', 'John', 'Mary', 'Boby', 'Lucy'], size=N),
     'feature_type': np.random.choice(['f{}'.format(i) for i in range(1,6)], size=N),
     'feature_value': np.random.randint(100, size=N)})

# Without values: the columns keep an outer 'feature_value' level
orig = df.pivot_table(index=['time', 'name'], columns=['feature_type'])
print(orig)

# With values and reset_index(): flat columns, time and name as regular columns
alt = df.pivot_table(index=['time', 'name'],
                     columns=['feature_type'],
                     values='feature_value').reset_index()
alt.columns.name = None  # drop the leftover 'feature_type' label
print(alt)
orig looks like this:
                feature_value
feature_type               f1    f2    f3    f4    f5
time       name
2016-05-10 John           NaN  50.0   NaN   NaN  91.0
           Lucy           NaN   NaN   NaN  28.0   NaN
           Mary           NaN   NaN  19.0   NaN  27.0
2016-05-11 Clay           2.0   NaN   NaN   NaN   NaN
           Lucy          24.0   NaN   NaN   NaN   NaN
2016-05-12 Boby           NaN  16.0   NaN   NaN   NaN
           John           NaN   NaN   NaN   NaN  62.0
           Mary           NaN   NaN   NaN  84.0   NaN
whereas alt looks like this:
time name f1 f2 f3 f4 f5
0 2016-05-10 John NaN 50.0 NaN NaN 91.0
1 2016-05-10 Lucy NaN NaN NaN 28.0 NaN
2 2016-05-10 Mary NaN NaN 19.0 NaN 27.0
3 2016-05-11 Clay 2.0 NaN NaN NaN NaN
4 2016-05-11 Lucy 24.0 NaN NaN NaN NaN
5 2016-05-12 Boby NaN 16.0 NaN NaN NaN
6 2016-05-12 John NaN NaN NaN NaN 62.0
7 2016-05-12 Mary NaN NaN NaN 84.0 NaN
Answer 1 (score: 2)
Brute force:
df.columns = df.columns.droplevel()
df = df.reset_index()
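As a rough sketch of how this plays out on data shaped like the question's, the snippet below rebuilds a two-row version of the pivoted frame by hand (the construction code and the names pivoted_df and flat are assumptions for illustration) and then applies the same two steps. Clearing columns.name at the end is optional; it only removes the leftover feature_type label.

```python
import pandas as pd

# Hand-built stand-in for the question's pivoted frame (first two rows only).
index = pd.MultiIndex.from_tuples(
    [(pd.Timestamp('2016-05-10'), 'Clay'),
     (pd.Timestamp('2016-05-10'), 'John')],
    names=['time', 'name'])
columns = pd.MultiIndex.from_product(
    [['feature_value'], ['f1', 'f2', 'f3', 'f4', 'f5']],
    names=[None, 'feature_type'])
pivoted_df = pd.DataFrame([[0, 1, 30, 4, 40],
                           [0, 4, 10, 4, 66]],
                          index=index, columns=columns)

flat = pivoted_df.copy()
flat.columns = flat.columns.droplevel(0)  # drop the outer 'feature_value' level
flat = flat.reset_index()                 # time and name become regular columns
flat.columns.name = None                  # optional: remove the 'feature_type' label
print(flat)
```

Note that droplevel() with no argument drops level 0, which is the outer level here; if the column MultiIndex had the levels in a different order, the level to drop would need to be named explicitly.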