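Pandas DataFrame pivot table and grouping

Asked: 2016-08-23 09:08:02

Tags: python pandas dataframe alignment pivot

I have a DataFrame that I turned into a pivot table, but now I want to order the pivot table so that common values of a particular column are aligned with each other. For example, order the DataFrame so that all common countries are aligned on the same row: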

import pandas as pd

data = {'dt': ['2016-08-22', '2016-08-21', '2016-08-22', '2016-08-21', '2016-08-21'],
        'country': ['uk', 'usa', 'fr', 'fr', 'uk'],
        'number': [10, 21, 20, 10, 12]
        }

df = pd.DataFrame(data)
print(df)

  country          dt  number
0      uk  2016-08-22      10
1     usa  2016-08-21      21
2      fr  2016-08-22      20
3      fr  2016-08-21      10
4      uk  2016-08-21      12


#pivot table by dt:

df['idx'] = df.groupby('dt')['dt'].cumcount()
df_pivot = df.set_index(['idx','dt']).stack().unstack([1,2])
print(df_pivot)
dt       2016-08-22        2016-08-21       
       country number    country number
idx                                    
0           uk     10        usa     21
1           fr     20         fr     10
2          NaN    NaN         uk     12
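
A minimal sketch of why the rows above end up misaligned, assuming the same sample data: cumcount numbers the rows within each dt group in order of appearance, so idx says nothing about which country a row belongs to.

```python
import pandas as pd

df = pd.DataFrame({
    'dt': ['2016-08-22', '2016-08-21', '2016-08-22', '2016-08-21', '2016-08-21'],
    'country': ['uk', 'usa', 'fr', 'fr', 'uk'],
    'number': [10, 21, 20, 10, 12],
})

# Position of each row within its dt group, in order of appearance
df['idx'] = df.groupby('dt').cumcount()

# uk is the first row seen for 2016-08-22 but the third for 2016-08-21,
# so its two rows receive different idx values and cannot share a pivot row
uk_idx = df.loc[df['country'] == 'uk', ['dt', 'idx']]
print(uk_idx)
```

Using country itself as the row key, rather than a per-date counter, avoids this.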


#what I really want:

        dt    2016-08-22   2016-08-21       
       country number    country number

0           uk     10         uk     12
1           fr     20         fr     10
2          NaN    NaN        usa     21

Or even better:

               2016-08-22   2016-08-21       
       country  number       number

0           uk     10         12
1           fr     20         10
2          usa    NaN         21

That is, the uk values from 2016-08-22 are aligned on the same row.

1 Answer:

Answer 0 (score: 1)

You can use:

# assumes df has only the dt, country and number columns (no idx helper column)
df_pivot = df.set_index(['dt','country']).stack().unstack([0,2]).reset_index()
print (df_pivot)
dt country 2016-08-22 2016-08-21
               number     number
0       fr       20.0       10.0
1       uk       10.0       12.0
2      usa        NaN       21.0  

# move the 'country' label from the first to the second level of the column MultiIndex
cols = list(df_pivot.columns)
df_pivot.columns = pd.MultiIndex.from_tuples([('', 'country')] + cols[1:])
print (df_pivot)
          2016-08-22 2016-08-21
  country     number     number
0      fr       20.0       10.0
1      uk       10.0       12.0
2     usa        NaN       21.0

Another, simpler solution is to use pivot:

df_pivot = df.pivot(index='country', columns='dt', values='number')
print (df_pivot)
dt       2016-08-21  2016-08-22
country                        
fr             10.0        20.0
uk             12.0        10.0
usa            21.0         NaN
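
If the layout from the question is wanted (newest date first, country as an ordinary column), the pivot result could be post-processed roughly as follows. This is only a sketch on the sample data; the names wide and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'dt': ['2016-08-22', '2016-08-21', '2016-08-22', '2016-08-21', '2016-08-21'],
    'country': ['uk', 'usa', 'fr', 'fr', 'uk'],
    'number': [10, 21, 20, 10, 12],
})

# One row per country, one column per date
wide = df.pivot(index='country', columns='dt', values='number')

# Put the newest date first (ISO date strings sort chronologically)
wide = wide.sort_index(axis=1, ascending=False)

# Turn the country index back into a regular column and drop the 'dt' axis label
result = wide.reset_index()
result.columns.name = None
print(result)
```

Note that pivot raises a ValueError when a (country, dt) pair occurs more than once; pivot_table, which aggregates duplicates, may be a better fit in that case.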