Summarizing a DataFrame in pandas - python

Time: 2016-04-29 19:47:25

Tags: python pandas dataframe summary

df = pd.DataFrame({'a':['y',NaN,'y',NaN,NaN,'x','x','y',NaN],'b':[NaN,'x',NaN,'y','x',NaN,NaN,NaN,'y'],'d':[1,0,0,1,1,1,0,1,0]})

I am trying to summarize this DataFrame using sum. I thought df.groupby(['a','b']).aggregate(sum) would work, but it returns an empty Series.

How can I achieve this result?

   a  b
x  1  1
y  2  1

1 answer:

Answer 0: (score: 2)

import numpy as np
import pandas as pd
NaN = np.nan

df = pd.DataFrame(
    {'a':['y',NaN,'y',NaN,NaN,'x','x','y',NaN],
     'b':[NaN,'x',NaN,'y','x',NaN,NaN,NaN,'y'],
     'd':[32,12,55,98,23,11,9,91,3]})

melted = pd.melt(df, id_vars=['d'], value_vars=['a', 'b'])
result = pd.pivot_table(melted, values='d', index=['value'], columns=['variable'], 
                        aggfunc=np.median)
print(result)

yields

variable     a     b
value               
x         10.0  17.5
y         55.0  50.5

Explanation

Melting the DataFrame with

melted = pd.melt(df, id_vars=['d'], value_vars=['a', 'b'])

produces

     d variable value
0   32        a     y
1   12        a   NaN
2   55        a     y
3   98        a   NaN
4   23        a   NaN
5   11        a     x
6    9        a     x
7   91        a     y
8    3        a   NaN
9   32        b   NaN
10  12        b     x
11  55        b   NaN
12  98        b     y
13  23        b     x
14  11        b   NaN
15   9        b   NaN
16  91        b   NaN
17   3        b     y
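The long format also hints at why the original groupby(['a','b']) came back empty: in every row of df either a or b is NaN, and groupby drops rows with a missing key by default. The check below is a small sketch, assuming the question's original d values. The names rows_with_both_keys and grouped are only illustrative.

```python
import numpy as np
import pandas as pd

# The DataFrame from the question (original 0/1 values in 'd').
df = pd.DataFrame({
    'a': ['y', np.nan, 'y', np.nan, np.nan, 'x', 'x', 'y', np.nan],
    'b': [np.nan, 'x', np.nan, 'y', 'x', np.nan, np.nan, np.nan, 'y'],
    'd': [1, 0, 0, 1, 1, 1, 0, 1, 0],
})

# Count the rows where both grouping keys are present.
rows_with_both_keys = int(df[['a', 'b']].notna().all(axis=1).sum())
print(rows_with_both_keys)  # 0: every row is missing 'a' or 'b'

# Rows with a NaN key are dropped, so no groups remain.
grouped = df.groupby(['a', 'b'])['d'].sum()
print(len(grouped))  # 0
```

After melting, each row carries a single label in the value column, so rows are only dropped where that one label is missing.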

Now we can use pd.pivot_table to pivot and aggregate the d values:

result = pd.pivot_table(melted, values='d', index=['value'], columns=['variable'], 
                        aggfunc=np.median)
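The example above uses different d values and a median. As a sketch of how the same two steps apply to the question itself, the snippet below swaps in the question's original d values and the string alias 'sum' for aggfunc. The variable name totals is illustrative.

```python
import numpy as np
import pandas as pd

# The DataFrame exactly as posted in the question.
df = pd.DataFrame({
    'a': ['y', np.nan, 'y', np.nan, np.nan, 'x', 'x', 'y', np.nan],
    'b': [np.nan, 'x', np.nan, 'y', 'x', np.nan, np.nan, np.nan, 'y'],
    'd': [1, 0, 0, 1, 1, 1, 0, 1, 0],
})

# Step 1: stack columns 'a' and 'b' into one 'value' column, keeping 'd'.
melted = pd.melt(df, id_vars=['d'], value_vars=['a', 'b'])

# Step 2: sum 'd' per label (rows) and per source column (columns).
totals = pd.pivot_table(melted, values='d', index='value',
                        columns='variable', aggfunc='sum')
print(totals)
```

The sums should match the table requested in the question: row x has 1 under both a and b, and row y has 2 under a and 1 under b.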

Note that aggfunc can also take a list of functions, such as [np.sum, np.median, np.min, np.max, np.std], if you want to summarize the data in several ways.
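A minimal sketch of the list form, using the answer's d values. It passes string aliases ('sum', 'median', 'max') rather than NumPy functions, which is a substitution made here because recent pandas versions may warn about NumPy callables. The name multi is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': ['y', np.nan, 'y', np.nan, np.nan, 'x', 'x', 'y', np.nan],
    'b': [np.nan, 'x', np.nan, 'y', 'x', np.nan, np.nan, np.nan, 'y'],
    'd': [32, 12, 55, 98, 23, 11, 9, 91, 3],
})

melted = pd.melt(df, id_vars=['d'], value_vars=['a', 'b'])

# A list of aggregations gives two-level columns: (function name, variable).
multi = pd.pivot_table(melted, values='d', index='value',
                       columns='variable',
                       aggfunc=['sum', 'median', 'max'])
print(multi)

# Pick a single statistic by indexing with a (function, variable) tuple.
print(multi.loc['x', ('sum', 'a')])     # 11 + 9 = 20
print(multi.loc['y', ('median', 'b')])  # median of 98 and 3 = 50.5
```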