Pivot table using Python pandas, unit price and

Time: 2013-12-05 11:51:43

Tags: python python-2.7 pandas pivot

I would like to use pandas to generate a pivot table from data like this:

id      product  credit
1        book      -5
1        ipad     -15
1      server     -25
2        book      -5
15      server     -25
2       glass      -2
2       glass      -2
1        book      -5
15       glass      -2
1         car    -150

and turn it into a spreadsheet-style table like this:

id        1          2        15
---------------------------------
book     -5 (2)     -5(1)     NA
ipad     -15(1)      NA       NA
server   -25(1)      NA      -25(1)
glass     NA        -2(2)    -2(1)
car       -150(1)    NA       NA

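It should show id as the columns and product as the rows, with each cell holding the unit credit and the number of products purchased.

Thanks for your help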

-H

1 Answer:

Answer 0 (score: 4)

The main idea is to use pandas...pivot_table().

If you only want to sum the data, np.sum will do:

>>> import numpy as np
>>> df.pivot_table(cols='id', values='credit', rows='product', aggfunc=np.sum)
id        1   2   15
product             
book     -10  -5 NaN
car     -150 NaN NaN
glass    NaN  -4  -2
ipad     -15 NaN NaN
server   -25 NaN -25
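
For anyone trying this on a more recent pandas, here is a minimal sketch of the same sum. It assumes a release where the `rows=`/`cols=` keywords have been replaced by `index=`/`columns=`; the DataFrame is simply the sample data from the question typed in by hand, and the name `summed` is only for illustration. The printed formatting may differ from the output shown above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 15, 2, 2, 1, 15, 1],
    "product": ["book", "ipad", "server", "book", "server",
                "glass", "glass", "book", "glass", "car"],
    "credit": [-5, -15, -25, -5, -25, -2, -2, -5, -2, -150],
})

# index= / columns= play the role of the older rows= / cols= keywords.
summed = df.pivot_table(index="product", columns="id",
                        values="credit", aggfunc="sum")
print(summed)
```

Combinations that never occur in the data (for example car for id 2) come back as NaN, just as in the session above.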

Or you can use collections.Counter to get the data in a format close to what you need (Counter is not very efficient, so keep that in mind):

>>> from collections import Counter
>>> df.pivot_table(cols='id', values='credit', rows='product', aggfunc=Counter)
id              1        2         15
product                              
book       {-5: 2}  {-5: 1}       NaN
car      {-150: 1}      NaN       NaN
glass          NaN  {-2: 2}   {-2: 1}
ipad      {-15: 1}      NaN       NaN
server    {-25: 1}      NaN  {-25: 1}
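
If the cost of Counter is a concern, one possible alternative is to let pandas' built-in aggregations do the work and keep the price and the count in two separate tables. This is only a sketch: it uses the newer `index=`/`columns=` keywords, and it assumes each (product, id) pair has a single distinct price, so that taking the first value is enough. The names `counts` and `price` are made up for the example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 15, 2, 2, 1, 15, 1],
    "product": ["book", "ipad", "server", "book", "server",
                "glass", "glass", "book", "glass", "car"],
    "credit": [-5, -15, -25, -5, -25, -2, -2, -5, -2, -150],
})

# Number of purchases per (product, id) pair.
counts = df.pivot_table(index="product", columns="id",
                        values="credit", aggfunc="count")

# Unit price per pair; assumes the price is the same for every purchase in the pair.
price = df.pivot_table(index="product", columns="id",
                       values="credit", aggfunc="first")

print(counts)
print(price)
```

The two tables share the same row and column labels, so they can be looked up side by side with the same `.loc[product, id]` key.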

Or define a custom function to get exactly the format you want:

>>> from collections import defaultdict
>>> def hlp_count(x):
...     d = defaultdict(int)
...     for v in x:
...         d[v] += 1
...     # join in case you have more than one distinct price
...     return ', '.join(['{0} ({1})'.format(k, v) for k, v in d.iteritems()])

>>> df.pivot_table(cols='id', values='credit', rows='product', aggfunc=hlp_count)
id             1       2        15
product                           
book       -5 (2)  -5 (1)      NaN
car      -150 (1)     NaN      NaN
glass         NaN  -2 (2)   -2 (1)
ipad      -15 (1)     NaN      NaN
server    -25 (1)     NaN  -25 (1)
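
The helper above is written for Python 2.7 (`iteritems`). A rough Python 3 equivalent could look like the sketch below; it swaps the defaultdict loop for Counter purely for brevity and uses the newer `index=`/`columns=` keywords, so treat it as one possible adaptation rather than the original code. The name `table` is only for illustration.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 15, 2, 2, 1, 15, 1],
    "product": ["book", "ipad", "server", "book", "server",
                "glass", "glass", "book", "glass", "car"],
    "credit": [-5, -15, -25, -5, -25, -2, -2, -5, -2, -150],
})

def hlp_count(values):
    # Count how often each distinct price occurs in the group.
    counts = Counter(values)
    # Join in case there is more than one distinct price.
    return ", ".join("{0} ({1})".format(price, n) for price, n in counts.items())

table = df.pivot_table(index="product", columns="id",
                       values="credit", aggfunc=hlp_count)
print(table)
```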