I'd like to generate a pivot table with Python pandas from data like this:
id product credit
1 book -5
1 ipad -15
1 server -25
2 book -5
15 server -25
2 glass -2
2 glass -2
1 book -5
15 glass -2
1 car -150
to get a spreadsheet like this:
id        1           2          15
-----------------------------------------
book      -5 (2)      -5 (1)     NA
ipad      -15 (1)     NA         NA
server    -25 (1)     NA         -25 (1)
glass     NA          -2 (2)     -2 (1)
car       -150 (1)    NA         NA
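This would show the ids as columns and the products as rows, with each cell giving the unit credit followed by the number of items purchased in parentheses.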
Thanks for your help,
-H
Answer (score: 4)
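The main idea is to use pandas...pivot_table(). In current pandas its keywords are index= and columns=; older versions used rows= and cols=.

If you just want to sum the data, np.sum will do: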
>>> import numpy as np
>>> df.pivot_table(columns='id', values='credit', index='product', aggfunc=np.sum)
id           1     2    15
product
book       -10    -5   NaN
car       -150   NaN   NaN
glass      NaN    -4    -2
ipad       -15   NaN   NaN
server     -25   NaN   -25
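The session above assumes a DataFrame named df already exists. A minimal sketch of building it from the question's sample data (loaded with pd.read_csv from an in-memory string; RAW and summed are illustrative names) and reproducing the summed table with the current index=/columns= keywords. It passes aggfunc="sum" instead of np.sum; both sum the credits, but recent pandas versions emit a FutureWarning for np.sum and suggest the string form.

```python
import io

import pandas as pd

# Sample data copied from the question (whitespace-separated columns).
RAW = """id product credit
1 book -5
1 ipad -15
1 server -25
2 book -5
15 server -25
2 glass -2
2 glass -2
1 book -5
15 glass -2
1 car -150
"""

df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")

# Products become rows, ids become columns, and credits are summed per cell.
# Combinations that never occur (e.g. car for id 2) come out as NaN.
summed = df.pivot_table(index="product", columns="id", values="credit", aggfunc="sum")
print(summed)
```

Because of the NaN cells, the result columns are float, so the sums print as -10.0 and so on in recent pandas.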
Or you can use collections.Counter to get the data in a format closer to what you need (Counter is not very efficient, so keep that in mind):
>>> from collections import Counter
>>> df.pivot_table(columns='id', values='credit', index='product', aggfunc=Counter)
id                 1           2          15
product
book          {-5: 2}     {-5: 1}         NaN
car         {-150: 1}         NaN         NaN
glass             NaN     {-2: 2}     {-2: 1}
ipad         {-15: 1}         NaN         NaN
server       {-25: 1}         NaN    {-25: 1}
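Since Counter is flagged as slow, one alternative (a sketch, not part of the original answer, and not benchmarked here) is to let groupby(...).size() do the counting in bulk and then build the "credit (count)" strings directly. The names counts and table are illustrative.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 15, 2, 2, 1, 15, 1],
    "product": ["book", "ipad", "server", "book", "server",
                "glass", "glass", "book", "glass", "car"],
    "credit": [-5, -15, -25, -5, -25, -2, -2, -5, -2, -150],
})

# Count how often each (product, id, credit) combination occurs.
counts = df.groupby(["product", "id", "credit"]).size().reset_index(name="n")

# Format each combination as "credit (n)", e.g. "-5 (2)".
counts["cell"] = counts["credit"].astype(str) + " (" + counts["n"].astype(str) + ")"

# Join several distinct prices for the same (product, id), then spread ids into columns.
table = counts.groupby(["product", "id"])["cell"].agg(", ".join).unstack("id")
print(table)
```

Missing combinations are left as NaN, matching the pivot_table output above.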
Or define a custom aggregation function to get exactly the format you want:
>>> from collections import defaultdict
>>> def hlp_count(x):
...     d = defaultdict(int)
...     for v in x:
...         d[v] += 1
...     # join in case you have more than one distinct price
...     return ', '.join(['{0} ({1})'.format(k, v) for k, v in d.items()])
...
>>> df.pivot_table(columns='id', values='credit', index='product', aggfunc=hlp_count)
id               1         2        15
product
book        -5 (2)    -5 (1)       NaN
car       -150 (1)       NaN       NaN
glass          NaN    -2 (2)    -2 (1)
ipad       -15 (1)       NaN       NaN
server     -25 (1)       NaN   -25 (1)
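To match the question's layout exactly, with NA rather than NaN in empty cells, the pivoted result can be passed through fillna("NA"). A sketch assuming Python 3 and a recent pandas, repeating the hlp_count helper with d.items(), since Python 3 dicts have no iteritems():

```python
from collections import defaultdict

import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 15, 2, 2, 1, 15, 1],
    "product": ["book", "ipad", "server", "book", "server",
                "glass", "glass", "book", "glass", "car"],
    "credit": [-5, -15, -25, -5, -25, -2, -2, -5, -2, -150],
})


def hlp_count(x):
    """Format a group of credits as 'credit (count)', joining distinct prices."""
    d = defaultdict(int)
    for v in x:
        d[v] += 1
    return ", ".join("{0} ({1})".format(k, v) for k, v in d.items())


table = (
    df.pivot_table(index="product", columns="id", values="credit", aggfunc=hlp_count)
    .fillna("NA")  # show the string "NA" like the question instead of NaN
)
print(table)
```

fillna only changes what is displayed in the empty cells; every cell in the result is then a string.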