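How to convert rows to columns using the frequency of their values

Time: 2018-02-20 10:30:57

Tags: python pandas

How can I convert row values into columns that hold purchase counts, without using customer as the index?

Data: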

customer    fruits  veggies grocery
A   apple   carrot  brush
A   apple   carrot  brush
A   apple   onion   soap
A   banana  onion   soap
B   mango   onion   soap
B   mango   carrot  brush
B   banana  tomato  powder
B   banana  tomato  powder
C   apple   carrot  powder
C   mango   carrot  soap
C   mango   tomato  soap
C   banana  tomato  brush
D   banana  carrot  brush
D   banana  onion   soap
D   apple   tomato  powder
D   apple   tomato  powder

Expected output:

customer    apple   mango   banana  carrot  onion   tomato  brush   soap    powder
A   3   0   1   2   2   0   2   2   0
B   0   2   2   1   1   2   1   1   2
C   1   2   1   2   0   2   1   2   1
D   2   0   2   1   1   2   1   1   2

2 answers:

Answer 0: (score: 3)

Option 1
Using set_index + stack + get_dummies

# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
df.set_index('customer').stack().str.get_dummies().groupby(level=0).sum()

          apple  banana  brush  carrot  mango  onion  powder  soap  tomato
customer                                                                  
A             3       1      2       2      0      2       0     2       0
B             0       2      1       1      2      1       2     1       2
C             1       1      1       2      2      0       1     2       2
D             2       2      1       1      0      1       2     1       2

Option 2
Another, slightly cleaner option, using pd.crosstab

v = df.set_index('customer').stack()
pd.crosstab(v.index.get_level_values(0), v.values)

col_0  apple  banana  brush  carrot  mango  onion  powder  soap  tomato
row_0                                                                  
A          3       1      2       2      0      2       0     2       0
B          0       2      1       1      2      1       2     1       2
C          1       1      1       2      2      0       1     2       2
D          2       2      1       1      0      1       2     1       2

crosstab is a specialized version of pivot_table and is well suited to this kind of tabulation.
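To make the relationship between crosstab and pivot_table concrete, here is a minimal sketch. It assumes the question's data is rebuilt by hand as df and reshaped to long form with melt; the names long, via_crosstab, and via_pivot are illustrative and do not come from the answer.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "customer": list("AAAABBBBCCCCDDDD"),
    "fruits": ("apple apple apple banana mango mango banana banana "
               "apple mango mango banana banana banana apple apple").split(),
    "veggies": ("carrot carrot onion onion onion carrot tomato tomato "
                "carrot carrot tomato tomato carrot onion tomato tomato").split(),
    "grocery": ("brush brush soap soap soap brush powder powder "
                "powder soap soap brush brush soap powder powder").split(),
})

# Long format: one row per (customer, item) purchase.
# melt adds a "variable" column holding the original column name.
long = df.melt(id_vars="customer", value_name="item")

# crosstab counts how often each customer/item pair occurs.
via_crosstab = pd.crosstab(long["customer"], long["item"])

# The same table through pivot_table: count the rows in each
# customer/item group and fill missing combinations with 0.
via_pivot = long.pivot_table(index="customer", columns="item",
                             values="variable", aggfunc="count",
                             fill_value=0)

print(via_crosstab)
```

Both calls should produce the same counts; crosstab simply saves spelling out the values and aggfunc arguments.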

Answer 1: (score: 2)

dot

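# dtype=int keeps the dummies numeric; pandas 2.0+ defaults to bool,
# and a dot product of bool frames yields booleans rather than counts
d = pd.get_dummies(df, dtype=int)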
d.columns = d.columns.str.split('_', expand=True)

c = d.pop('customer')

c.T.dot(d)

  fruits              veggies              grocery            
   apple banana mango  carrot onion tomato   brush powder soap
A      3      1     0       2     2      0       2      0    2
B      0      2     2       1     1      2       1      2    1
C      1      1     2       2     0      2       1      1    2
D      2      2     0       1     1      2       1      2    1
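The dot result keeps a two-level column header, while the expected output in the question is flat and has customer as an ordinary column. The following is a minimal sketch of one way to get there. It assumes the question's data is rebuilt by hand as df and that item names are unique across the three categories (otherwise dropping the outer level would create duplicate column names). The names counts, flat, tidy, and expected_order are illustrative.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "customer": list("AAAABBBBCCCCDDDD"),
    "fruits": ("apple apple apple banana mango mango banana banana "
               "apple mango mango banana banana banana apple apple").split(),
    "veggies": ("carrot carrot onion onion onion carrot tomato tomato "
                "carrot carrot tomato tomato carrot onion tomato tomato").split(),
    "grocery": ("brush brush soap soap soap brush powder powder "
                "powder soap soap brush brush soap powder powder").split(),
})

# Same steps as the dot approach, with integer dummies.
d = pd.get_dummies(df, dtype=int)
d.columns = d.columns.str.split("_", expand=True)
c = d.pop("customer")
counts = c.T.dot(d)

# Drop the outer level (fruits / veggies / grocery) to get flat item names.
flat = counts.droplevel(0, axis=1)

# Reorder to the column order shown in the question's expected output.
expected_order = ["apple", "mango", "banana", "carrot", "onion",
                  "tomato", "brush", "soap", "powder"]
flat = flat[expected_order]

# Name the index and move it into a regular "customer" column.
tidy = flat.rename_axis("customer").reset_index()

print(tidy)
```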

bincount + factorize

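import numpy as np
import pandas as pd

i, r = df.customer.factorize()           # i: customer code per row, r: unique customers
v = df.drop(columns='customer').values   # item values, one row per purchase record
j, c = pd.factorize(v.ravel())           # j: item code per cell, c: unique items
n, m = len(r), len(c)

# Encode each (customer, item) pair as the flat index i * m + j, count, then reshape
b = np.bincount(i.repeat(v.shape[1]) * m + j, minlength=n * m).reshape(n, m)

pd.DataFrame(b, r, c)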

   apple  carrot  brush  onion  soap  banana  mango  tomato  powder
A      3       2      2      2     2       1      0       0       0
B      0       1      1      1     1       2      2       2       2
C      1       2      1      0     2       1      2       2       1
D      2       1      1      1     1       2      0       2       2
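The flat-index trick behind the bincount approach can be easier to follow on a tiny input. This is a minimal sketch with made-up codes (2 customers, 3 items); the arrays rows and cols are hypothetical stand-ins for the factorized customer and item codes.

```python
import numpy as np

# Hypothetical toy input: 5 purchases, 2 customers, 3 items.
rows = np.array([0, 0, 0, 1, 1])   # customer code of each purchase
cols = np.array([0, 0, 2, 1, 2])   # item code of each purchase
n, m = 2, 3

# Each (row, col) pair maps to a unique slot row * m + col in a flat array.
flat = rows * m + cols

# Count how often each slot occurs, padding to n * m slots so that
# unseen pairs get a zero, then fold the flat counts back into n x m.
table = np.bincount(flat, minlength=n * m).reshape(n, m)

print(flat)    # [0 0 2 4 5]
print(table)   # [[2 0 1]
               #  [0 1 1]]
```

The minlength argument matters: without it, bincount stops at the largest slot actually seen, and the reshape fails whenever the last customer/item pair never occurs.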