How to convert row values into columns of purchase counts, without using customer as the index
Data:
customer fruits veggies grocery
A apple carrot brush
A apple carrot brush
A apple onion soap
A banana onion soap
B mango onion soap
B mango carrot brush
B banana tomato powder
B banana tomato powder
C apple carrot powder
C mango carrot soap
C mango tomato soap
C banana tomato brush
D banana carrot brush
D banana onion soap
D apple tomato powder
D apple tomato powder
Expected output:
customer apple mango banana carrot onion tomato brush soap powder
A 3 0 1 2 2 0 2 2 0
B 0 2 2 1 1 2 1 1 2
C 1 2 1 2 0 2 1 2 1
D 2 0 2 1 1 2 1 1 2
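Answer 0 (score: 3)
Option 1
Using set_index + stack + get_dummies:
# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
df.set_index('customer').stack().str.get_dummies().groupby(level=0).sum()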
apple banana brush carrot mango onion powder soap tomato
customer
A 3 1 2 2 0 2 0 2 0
B 0 2 1 1 2 1 2 1 2
C 1 1 1 2 2 0 1 2 2
D 2 2 1 1 0 1 2 1 2
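To try this locally, here is a minimal sketch that rebuilds the question's table by hand and splits the one-liner into its steps. The way `df` is constructed and the names `stacked`, `counts` and `wanted` are assumptions for illustration; the question only shows the printed data.

```python
import pandas as pd

# Data retyped from the question (construction is an assumption)
df = pd.DataFrame({
    "customer": list("AAAABBBBCCCCDDDD"),
    "fruits": ["apple", "apple", "apple", "banana",
               "mango", "mango", "banana", "banana",
               "apple", "mango", "mango", "banana",
               "banana", "banana", "apple", "apple"],
    "veggies": ["carrot", "carrot", "onion", "onion",
                "onion", "carrot", "tomato", "tomato",
                "carrot", "carrot", "tomato", "tomato",
                "carrot", "onion", "tomato", "tomato"],
    "grocery": ["brush", "brush", "soap", "soap",
                "soap", "brush", "powder", "powder",
                "powder", "soap", "soap", "brush",
                "brush", "soap", "powder", "powder"],
})

# One long Series: index = (customer, original column), value = item name
stacked = df.set_index("customer").stack()

# One indicator column per distinct item, then add the rows up per customer
counts = stacked.str.get_dummies().groupby(level=0).sum()

# Optional: reorder the columns to match the layout asked for in the question
wanted = ["apple", "mango", "banana", "carrot", "onion",
          "tomato", "brush", "soap", "powder"]
counts = counts[wanted]
print(counts)
```

The final column selection is only cosmetic: `get_dummies` returns the item columns in sorted order, whereas the question lists them grouped by their original column.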
Option 2
Another, slightly cleaner option, using pd.crosstab:
v = df.set_index('customer').stack()
pd.crosstab(v.index.get_level_values(0), v.values)
col_0 apple banana brush carrot mango onion powder soap tomato
row_0
A 3 1 2 2 0 2 0 2 0
B 0 2 1 1 2 1 2 1 2
C 1 1 1 2 2 0 1 2 2
D 2 2 1 1 0 1 2 1 2
crosstab is a specialized version of pivot_table and is well suited to this kind of tabulation.
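To illustrate that relationship, the sketch below builds the same table both ways on a small made-up sample. The sample data and the names `long_df`, `via_crosstab` and `via_pivot` are hypothetical, and `melt` is swapped in for `stack` here only because it leaves ordinary columns for `pivot_table` to work with.

```python
import pandas as pd

# Small hypothetical sample in the same shape as the question's data
df = pd.DataFrame({
    "customer": ["A", "A", "B"],
    "fruits": ["apple", "apple", "mango"],
    "veggies": ["carrot", "onion", "onion"],
})

# Long format: one row per (customer, original column, item)
long_df = df.melt(id_vars="customer", value_name="item")

# crosstab counts co-occurrences of customer and item directly
via_crosstab = pd.crosstab(long_df["customer"], long_df["item"])

# pivot_table needs a column to count and an explicit fill for empty cells
via_pivot = long_df.pivot_table(
    index="customer", columns="item",
    values="variable", aggfunc="count", fill_value=0,
)

print(via_crosstab)
print(via_pivot)
```

With `pivot_table` you have to name a column to count and fill the empty cells yourself; `crosstab` handles both, which is why it reads more cleanly for pure frequency tables.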
Answer 1 (score: 2)
dot
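d = pd.get_dummies(df, dtype=int)  # integer indicators; boolean ones would not add up in the dot product
d.columns = d.columns.str.split('_', expand=True)  # 'fruits_apple' -> ('fruits', 'apple')
c = d.pop('customer')  # customer indicator columns A..D
c.T.dot(d)  # (customers x rows) . (rows x items) = counts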
fruits veggies grocery
apple banana mango carrot onion tomato brush powder soap
A 3 1 0 2 2 0 2 0 2
B 0 2 2 1 1 2 1 2 1
C 1 1 2 2 0 2 1 1 2
D 2 2 0 1 1 2 1 2 1
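Why the dot product yields counts may be easier to see on a tiny example. This is a sketch with made-up indicator matrices (the arrays `cust` and `items` are hypothetical, not taken from the question): entry (customer, item) of the product sums, over all rows, the product of the two indicators, which is 1 exactly when that row belongs to that customer and contains that item.

```python
import numpy as np

# Hypothetical 3-row example: rows 0 and 1 belong to customer A, row 2 to B
cust = np.array([[1, 0],
                 [1, 0],
                 [0, 1]])   # columns: A, B

# The same rows, one indicator column per item
items = np.array([[1, 0],
                  [1, 0],
                  [0, 1]])  # columns: apple, mango

# (customers x rows) @ (rows x items) -> customers x items count table
counts = cust.T @ items
print(counts)
```

The indicators must be integers for this to work. With boolean arrays NumPy's matrix product returns booleans, so the result would only say whether a pair occurred, not how often.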
bincount, factorize
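import numpy as np
import pandas as pd

i, r = df.customer.factorize()            # integer code per row, unique customers
v = df.drop(columns='customer').values    # item names as a 2-D array
j, c = pd.factorize(v.ravel())            # integer code per cell, unique items
n, m = len(r), len(c)
# Encode each (customer, item) pair as one flat index, count, then reshape
b = np.bincount(i.repeat(v.shape[1]) * m + j, minlength=n * m).reshape(n, m)
pd.DataFrame(b, r, c)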
apple carrot brush onion soap banana mango tomato powder
A 3 2 2 2 2 1 0 0 0
B 0 1 1 1 1 2 2 2 2
C 1 2 1 0 2 1 2 2 1
D 2 1 1 1 1 2 0 2 2
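The flat-index trick behind the bincount approach can be sketched in isolation. The codes below are made up for illustration: each observation has a row code `i` and a column code `j`, and `i * m + j` is the position of cell (i, j) in a row-major n x m table, so counting flat positions and reshaping gives the 2-D count table.

```python
import numpy as np

# Hypothetical codes: three observations, 2 customers (rows) and 2 items (columns)
i = np.array([0, 0, 1])   # customer code per observation
j = np.array([0, 1, 1])   # item code per observation
n, m = 2, 2

# Row-major position of each (i, j) pair in an n x m table
flat = i * m + j

# Count each position; minlength pads cells that never occur with zeros
table = np.bincount(flat, minlength=n * m).reshape(n, m)
print(table)
```

In the answer's code, `i.repeat(v.shape[1])` plays the role of `i` here: each customer code is repeated once per item column so that it lines up with the flattened item codes.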