Question

I have a DataFrame whose head looks like this, and I want to build a pivot_table from it.

    user_id     item_id cate_id action_type action_date
0   11482147    492681  1_11    view          15
1   12070750    457406  1_14    deep_view     15
2   12431632    527476  1_1     view          15
3   13397746    531771  1_6     deep_view     15
4   13794253    510089  1_27    deep_view     15

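There are 20,000+ user_id values, 37 cate_id values, and 5 action_type values. I want to create a pivot_table like the one I made in Excel. The values in the table should be the value counts for each user_id and each cate_id. I tried the following code.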

user_cate_table = pd.pivot_table(user_cate_table2,index = ['user_id','cate_id'],columns=np.unique(train['action_type']),values='action_type',aggfunc=np.count_nonzero,fill_value=0)

I got this error message.

ValueError: Grouper and axis must be same length

Head of the DataFrame user_cate_table2:

    user_id     item_id cate_id action_type
0   11482147    492681  1_11    1.0
1   12070750    457406  1_14    2.0
2   12431632    527476  1_1     1.0
3   13397746    531771  1_6     2.0
4   13794253    510089  1_27    2.0
5   14378544    535335  1_6     2.0
6   1705634     535202  1_10    1.0
7   6943823     478183  1_3     2.0
8   5902475     524378  1_6     1.0

Answer 1

I think you need groupby + size + unstack:

df1 = df.groupby(['user_id','cate_id', 'action_type']).size().unstack(fill_value=0)
print (df1)
action_type       deep_view  view
user_id  cate_id                 
11482147 1_11             0     1
12070750 1_14             1     0
12431632 1_1              0     1
13397746 1_6              1     0
13794253 1_27             1     0

Another solution, using pivot_table:

df1 = df.pivot_table(index=['user_id','cate_id'], 
                     columns='action_type', 
                     values='item_id', 
                     aggfunc=len, 
                     fill_value=0)
print (df1)
action_type       deep_view  view
user_id  cate_id                 
11482147 1_11             0     1
12070750 1_14             1     0
12431632 1_1              0     1
13397746 1_6              1     0
13794253 1_27             1     0
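
To check that the two approaches agree, here is a minimal self-contained sketch. It rebuilds the five sample rows from the question by hand, and the names by_groupby, by_pivot and same_counts are only illustrative. It is not a benchmark, and it makes no claim about which approach is faster on the full 20,000+ user data.

```python
import pandas as pd

# Sample rows copied by hand from the question's first table.
df = pd.DataFrame({
    "user_id": [11482147, 12070750, 12431632, 13397746, 13794253],
    "item_id": [492681, 457406, 527476, 531771, 510089],
    "cate_id": ["1_11", "1_14", "1_1", "1_6", "1_27"],
    "action_type": ["view", "deep_view", "view", "deep_view", "deep_view"],
    "action_date": [15, 15, 15, 15, 15],
})

# Approach 1: count rows per (user_id, cate_id, action_type), then move
# the innermost level (action_type) into the columns.
by_groupby = (
    df.groupby(["user_id", "cate_id", "action_type"])
    .size()
    .unstack(fill_value=0)
)

# Approach 2: the same counts via pivot_table, with len as the aggregator.
by_pivot = df.pivot_table(
    index=["user_id", "cate_id"],
    columns="action_type",
    values="item_id",
    aggfunc=len,
    fill_value=0,
)

# Compare the raw counts cell by cell.
same_counts = bool((by_groupby.to_numpy() == by_pivot.to_numpy()).all())
print(by_groupby)
print(same_counts)
```

Any column without missing values can serve as values= here, because len only counts the rows in each group.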

Answer 2

You don't need pivot_table. You can use groupby and unstack:

df.groupby(['user_id', 'cate_id', 'action_type'])['action_date'].agg(np.count_nonzero).unstack('action_type')

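pivot_table works too, but you have misunderstood the columns= argument. It expects the name of a column, not that column's unique values. An array passed there is treated as row-wise grouping keys, so its length has to match the number of rows, which is what the error message is complaining about.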

pd.pivot_table(df,index = ['user_id','cate_id'],columns=['action_type'],aggfunc=np.count_nonzero,fill_value=0)
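
The difference between passing values and passing a column name can be seen in a small sketch. It assumes the five sample rows from the question, and the variable names labels, error and counts are illustrative. The exact exception text may vary between pandas versions, so treat the failing call as a demonstration rather than a guarantee.

```python
import numpy as np
import pandas as pd

# Sample rows copied by hand from the question's first table.
df = pd.DataFrame({
    "user_id": [11482147, 12070750, 12431632, 13397746, 13794253],
    "item_id": [492681, 457406, 527476, 531771, 510089],
    "cate_id": ["1_11", "1_14", "1_1", "1_6", "1_27"],
    "action_type": ["view", "deep_view", "view", "deep_view", "deep_view"],
})

# np.unique returns the distinct values of the column (2 of them here),
# not the column's name.
labels = np.unique(df["action_type"])

error = None
try:
    # The array is used as a grouping key aligned with the rows, so
    # 2 labels against 5 rows is rejected.
    pd.pivot_table(df, index=["user_id", "cate_id"], columns=labels,
                   values="item_id", aggfunc=len, fill_value=0)
except ValueError as exc:
    error = exc

# Passing the column name lets pandas derive the column labels itself.
counts = pd.pivot_table(df, index=["user_id", "cate_id"],
                        columns="action_type", values="item_id",
                        aggfunc=len, fill_value=0)
print(type(error).__name__)
print(counts)
```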
