Python groupby and further grouping

Time: 2019-03-13 11:11:56

Tags: python python-3.x pandas pandas-groupby

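I have tried different ways of grouping the data by two different columns and deriving a weight factor. Sadly, I am still quite new to Python. I have worked through several related questions and managed to put together half of a solution. Could you help me with the rest, or at least give me an idea? Below is the mock code: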

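import pandas as pd
# Mock data: one row per (sku_id, product_id) occurrence
data = pd.DataFrame({'sku_id': ['s1', 's1', 's1', 's2', 's2', 's2', 's3', 's3', 's3'],
                     'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3']})
# Count how often each (product_id, sku_id) combination occurs
count_series = data.groupby(['product_id', 'sku_id']).size()
print('-'*30)
print(count_series)
print('-'*30)
# Flatten the counts into a DataFrame with a 'weight' column
agg_count = count_series.to_frame(name='weight').reset_index()
print(agg_count)
print('-'*30)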

The output is:

------------------------------
product_id  sku_id
p1          s1        2
            s2        3
p2          s1        1
            s3        2
p3          s3        1
dtype: int64
------------------------------
  product_id sku_id  weight
0         p1     s1       2
1         p1     s2       3
2         p2     s1       1
3         p2     s3       2
4         p3     s3       1
------------------------------

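Can someone help me group the sku_id column further, based on the SKU combinations and how often they occur together? (It is similar to a recommendation engine.)

Desired output: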

-----------------------
    sku_id    weight
    s1 & s2     1
    s2 & s3     0
    s3 & s1     1
-----------------------

1 answer:

Answer 0: (score: 2)

IIUC, you can try the following:

import itertools
# Replicate your steps:
m = data.groupby(['product_id', 'sku_id']).size().reset_index(name='weight')
# Group on `product_id` and collect each group's `sku_id` values into a tuple (print n to see the result)
n = m.groupby('product_id')['sku_id'].apply(tuple).reset_index()
# Build every 2-element combination of the unique sku_id values,
# check whether each tuple matches one of them, and cast the result to int
n['new'] = n.sku_id.isin(list(itertools.combinations(m.sku_id.unique(), 2))).astype(int)
print(n)

  product_id    sku_id  new
0         p1  (s1, s2)    1
1         p2  (s1, s3)    1
2         p3     (s3,)    0

Note that for p3 the sku_id column contains only s3, so that row on its own can never form a pair. My output therefore differs from yours.
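
If the goal is the per-pair table from the question, one possible sketch (not part of the original answer) is to count, for each pair of SKUs, the number of products in which both appear. That reading of "weight" is an assumption inferred from the desired output, and the names skus_per_product, pairs and pair_weights are only illustrative.

```python
import itertools

import pandas as pd

# Same mock data as in the question
data = pd.DataFrame({
    'sku_id': ['s1', 's1', 's1', 's2', 's2', 's2', 's3', 's3', 's3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
})

# Set of distinct SKUs seen for each product
skus_per_product = data.groupby('product_id')['sku_id'].apply(set)

# Every unordered pair of SKUs, including pairs that never occur together
all_skus = sorted(data['sku_id'].unique())
pairs = list(itertools.combinations(all_skus, 2))

# Assumption: weight = number of products that contain both SKUs of the pair
pair_weights = pd.DataFrame({
    'sku_id': [f'{a} & {b}' for a, b in pairs],
    'weight': [int(sum({a, b} <= skus for skus in skus_per_product))
               for a, b in pairs],
})
print(pair_weights)
```

For the mock data this gives a weight of 1 for s1 & s2 and for s1 & s3, and 0 for s2 & s3, which matches the desired weights. The pairs are labelled in sorted order, so "s3 & s1" from the question shows up as "s1 & s3".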