How to transform data containing the "|" symbol with pandas for a recommendation system

Time: 2019-03-06 03:51:38

Tags: python python-3.x pandas dataframe

Hi, I have a huge dataset that looks like this:

Sample data:

customerId  products
0            20
1           2|2|23|
0           111|29|11|11|33|11|33
3           164|227
1           2|2

Now I want to transform this dataset into the following:

    customerId  products        purchase_count
     0              20           1
     0              111          1
     0              29           1
     0              11           3
     0              33           2
     1              2            4
     1              23           1
     3              164          1
     3              227          1

Please help.
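To reproduce the answers below, the sample can be loaded as a DataFrame. This is a minimal sketch assuming the products column holds plain '|'-separated strings; it also shows why every answer has to discard an empty item: the trailing '|' in "2|2|23|" leaves an empty string after splitting.

```python
import pandas as pd

# The question's sample data; products are kept as '|'-separated strings
df = pd.DataFrame({
    "customerId": [0, 1, 0, 3, 1],
    "products": ["20", "2|2|23|", "111|29|11|11|33|11|33", "164|227", "2|2"],
})

# Splitting on '|' turns each string into a list of product IDs
split = df["products"].str.split("|")
print(split.tolist()[1])  # ['2', '2', '23', ''] (trailing '|' gives an empty item)
```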

3 answers:

Answer 0 (score: 3)

You can do it as follows:

import pandas as pd

# Split each "a|b|c" string into a list of product IDs
df['products'] = df['products'].str.split('|')
# One row per (customerId, product); drop the '' a trailing '|' leaves
df_new = pd.DataFrame([[x, z] for x, y in df.values for z in y], columns=df.columns)
df_new = df_new[df_new['products'] != '']
print(df_new.groupby(['customerId', 'products']).size()
            .reset_index(name='purchase_count'))

   customerId products  purchase_count
0           0       11               3
1           0      111               1
2           0       20               1
3           0       29               1
4           0       33               2
5           1        2               4
6           1       23               1
7           3      164               1
8           3      227               1

Answer 1 (score: 3)

This is an unnesting problem:

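df['products'] = df.products.str.split('|')
s = unnesting(df, ['products'])   # unnesting() is defined below
s = s[s['products'] != '']        # drop the '' that a trailing '|' leaves
s.groupby(s.columns.tolist()).size()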
products  customerId
11        0             3
111       0             1
164       3             1
2         1             4
20        0             1
227       3             1
23        1             1
29        0             1
33        0             2
dtype: int64

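import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Repeat each row label once per element of its list
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    # Re-attach the non-exploded columns by row label (pandas 2.x needs columns=)
    return df1.join(df.drop(columns=explode), how='left')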
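To see what the unnesting helper does step by step, here is its core logic on a two-row toy frame built from the sample's "2|2|23" and "164|227" rows (already split into lists). It is an illustrative sketch of the same three steps, not a replacement for the helper: repeat the index by list length, concatenate the lists, then join the remaining columns back on the repeated index.

```python
import numpy as np
import pandas as pd

# Toy frame from two sample rows, products already split into lists
df = pd.DataFrame({"customerId": [1, 3],
                   "products": [["2", "2", "23"], ["164", "227"]]})

lens = df["products"].str.len()               # list length per row: 3, 2
idx = df.index.repeat(lens)                   # each row label repeated that many times
flat = np.concatenate(df["products"].values)  # all lists laid end to end

# Flat column on the repeated index, then join the other columns back by label
out = pd.DataFrame({"products": flat}, index=idx).join(df.drop(columns="products"), how="left")
print(idx.tolist())   # [0, 0, 0, 1, 1]
print(flat.tolist())  # ['2', '2', '23', '164', '227']
print(out)
```

Because the join is many-to-one on the row label, each exploded product picks up its original customerId.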

Answer 2 (score: 0)

Combine str.split with unnesting and groupby.count.

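A minimal sketch of that approach (not the answerer's original code), assuming pandas 0.25 or later so that the built-in DataFrame.explode can do the unnesting step. The variable names exploded and result are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "customerId": [0, 1, 0, 3, 1],
    "products": ["20", "2|2|23|", "111|29|11|11|33|11|33", "164|227", "2|2"],
})

# 1. str.split: turn each "a|b|c" string into a list of product IDs
df["products"] = df["products"].str.split("|")

# 2. unnest: one row per (customerId, product); explode() needs pandas >= 0.25
exploded = df.explode("products")
exploded = exploded[exploded["products"] != ""]  # drop the '' a trailing '|' leaves

# 3. groupby + count: purchases per customer/product pair
result = (
    exploded.groupby(["customerId", "products"])["products"]
    .count()
    .reset_index(name="purchase_count")
)
print(result)
```

Product IDs stay strings here, so within each customer they sort lexicographically ("11" before "111" before "20"), matching the ordering in the outputs of the other answers.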