Hi, I have a large dataset that looks like this.
Sample data:
customerId products
0 20
1 2|2|23|
0 111|29|11|11|33|11|33
3 164|227
1 2|2
I want to transform it into the following:
customerId products purchase_count
0 20 1
0 111 1
0 29 1
0 11 3
0 33 2
1 2 4
1 23 1
3 164 1
3 227 1
Please help.
Answer 0 (score: 3)
You can do it like this:
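To try the answers below, the sample data can be rebuilt as a DataFrame. This is a minimal sketch that assumes customerId is an integer column and products is a pipe-delimited string column; the name df is chosen to match the answers. Note the trailing "|" in the second row: splitting it produces an empty token, which the answers have to drop.

```python
import pandas as pd

# Sample data from the question; `products` holds pipe-delimited strings.
# The trailing '|' in '2|2|23|' yields an empty string when split.
df = pd.DataFrame({
    'customerId': [0, 1, 0, 3, 1],
    'products': ['20', '2|2|23|', '111|29|11|11|33|11|33', '164|227', '2|2'],
})
print(df)
```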
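import numpy as np
import pandas as pd
# Split each pipe-delimited string into a list of product IDs
df['products'] = df['products'].str.split('|')
# One row per (customerId, product); drop the empty string left by the trailing '|'
df_new = pd.DataFrame([[x, z] for x, y in df.values for z in y], columns=df.columns)
df_new = df_new.replace('', np.nan).dropna()
# Count how many times each customer bought each product
print(df_new.groupby(['customerId', 'products']).size().reset_index(name='purchase_count'))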
customerId products purchase_count
0 0 11 3
1 0 111 1
2 0 20 1
3 0 29 1
4 0 33 2
5 1 2 4
6 1 23 1
7 3 164 1
8 3 227 1
Answer 1 (score: 3)
This is an unnesting problem:
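# unnesting() is defined below
df['products'] = df['products'].str.split('|')
s = unnesting(df, ['products'])
s = s[s['products'] != '']  # drop the empty token from the trailing '|'
s.groupby(s.columns.tolist()).size()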
products customerId
11 0 3
111 0 1
164 3 1
2 1 4
20 0 1
227 3 1
23 1 1
29 0 1
33 0 2
dtype: int64
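import numpy as np
import pandas as pd
def unnesting(df, explode):
    # Repeat each row label once per element of its list
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten every list column named in `explode` and place them side by side
    df1 = pd.concat([pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    # Re-attach the remaining (non-list) columns by joining on the repeated index
    return df1.join(df.drop(columns=explode), how='left')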
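The two steps inside unnesting (repeat each row label once per list element, then flatten the lists into one aligned array) can be seen in isolation on a small Series. The Series below is a hypothetical two-row example made for illustration, not part of the answer:

```python
import numpy as np
import pandas as pd

# Hypothetical already-split lists, keyed by the original row labels.
lists = pd.Series([['20'], ['2', '2', '23', '']], index=[0, 1])
# Row label 0 appears once, row label 1 appears four times.
idx = lists.index.repeat(lists.str.len())
# Concatenate all lists into one flat array that lines up with idx.
flat = pd.Series(np.concatenate(lists.values), index=idx)
print(flat)
```

Because each flattened value keeps its original row label, the join at the end of unnesting can bring customerId back alongside every product.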
Answer 2 (score: 0)
Combine str.split with unnest and groupby.count:
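The code for this answer did not survive extraction. The following is a hedged sketch of the described approach, not the author's code: it assumes pandas 0.25 or newer so that the built-in DataFrame.explode can stand in for the unnest step, and it uses GroupBy.size for the per-group count.

```python
import pandas as pd

df = pd.DataFrame({
    'customerId': [0, 1, 0, 3, 1],
    'products': ['20', '2|2|23|', '111|29|11|11|33|11|33', '164|227', '2|2'],
})

# str.split turns each string into a list; explode emits one row per element.
s = df.assign(products=df['products'].str.split('|')).explode('products')
# Drop the empty token left by the trailing '|' in '2|2|23|'.
s = s[s['products'] != '']
# Count rows per (customerId, product) pair, in the layout the question asks for.
out = (s.groupby(['customerId', 'products'])
        .size()
        .reset_index(name='purchase_count'))
print(out)
```

Products stay strings here, so within each customer they sort lexicographically ('11' before '111' before '20'), matching the ordering in Answer 0's output.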