For example, I have a DataFrame:
df
category name
0 [['Clothing & Jewelry', 'Shoes']] Jason
1 [['Clothing & Jewelry', 'Jewelry']] Molly
How can I store the category column as a string, with the entries separated by commas?
The result I want:
category name
0 Clothing & Jewelry, Shoes Jason
1 Clothing & Jewelry, Jewelry Molly
Answer 0 (score: 0)
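You can use apply with a lambda. Note that list.remove changes each list in place, and that this assumes every cell holds a flat list: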
In [21]:
df['category'].apply(lambda x: x.remove('Clothing & Jewelry'))
df
Out[21]:
category name
0 [Shoes] Jason
1 [Jewelry] Molly
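The call above works only because list.remove mutates each stored list in place; apply itself returns a Series of None values here. Below is a minimal sketch of a non-mutating variant, assuming each cell holds a flat list as in the output above. The sample data is rebuilt here purely for illustration.

```python
import pandas as pd

# Hypothetical sample data with flat lists, matching the output shown above
df = pd.DataFrame({
    'category': [['Clothing & Jewelry', 'Shoes'],
                 ['Clothing & Jewelry', 'Jewelry']],
    'name': ['Jason', 'Molly'],
})

# Build a new list per row instead of mutating the stored one,
# and assign the result back so the change is explicit
df['category'] = df['category'].apply(
    lambda items: [item for item in items if item != 'Clothing & Jewelry']
)
print(df)
```

Assigning the result back makes the change visible in the code, and unlike list.remove it does not raise ValueError when a row lacks the entry.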
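Note that storing non-scalar values in a Series is problematic, because filtering and vectorized operations will not work on them. It is better to store a string with the entries separated by commas.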
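As a sketch of that comma-separated storage, and assuming the nested [[...]] layout shown in the question, the inner list can be joined into one string per row. The sample data and the name shoes are illustrative only.

```python
import pandas as pd

# Hypothetical sample data using the nested-list layout from the question
df = pd.DataFrame({
    'category': [[['Clothing & Jewelry', 'Shoes']],
                 [['Clothing & Jewelry', 'Jewelry']]],
    'name': ['Jason', 'Molly'],
})

# nested[0] unwraps the outer list; ', '.join builds the comma-separated string
df['category'] = df['category'].apply(lambda nested: ', '.join(nested[0]))
print(df)

# Once the column holds plain strings, vectorized string methods work
shoes = df[df['category'].str.contains('Shoes')]
print(shoes)
```

This gives the layout the question asks for, though exact-match filtering on a single entry is still easier with one entry per row.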
Edit
To answer your updated question: I would store the data elements in separate rows, because that makes filtering simpler:
In [79]:
df['category'].apply(lambda x: ','.join(x[0])).str.split(',',expand=True).stack().reset_index().drop('level_1', axis=1)
Out[79]:
level_0 0
0 0 Clothing & Jewelry
1 0 Shoes
2 1 Clothing & Jewelry
3 1 Jewelry
We can then merge this back onto the original df, after which we can filter:
In [80]:
df = df.merge(df['category'].apply(lambda x: ','.join(x[0])).str.split(',',expand=True).stack().reset_index().drop('level_1', axis=1), left_index=True, right_on='level_0', how='left')
df
Out[80]:
category name level_0 0
0 [[Clothing & Jewelry, Shoes]] Jason 0 Clothing & Jewelry
1 [[Clothing & Jewelry, Shoes]] Jason 0 Shoes
2 [[Clothing & Jewelry, Jewelry]] Molly 1 Clothing & Jewelry
3 [[Clothing & Jewelry, Jewelry]] Molly 1 Jewelry
In [82]:
df = df.drop('level_0', axis=1)
df
Out[82]:
category name 0
0 [[Clothing & Jewelry, Shoes]] Jason Clothing & Jewelry
1 [[Clothing & Jewelry, Shoes]] Jason Shoes
2 [[Clothing & Jewelry, Jewelry]] Molly Clothing & Jewelry
3 [[Clothing & Jewelry, Jewelry]] Molly Jewelry
In [84]:
df.rename(columns={0:'category_values'},inplace=True)
df
Out[84]:
category name category_values
0 [[Clothing & Jewelry, Shoes]] Jason Clothing & Jewelry
1 [[Clothing & Jewelry, Shoes]] Jason Shoes
2 [[Clothing & Jewelry, Jewelry]] Molly Clothing & Jewelry
3 [[Clothing & Jewelry, Jewelry]] Molly Jewelry
In [85]:
df[df['category_values']!='Clothing & Jewelry']
Out[85]:
category name category_values
1 [[Clothing & Jewelry, Shoes]] Jason Shoes
3 [[Clothing & Jewelry, Jewelry]] Molly Jewelry
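On newer pandas releases (0.25 and later), DataFrame.explode offers a shorter route to the same one-row-per-entry layout. The sketch below is a substitute for the split/stack/merge chain above, not the method the answer used. It assumes the nested [[...]] layout from the question, and the names long_df and filtered are illustrative.

```python
import pandas as pd

# Hypothetical sample data using the nested-list layout from the question
df = pd.DataFrame({
    'category': [[['Clothing & Jewelry', 'Shoes']],
                 [['Clothing & Jewelry', 'Jewelry']]],
    'name': ['Jason', 'Molly'],
})

# Unwrap the outer list into a new column, then give each entry its own row
long_df = (
    df.assign(category_values=df['category'].apply(lambda nested: nested[0]))
      .explode('category_values')
      .reset_index(drop=True)
)

# With one scalar per row, ordinary boolean filtering works
filtered = long_df[long_df['category_values'] != 'Clothing & Jewelry']
print(filtered)
```

Because explode works on the lists directly, it also avoids the join-then-split round trip, which would break if an entry itself contained a comma.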