Change column values to strings

Time: 2016-01-31 16:04:02

Tags: python-2.7 pandas

For example, I have a DataFrame:

df

    category                              name
0   [['Clothing & Jewelry', 'Shoes']]     Jason
1   [['Clothing & Jewelry', 'Jewelry']]   Molly

How can I store the category column as a string, with the entries separated by commas?

The result I want:

    category                              name
0   Clothing & Jewelry, Shoes             Jason
1   Clothing & Jewelry, Jewelry           Molly

1 answer:

Answer 0 (score: 0)

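You can use apply to call a lambda:

In [21]:
# list.remove() mutates each list in place and returns None, so the Series that
# apply returns is discarded. This assumes each cell is a flat list such as
# ['Clothing & Jewelry', 'Shoes'].
df['category'].apply(lambda x: x.remove('Clothing & Jewelry'))
df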

Out[21]:
    category   name
0    [Shoes]  Jason
1  [Jewelry]  Molly
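
Because list.remove() returns None and edits the stored lists in place, a non-mutating variant may be easier to reason about. The following is a minimal sketch rather than the answer's own code: the helper drop_value and the column name category_filtered are made up for illustration, and the frame is rebuilt with the nested lists shown in the question.

```python
import pandas as pd

# Example frame from the question; each cell holds a list wrapped in an outer list.
df = pd.DataFrame({
    "category": [[["Clothing & Jewelry", "Shoes"]],
                 [["Clothing & Jewelry", "Jewelry"]]],
    "name": ["Jason", "Molly"],
})

def drop_value(cell, unwanted="Clothing & Jewelry"):
    # Hypothetical helper: unwrap the outer list and build a new flat list
    # that omits the unwanted entry, leaving the original cell untouched.
    return [item for item in cell[0] if item != unwanted]

# Store the result in a new column instead of relying on in-place mutation.
df["category_filtered"] = df["category"].apply(drop_value)
print(df[["name", "category_filtered"]])
```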

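Note that storing non-scalar values in a Series is problematic, because filtering and vectorized operations do not work on them. It is better to store a string with the entries separated by commas.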
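
For the output the question asks for, one possible approach is to join the inner list directly. This is a minimal sketch, assuming every cell is a list wrapped in a single outer list, as in the question's example:

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "category": [[["Clothing & Jewelry", "Shoes"]],
                 [["Clothing & Jewelry", "Jewelry"]]],
    "name": ["Jason", "Molly"],
})

# x[0] unwraps the outer list; ", ".join builds one string per row.
df["category"] = df["category"].apply(lambda x: ", ".join(x[0]))
print(df)
```

After this, category holds plain strings, so the .str methods can be used on the column.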

Edit

To answer your updated question: I would store the data elements in separate rows, since that makes filtering simpler:

In [79]:
df['category'].apply(lambda x: ','.join(x[0])).str.split(',',expand=True).stack().reset_index().drop('level_1', axis=1)

Out[79]:
   level_0                   0
0        0  Clothing & Jewelry
1        0               Shoes
2        1  Clothing & Jewelry
3        1             Jewelry
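
On newer pandas versions (0.25 or later, which is an assumption about your environment), DataFrame.explode can produce the same one-value-per-row layout without the join/split round trip. It also avoids splitting a category name that itself contains a comma. A sketch, with category_values as an illustrative column name:

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "category": [[["Clothing & Jewelry", "Shoes"]],
                 [["Clothing & Jewelry", "Jewelry"]]],
    "name": ["Jason", "Molly"],
})

long_df = (
    # Unwrap the outer list so each cell is a flat list of category names.
    df.assign(category_values=df["category"].map(lambda x: x[0]))
      # explode emits one row per list element and repeats the other columns.
      .explode("category_values")
      .reset_index(drop=True)
)
print(long_df[["name", "category_values"]])
```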

We can then merge this back into the original df, which lets us filter:

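In [80]:
# Assign the merged result back to df so the later drop and rename steps see the new columns.
df = df.merge(df['category'].apply(lambda x: ','.join(x[0])).str.split(',',expand=True).stack().reset_index().drop('level_1', axis=1), left_index=True, right_on='level_0', how='left')
df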

Out[80]:
                          category   name  level_0                   0
0    [[Clothing & Jewelry, Shoes]]  Jason        0  Clothing & Jewelry
1    [[Clothing & Jewelry, Shoes]]  Jason        0               Shoes
2  [[Clothing & Jewelry, Jewelry]]  Molly        1  Clothing & Jewelry
3  [[Clothing & Jewelry, Jewelry]]  Molly        1             Jewelry

In [82]:
df = df.drop('level_0', axis=1)
df

Out[82]:
                          category   name                   0
0    [[Clothing & Jewelry, Shoes]]  Jason  Clothing & Jewelry
1    [[Clothing & Jewelry, Shoes]]  Jason               Shoes
2  [[Clothing & Jewelry, Jewelry]]  Molly  Clothing & Jewelry
3  [[Clothing & Jewelry, Jewelry]]  Molly             Jewelry

In [84]:
df.rename(columns={0:'category_values'},inplace=True)
df

Out[84]:
                          category   name     category_values
0    [[Clothing & Jewelry, Shoes]]  Jason  Clothing & Jewelry
1    [[Clothing & Jewelry, Shoes]]  Jason               Shoes
2  [[Clothing & Jewelry, Jewelry]]  Molly  Clothing & Jewelry
3  [[Clothing & Jewelry, Jewelry]]  Molly             Jewelry

In [85]:
df[df['category_values']!='Clothing & Jewelry']

Out[85]:
                          category   name category_values
1    [[Clothing & Jewelry, Shoes]]  Jason           Shoes
3  [[Clothing & Jewelry, Jewelry]]  Molly         Jewelry
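
For reference, the steps above can be collected into a single script. This is a sketch that rebuilds the example frame from the question; the intermediate names values, merged and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "category": [[["Clothing & Jewelry", "Shoes"]],
                 [["Clothing & Jewelry", "Jewelry"]]],
    "name": ["Jason", "Molly"],
})

# One row per category value; level_0 carries the original row label.
values = (
    df["category"]
    .apply(lambda x: ",".join(x[0]))
    .str.split(",", expand=True)
    .stack()
    .reset_index()
    .drop("level_1", axis=1)
)

# Attach the values to the original rows, then tidy up the column names.
merged = (
    df.merge(values, left_index=True, right_on="level_0", how="left")
      .drop("level_0", axis=1)
      .rename(columns={0: "category_values"})
)

# Keep only the rows whose value is not the shared top-level category.
result = merged[merged["category_values"] != "Clothing & Jewelry"]
print(result[["name", "category_values"]])
```

Building new frames at each step, rather than reassigning df and renaming in place, keeps the original data available if a step needs to be rerun.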