Pandas: assign values across multiple columns based on one value

Time: 2019-08-02 06:00:27

Tags: python-3.x pandas

I have a dataset like this:

import pandas as pd

sample = {'Theme': ['never give a ten','interaction speed','no feedback,premium'],
        'cat1': [0,0,0],
        'cat2': [0,0,0],
        'cat3': [0,0,0],
        'cat4': [0,0,0]
        }

# keep a reference to the frame so it can be used later as df
df = pd.DataFrame(sample,columns = ['Theme','cat1','cat2','cat3','cat4'])


                 Theme  cat1  cat2  cat3  cat4
0     never give a ten     0     0     0     0
1    interaction speed     0     0     0     0
2  no feedback,premium     0     0     0     0

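Now I need to replace the values in the cat columns based on the value in Theme. If the Theme column contains 'never give a ten', set cat1 to 1; similarly, if it contains 'interaction speed', set cat2 to 1; if it contains 'no feedback', set cat3 to 1; and for 'premium', set cat4 to 1.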

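This example shows 4 categories, but there are 21 in total. I could write a separate string check 21 times, once per category, but I am looking for an efficient way to put this into a function that loops over each row, applies the logic, and updates the corresponding columns. Can anyone help?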

Thanks.

1 answer:

Answer 0: (score: 1)

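You can use Series.str.get_dummies here. It creates one indicator column per category, named after the category; note that the column names come out sorted: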

df1 = df['Theme'].str.get_dummies(',')
print (df1)
   interaction speed  never give a ten  no feedback  premium
0                  0                 1            0        0
1                  1                 0            0        0
2                  0                 0            1        1
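
One caveat worth sketching: get_dummies(',') splits on the exact separator, so stray spaces around the commas end up in the column names. The sample data has none, so the series below is a made-up variation, and the cleanup step is only one possible way to normalise it:

```python
import pandas as pd

# Hypothetical variation of the sample data with spaces around the separator
themes = pd.Series(['never give a ten', 'no feedback, premium', ' interaction speed '])

# Without cleanup, ' premium' (with a leading space) becomes its own column
raw = themes.str.get_dummies(',')

# Strip outer whitespace and collapse any spaces around the commas first
cleaned = themes.str.strip().str.replace(r'\s*,\s*', ',', regex=True)
dummies = cleaned.str.get_dummies(',')

print(raw.columns.tolist())
print(dummies.columns.tolist())
```

If the real data is already clean, this step can be skipped.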

If you also need the original Theme column as the first column of the output, add DataFrame.join:

df11 = df[['Theme']].join(df['Theme'].str.get_dummies(','))
print (df11)
                 Theme  interaction speed  never give a ten  no feedback  \
0     never give a ten                  0                 1            0   
1    interaction speed                  1                 0            0   
2  no feedback,premium                  0                 0            1   

   premium  
0        0  
1        0  
2        1  

If the order of the columns matters, add DataFrame.reindex:

# remove possible duplicates while preserving the order of first appearance
cols = list(dict.fromkeys([y for x in df['Theme'] for y in x.split(',')]))
df2 = df['Theme'].str.get_dummies(',').reindex(cols, axis=1)
print (df2)
   never give a ten  interaction speed  no feedback  premium
0                 1                  0            0        0
1                 0                  1            0        0
2                 0                  0            1        1


cols = list(dict.fromkeys([y for x in df['Theme'] for y in x.split(',')]))
df2 = df[['Theme']].join(df['Theme'].str.get_dummies(',').reindex(cols, axis=1))
print (df2)
                 Theme  never give a ten  interaction speed  no feedback  \
0     never give a ten                 1                  0            0   
1    interaction speed                 0                  1            0   
2  no feedback,premium                 0                  0            1   

   premium  
0        0  
1        0  
2        1
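
The question asks for the results to land in columns named cat1, cat2, and so on. A minimal sketch of one way to get there, assuming you maintain a dictionary from theme text to target column name (theme_to_cat below is hypothetical and would need all 21 entries):

```python
import pandas as pd

df = pd.DataFrame({'Theme': ['never give a ten', 'interaction speed', 'no feedback,premium']})

# Hypothetical mapping from theme text to target column; extend to all 21 categories
theme_to_cat = {
    'never give a ten': 'cat1',
    'interaction speed': 'cat2',
    'no feedback': 'cat3',
    'premium': 'cat4',
}

dummies = (
    df['Theme'].str.get_dummies(',')
    # fix the column order; themes missing from the data get an all-zero column
    .reindex(columns=list(theme_to_cat), fill_value=0)
    # swap the theme text for the cat column names
    .rename(columns=theme_to_cat)
)

result = df[['Theme']].join(dummies)
print(result)
```

Because the reindex uses the dictionary keys, the column order follows the mapping rather than the data, and any theme that is not listed in the mapping is dropped from the result.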