I have a dataset like this:
import pandas as pd

sample = {'Theme': ['never give a ten', 'interaction speed', 'no feedback,premium'],
          'cat1': [0, 0, 0],
          'cat2': [0, 0, 0],
          'cat3': [0, 0, 0],
          'cat4': [0, 0, 0]
          }
# Keep the frame in a variable so it can be reused below
df = pd.DataFrame(sample, columns=['Theme', 'cat1', 'cat2', 'cat3', 'cat4'])
print(df)
                 Theme  cat1  cat2  cat3  cat4
0     never give a ten     0     0     0     0
1    interaction speed     0     0     0     0
2  no feedback,premium     0     0     0     0
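Now I need to replace the values in the cat columns based on the value in Theme. If the Theme column contains 'never give a ten', cat1 should become 1; similarly, if it contains 'interaction speed', cat2 should become 1; if it contains 'no feedback', cat3 should become 1; and for 'premium', cat4 should become 1.
In this sample I have shown 4 categories, but there are 21 in total. I could write out a check for each of the 21 strings by hand, but I am looking for an efficient way to put this into a function that loops over each row, applies the logic, and updates the corresponding columns. Can anyone help?
Thanks.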
Answer 0 (score: 1)
You can use Series.str.get_dummies here.
It names the new columns after the categories. Note that the column names come out sorted:
df1 = df['Theme'].str.get_dummies(',')
print(df1)
   interaction speed  never give a ten  no feedback  premium
0                  0                 1            0        0
1                  1                 0            0        0
2                  0                 0            1        1
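One thing to watch for: get_dummies splits on the exact separator, so a value such as 'no feedback, premium' (with a space after the comma) would produce a separate ' premium' column. The sample data has no such spaces, so the following is only a sketch under the assumption that the real data might; the `themes` series is made up for illustration.

```python
import pandas as pd

# Made-up values with inconsistent spacing around the comma (an assumption,
# not taken from the sample data in the question)
themes = pd.Series(['no feedback, premium', 'premium'])

# Normalise the separator before splitting so ' premium' and 'premium'
# end up in the same column
cleaned = themes.str.replace(r'\s*,\s*', ',', regex=True).str.strip()
dummies = cleaned.str.get_dummies(',')
print(dummies)
```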
If you need the original Theme column as the first column of the output, add DataFrame.join:
df11 = df[['Theme']].join(df['Theme'].str.get_dummies(','))
print(df11)
                 Theme  interaction speed  never give a ten  no feedback  \
0     never give a ten                  0                 1            0
1    interaction speed                  1                 0            0
2  no feedback,premium                  0                 0            1

   premium
0        0
1        0
2        1
If the order of the columns matters, add DataFrame.reindex:
# remove possible duplicates while keeping the order of first appearance
cols = dict.fromkeys([y for x in df['Theme'] for y in x.split(',')]).keys()
df2 = df['Theme'].str.get_dummies(',').reindex(cols, axis=1)
print(df2)
   never give a ten  interaction speed  no feedback  premium
0                 1                  0            0        0
1                 0                  1            0        0
2                 0                  0            1        1
# same ordering as above, with the Theme column kept as the first column
cols = dict.fromkeys([y for x in df['Theme'] for y in x.split(',')]).keys()
df2 = df[['Theme']].join(df['Theme'].str.get_dummies(',').reindex(cols, axis=1))
print(df2)
                 Theme  never give a ten  interaction speed  no feedback  \
0     never give a ten                 1                  0            0
1    interaction speed                 0                  1            0
2  no feedback,premium                 0                  0            1

   premium
0        0
1        0
2        1
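The outputs above are labelled with the theme text, while the question asks for columns named cat1 to cat4. One possible way to bridge that is to rename the dummy columns through a lookup dict. This is only a sketch: `theme_to_cat` is a hypothetical mapping built from the four pairs described in the question, and it would need to be extended by hand to all 21 categories.

```python
import pandas as pd

df = pd.DataFrame({'Theme': ['never give a ten', 'interaction speed', 'no feedback,premium']})

# Hypothetical lookup: theme text -> target column (pairs taken from the question)
theme_to_cat = {
    'never give a ten': 'cat1',
    'interaction speed': 'cat2',
    'no feedback': 'cat3',
    'premium': 'cat4',
}

dummies = (
    df['Theme'].str.get_dummies(',')
      .rename(columns=theme_to_cat)
      # fix the column order; categories absent from the data become all-zero columns
      .reindex(columns=list(theme_to_cat.values()), fill_value=0)
)
result = df[['Theme']].join(dummies)
print(result)
```

Because the reindex uses the full list of target columns, a category that never occurs in Theme still gets a column of zeros, and any theme text missing from the dict is dropped rather than raising an error.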