Can someone help me binary-encode data that looks like the following example? I want to go from here:
import pandas as pd

df = pd.DataFrame({'_id': [1,2,3],
                   'test': ['one,two,three', 'one,two', 'two']})
print(df)
   _id           test
0    1  one,two,three
1    2        one,two
2    3            two
to here:
df_result = pd.DataFrame({'id': [1,2,3],
                          'one': [1,1,0],
                          'two': [1,1,1],
                          'three': [1,0,0]})
print(df_result)
   id  one  three  two
0   1    1      1    1
1   2    1      0    1
2   3    0      0    1
Any help would be much appreciated! Thanks.
Answer 0 (score: 5)
Use str.get_dummies()
In [58]: df.test.str.get_dummies(',')
Out[58]:
   one  three  two
0    1      1    1
1    1      0    1
2    0      0    1
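To make clearer what str.get_dummies(',') is doing, here is a rough sketch of an equivalent done by hand: split each string on the comma, put each token on its own row, then cross-tabulate row index against token. The names tokens and manual are just illustrative, and this assumes a pandas version that has Series.explode (0.25 or later). It is only meant to show the idea; str.get_dummies is the simpler choice.

```python
import pandas as pd

df = pd.DataFrame({'_id': [1, 2, 3],
                   'test': ['one,two,three', 'one,two', 'two']})

# Built-in: one 0/1 indicator column per comma-separated token.
dummies = df['test'].str.get_dummies(',')

# By hand: split into lists, then give every token its own row
# (the original row index is repeated for each token).
tokens = df['test'].str.split(',').explode()

# Count how often each token occurs in each original row.
# With no repeated tokens per row, the counts are all 0 or 1.
manual = pd.crosstab(tokens.index, tokens)

print(manual)
```

Note that the crosstab holds counts, so a row that repeated a token would get a value above 1 there, while str.get_dummies would still give 1.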
Use join to attach the result to the original DataFrame if needed.
In [62]: df.join(df.test.str.get_dummies(','))
Out[62]:
   _id           test  one  three  two
0    1  one,two,three    1      1    1
1    2        one,two    1      0    1
2    3            two    0      0    1
Or use pd.concat:
In [63]: pd.concat([df, df.test.str.get_dummies(',')], axis=1)
Out[63]:
   _id           test  one  three  two
0    1  one,two,three    1      1    1
1    2        one,two    1      0    1
2    3            two    0      0    1
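Both results above still carry the original test column and the _id name. If the goal is the exact frame shown in the question, a minimal sketch might look like the following. Renaming _id to id and dropping test are assumptions based on the question's desired output, not something the steps above do.

```python
import pandas as pd

df = pd.DataFrame({'_id': [1, 2, 3],
                   'test': ['one,two,three', 'one,two', 'two']})

# Split each string on ',' and build one 0/1 indicator column per token.
dummies = df['test'].str.get_dummies(sep=',')

# Drop the raw string column, rename '_id' to 'id' (assumed from the
# question's desired output), and attach the indicators by index.
df_result = (df.drop(columns='test')
               .rename(columns={'_id': 'id'})
               .join(dummies))

print(df_result)
```

The indicator columns come back in sorted order (one, three, two), which matches the printed output in the question.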