python - binary encoding of comma separated string column

Time: 2017-08-04 12:34:41

Tags: python pandas data-manipulation

Can someone help me binary-encode a column of comma-separated strings? I want to go from here:

import pandas as pd

df = pd.DataFrame({'_id': [1,2,3],
                   'test': ['one,two,three', 'one,two', 'two']})

print(df)

   _id           test
0    1  one,two,three
1    2        one,two
2    3            two

to here:

df_result = pd.DataFrame({'id': [1,2,3],
                          'one': [1,1,0],
                          'two': [1,1,1],
                          'three': [1,0,0]})
print(df_result)

   id  one  three  two
0   1    1      1    1
1   2    1      0    1
2   3    0      0    1

Any help would be much appreciated. Thanks!

1 answer:

Answer 0 (score: 5)

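Use str.get_dummies(), which splits each string on the given separator and returns one 0/1 indicator column per distinct token: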

In [58]: df.test.str.get_dummies(',')
Out[58]:
   one  three  two
0    1      1    1
1    1      0    1
2    0      0    1
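
One caveat worth sketching: str.get_dummies splits on the separator literally, so a value such as 'one, two' would yield a separate ' two' column. The example below is a minimal sketch that assumes the real data might contain stray spaces around the commas (the question's sample does not); the messy input values are made up for illustration.

```python
import pandas as pd

# Hypothetical messy input: spaces around some separators.
s = pd.Series(['one, two ,three', 'one,two', ' two'])

# Normalize "  ,  " to "," and trim the ends before encoding.
cleaned = s.str.replace(r'\s*,\s*', ',', regex=True).str.strip()

dummies = cleaned.str.get_dummies(',')
print(dummies)
```

Without the cleanup step, the tokens 'two', ' two' and ' two ' would each get their own column.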

Join the result to the original DataFrame if needed:

In [62]: df.join(df.test.str.get_dummies(','))
Out[62]:
   _id           test  one  three  two
0    1  one,two,three    1      1    1
1    2        one,two    1      0    1
2    3            two    0      0    1

Or use pd.concat:

In [63]: pd.concat([df, df.test.str.get_dummies(',')], axis=1)
Out[63]:
   _id           test  one  three  two
0    1  one,two,three    1      1    1
1    2        one,two    1      0    1
2    3            two    0      0    1
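
To get the exact layout asked for in the question (no test column, and id instead of _id), the pieces above can be combined roughly as follows. This is a sketch, not the only way to do it; the drop and rename steps are additions that go beyond the original answer.

```python
import pandas as pd

df = pd.DataFrame({'_id': [1, 2, 3],
                   'test': ['one,two,three', 'one,two', 'two']})

# Encode the comma-separated column as 0/1 indicator columns.
dummies = df['test'].str.get_dummies(sep=',')

# Drop the raw string column, rename the key, and attach the indicators.
df_result = (df.drop(columns='test')
               .rename(columns={'_id': 'id'})
               .join(dummies))
print(df_result)
```

The join aligns on the index, so it relies on dummies keeping the same index as df, which str.get_dummies does.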