解决python

时间:2017-09-15 17:49:34

标签: python pandas dummy-variable

我有

 a={0: ['I3925'], 1: ['I3925'], 2: ['I3925'], 3: ['I2355'], 4: ['I2355'], 5: ['I2355'], 6: ['I111'], 7: ['I111'], 8: ['I111'], 9: ['I405'], 10: ['I405'], 11: ['I3878', 'I2864'], 12: ['I3878'], 13: ['I534'], 14: ['I534'], 15: ['I134', 'I2276'], 16: ['I107'], 17: ['I107'], 18: ['I2864']}  

包含一个密钥的一个补充I编号。

b = pd.Series(a,\
                              index = a.keys(),
                               name = "a")
pd.get_dummies(b.apply(pd.Series))

然后get_dummies无效,因为它创建了一个重复的列1_15来存储与第二个I号码的匹配,而不是将它们堆叠到同一列中。我不明白为什么。

    0_I107  0_I111  0_I134  0_I2355 0_I2864 0_I3878 0_I3925 0_I405  0_I534  1_I2276 1_I2864
0   0   0   0   0   0   0   1   0   0   0   0
1   0   0   0   0   0   0   1   0   0   0   0
2   0   0   0   0   0   0   1   0   0   0   0
3   0   0   0   1   0   0   0   0   0   0   0
4   0   0   0   1   0   0   0   0   0   0   0
5   0   0   0   1   0   0   0   0   0   0   0
6   0   1   0   0   0   0   0   0   0   0   0
7   0   1   0   0   0   0   0   0   0   0   0
8   0   1   0   0   0   0   0   0   0   0   0
9   0   0   0   0   0   0   0   1   0   0   0
10  0   0   0   0   0   0   0   1   0   0   0
11  0   0   0   0   0   1   0   0   0   0   1
12  0   0   0   0   0   1   0   0   0   0   0
13  0   0   0   0   0   0   0   0   1   0   0
14  0   0   0   0   0   0   0   0   1   0   0
15  0   0   1   0   0   0   0   0   0   1   0
16  1   0   0   0   0   0   0   0   0   0   0
17  1   0   0   0   0   0   0   0   0   0   0
18  0   0   0   0   1   0   0   0   0   0   0

有人可以解释一下我做错了吗?

2 个答案:

答案 0 :(得分:3)

选项1

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
pd.DataFrame(mlb.fit_transform(b), b.index, mlb.classes_)

    I107  I111  I134  I2276  I2355  I2864  I3878  I3925  I405  I534
0      0     0     0      0      0      0      0      1     0     0
1      0     0     0      0      0      0      0      1     0     0
2      0     0     0      0      0      0      0      1     0     0
3      0     0     0      0      1      0      0      0     0     0
4      0     0     0      0      1      0      0      0     0     0
5      0     0     0      0      1      0      0      0     0     0
6      0     1     0      0      0      0      0      0     0     0
7      0     1     0      0      0      0      0      0     0     0
8      0     1     0      0      0      0      0      0     0     0
9      0     0     0      0      0      0      0      0     1     0
10     0     0     0      0      0      0      0      0     1     0
11     0     0     0      0      0      1      1      0     0     0
12     0     0     0      0      0      0      1      0     0     0
13     0     0     0      0      0      0      0      0     0     1
14     0     0     0      0      0      0      0      0     0     1
15     0     0     1      1      0      0      0      0     0     0
16     1     0     0      0      0      0      0      0     0     0
17     1     0     0      0      0      0      0      0     0     0
18     0     0     0      0      0      1      0      0     0     0

选项2

b.str.join('|').str.get_dummies()

    I107  I111  I134  I2276  I2355  I2864  I3878  I3925  I405  I534
0      0     0     0      0      0      0      0      1     0     0
1      0     0     0      0      0      0      0      1     0     0
2      0     0     0      0      0      0      0      1     0     0
3      0     0     0      0      1      0      0      0     0     0
4      0     0     0      0      1      0      0      0     0     0
5      0     0     0      0      1      0      0      0     0     0
6      0     1     0      0      0      0      0      0     0     0
7      0     1     0      0      0      0      0      0     0     0
8      0     1     0      0      0      0      0      0     0     0
9      0     0     0      0      0      0      0      0     1     0
10     0     0     0      0      0      0      0      0     1     0
11     0     0     0      0      0      1      1      0     0     0
12     0     0     0      0      0      0      1      0     0     0
13     0     0     0      0      0      0      0      0     0     1
14     0     0     0      0      0      0      0      0     0     1
15     0     0     1      1      0      0      0      0     0     0
16     1     0     0      0      0      0      0      0     0     0
17     1     0     0      0      0      0      0      0     0     0
18     0     0     0      0      0      1      0      0     0     0

答案 1 :(得分:2)

这样的东西?

pd.get_dummies(b.apply(pd.Series).stack()).sum(level=0)

输出:

    I107  I111  I134  I2276  I2355  I2864  I3878  I3925  I405  I534
0      0     0     0      0      0      0      0      1     0     0
1      0     0     0      0      0      0      0      1     0     0
2      0     0     0      0      0      0      0      1     0     0
3      0     0     0      0      1      0      0      0     0     0
4      0     0     0      0      1      0      0      0     0     0
5      0     0     0      0      1      0      0      0     0     0
6      0     1     0      0      0      0      0      0     0     0
7      0     1     0      0      0      0      0      0     0     0
8      0     1     0      0      0      0      0      0     0     0
9      0     0     0      0      0      0      0      0     1     0
10     0     0     0      0      0      0      0      0     1     0
11     0     0     0      0      0      1      1      0     0     0
12     0     0     0      0      0      0      1      0     0     0
13     0     0     0      0      0      0      0      0     0     1
14     0     0     0      0      0      0      0      0     0     1
15     0     0     1      1      0      0      0      0     0     0
16     1     0     0      0      0      0      0      0     0     0
17     1     0     0      0      0      0      0      0     0     0
18     0     0     0      0      0      1      0      0     0     0