Question

I have a list of items like this:

lgenre[8:15]

[['Action'],
 ['Action', 'Adventure', 'Thriller'],
 ['Comedy', 'Drama', 'Romance'],
 ['Comedy', 'Horror'],
 ['Animation', "Children's"],
 ['Drama'],
 ['Action', 'Adventure', 'Romance']]

What I want is:

   id  Action  Adventure  Thriller  Comedy  Drama  Romance  Horror  Animation  Children's
0   0       1          0         0       0      0        0       0          0           0
1   1       1          1         1       0      0        0       0          0           0
2   2       0          0         0       1      1        1       0          0           0
3   3       0          0         0       1      0        0       1          0           0
4   4       0          0         0       0      0        0       0          1           1
5   5       0          0         0       0      1        0       0          0           0
6   6       1          1         0       0      0        1       0          0           0

I tried writing a double loop that looks like this:

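import pandas as pd

stor = pd.DataFrame({'id': list(range(len(lgenre[8:15])))})
for num, genres in enumerate(lgenre[8:15]):  # avoid shadowing the built-in `list`
    for item in genres:
        if item not in stor.columns:
            stor[item] = 0  # create the genre column the first time it is seen
        stor.loc[num, item] = 1  # .loc avoids chained assignment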

It runs, but it is far too slow. Is there an efficient way to do this? Is there a better algorithm or a built-in method?

Answer 1

Build a DataFrame from the nested list, then use pd.get_dummies:

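# l is the nested list from the question, i.e. l = lgenre[8:15]
df = pd.get_dummies(pd.DataFrame(l))
# drop the positional prefix ("0_", "1_", ...) that get_dummies adds to each name
df.columns = df.columns.str.split("_").str[-1]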

     Action  Animation  Comedy  Drama  Adventure  Children's  Drama  Horror  \
0       1          0       0      0          0           0      0       0   
1       1          0       0      0          1           0      0       0   
2       0          0       1      0          0           0      1       0   
3       0          0       1      0          0           0      0       1   
4       0          1       0      0          0           1      0       0   
5       0          0       0      1          0           0      0       0   
6       1          0       0      0          1           0      0       0   

   Romance  Thriller  
0        0         0  
1        0         1  
2        1         0  
3        0         0  
4        0         0  
5        0         0  
6        1         0
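
Note that the output above contains two Drama columns: pd.get_dummies encodes each positional column of the intermediate DataFrame separately, so a genre that appears in more than one position is split across columns. One possible way to get a single column per genre is sketched below using Series.str.get_dummies. The variable names (genres, dummies, order) are illustrative, the data is copied from the question, and the sketch assumes no genre name contains the "|" character used as a separator.

```python
import pandas as pd

# Sample data copied from the question (the slice lgenre[8:15]).
genres = [
    ['Action'],
    ['Action', 'Adventure', 'Thriller'],
    ['Comedy', 'Drama', 'Romance'],
    ['Comedy', 'Horror'],
    ['Animation', "Children's"],
    ['Drama'],
    ['Action', 'Adventure', 'Romance'],
]

# Join each row into one "|"-separated string, then let pandas split it
# back into one indicator column per distinct genre.
# Assumption: no genre name contains "|".
dummies = pd.Series(genres).str.join('|').str.get_dummies(sep='|')

# str.get_dummies returns the columns in sorted order; reorder them by
# first appearance to match the layout requested in the question.
order = list(dict.fromkeys(g for row in genres for g in row))
dummies = dummies[order]

# Add the running id column in front.
dummies.insert(0, 'id', range(len(dummies)))
print(dummies)
```

This avoids the Python-level double loop from the question, which is usually where the time goes on long lists. If the column order does not matter, the reordering step can be dropped.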

Converting a list of items to dummy variables in pandas
