DataFrame built from lists of different lengths in pandas: how to create dummy variables

Time: 2017-03-31 04:46:31

Tags: python pandas

  index            0          1         2         3         4        5  \
0         0       Action  Adventure   Fantasy    Sci-Fi       NaN      NaN   
1         1       Action  Adventure   Fantasy       NaN       NaN      NaN   
2         2       Action  Adventure  Thriller       NaN       NaN      NaN   
3         3       Action   Thriller       NaN       NaN       NaN      NaN   
4         4  Documentary        NaN       NaN       NaN       NaN      NaN   
5         5       Action  Adventure    Sci-Fi       NaN       NaN      NaN   
6         6       Action  Adventure   Romance       NaN       NaN      NaN   
7         7    Adventure  Animation    Comedy    Family   Fantasy  Musical   
8         8       Action  Adventure    Sci-Fi       NaN       NaN      NaN   
9         9    Adventure     Family   Fantasy   Mystery       NaN      NaN

I have data like this...

But I don't know how to create dummy variables in pandas (Python) when the rows of the DataFrame contain different numbers of values...

   Action  Adventure  Fantasy  Sci-Fi  Thriller  Documentary  Romance  Animation  Comedy  Family  Fantasy  Musical  Mystery
0       1          1        1       1         0            0        0          0       0       0        0        0        0
1       1          1        1       0         0            0        0          0       0       0        0        0        0
2       1          1        0       0         1            0        0          0       0       0        0        0        0
3       1          0        0       0         1            0        0          0       0       0        0        0        0
4       0          0        0       0         0            1        0          0       0       0        0        0        0
5       1          1        0       1         0            0        0          0       0       0        0        0        0
6       1          1        0       0         0            0        1          0       0       0        0        0        0
7       0          1        0       0         0            0        0          1       1       1        1        1        0
8       1          1        0       1         0            0        0          0       0       0        0        0        0
9       0          1        0       0         0            0        0          0       0       1        1        0        1
Like this...

2 answers:

Answer 0 (score: 3)

I think you can use get_dummies, but first you need to remove the first column with drop or iloc, and then create a Series with stack.

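The get_dummies output has duplicate index values (one row per genre occurrence), so you need to groupby the first index level and aggregate with max.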

df = pd.get_dummies(df.drop('index', axis=1).stack()).groupby(level=0).max()
# Alternative solution: select the genre columns by position instead of dropping by name
# df = pd.get_dummies(df.iloc[:, 1:].stack()).groupby(level=0).max()
print(df)
   Action  Adventure  Animation  Comedy  Documentary  Family  Fantasy  \
0       1          1          0       0            0       0        1   
1       1          1          0       0            0       0        1   
2       1          1          0       0            0       0        0   
3       1          0          0       0            0       0        0   
4       0          0          0       0            1       0        0   
5       1          1          0       0            0       0        0   
6       1          1          0       0            0       0        0   
7       0          1          1       1            0       1        1   
8       1          1          0       0            0       0        0   
9       0          1          0       0            0       1        1   

   Musical  Mystery  Romance  Sci-Fi  Thriller  
0        0        0        0       1         0  
1        0        0        0       0         0  
2        0        0        0       0         1  
3        0        0        0       0         1  
4        0        0        0       0         0  
5        0        0        0       1         0  
6        0        0        1       0         0  
7        1        0        0       0         0  
8        0        0        0       1         0  
9        0        1        0       0         0  
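
To make the steps above easier to follow, here is a minimal self-contained sketch that rebuilds only the first four rows of the question's data and splits the one-liner into its intermediate steps. The names `raw`, `stacked`, and `dummies` are illustrative and not part of the original answer. `dtype=int` is an addition: newer pandas versions return boolean columns from get_dummies by default, so it is passed here to get 0/1 output like the one shown above.

```python
import numpy as np
import pandas as pd

# First four rows of the question's data; shorter genre lists are padded with NaN.
raw = pd.DataFrame(
    [
        [0, "Action", "Adventure", "Fantasy", "Sci-Fi"],
        [1, "Action", "Adventure", "Fantasy", np.nan],
        [2, "Action", "Adventure", "Thriller", np.nan],
        [3, "Action", "Thriller", np.nan, np.nan],
    ],
    columns=["index", 0, 1, 2, 3],
)

# Drop the helper column and stack into a long Series with one genre per entry.
# The first level of the resulting MultiIndex is the original row label.
stacked = raw.drop("index", axis=1).stack()

# get_dummies yields one indicator row per entry (missing values get all zeros),
# so the rows are collapsed back to one per movie by taking the max per row label.
dummies = pd.get_dummies(stacked, dtype=int).groupby(level=0).max()
print(dummies)
```

Taking the max (rather than the sum) keeps every cell at 0 or 1 even if a genre were listed twice for the same movie.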

Answer 1 (score: 1)

Housekeeping:
Make sure the pesky index column is out of the way.

df = df.drop('index', axis=1)

Use value_counts:

df.stack().groupby(level=0).value_counts().unstack(fill_value=0)

Or equivalently:

df.apply(pd.Series.value_counts, axis=1).fillna(0).astype(int)
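
Both variants count how often each genre appears in a row, so they only produce a 0/1 dummy matrix when no genre repeats within a row, which holds for the question's data. The sketch below checks on a small made-up frame that the two forms give the same counts. The sample rows and the names `genres`, `counts_a`, and `counts_b` are assumptions for illustration, and `pd.Series.value_counts` is used because the top-level `pd.value_counts` is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (the index column is assumed to be dropped already).
genres = pd.DataFrame(
    [
        ["Action", "Adventure", "Sci-Fi"],
        ["Action", "Thriller", np.nan],
        ["Documentary", np.nan, np.nan],
    ]
)

# Variant 1: long format, then count genres within each original row.
counts_a = genres.stack().groupby(level=0).value_counts().unstack(fill_value=0)

# Variant 2: count row by row; genres absent from a row come back as NaN.
counts_b = genres.apply(pd.Series.value_counts, axis=1).fillna(0).astype(int)

# Sort the columns so both results can be compared side by side.
counts_a = counts_a.sort_index(axis=1)
counts_b = counts_b.sort_index(axis=1)
print(counts_a)
print(counts_b)
```

If a genre could appear more than once per row, appending `.clip(upper=1)` to either result would turn the counts into indicators.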