Pandas - Pivot multiple categorical columns into the same set of columns

Time: 2018-09-21 20:09:12

Tags: python pandas dataframe pivot categorical-data

I have a dataframe like the following:

import pandas as pd
data = {
    'Num' : ['1','2', '3','4','5','6','7'],
    'col1': ['val1', 'val6', 'val3', 'val7', 'val2','val4','val5'],
    'col2': ['','val3','val5','','','',''],
    'col3': ['','val1','val2','','','','']
}
df = pd.DataFrame(data)
df["myvals"]=1

   Num  col1    col2    col3    myvals
0   1   val1                      1
1   2   val6    val3    val1      1
2   3   val3    val5    val2      1
3   4   val7                      1
4   5   val2                      1
5   6   val4                      1
6   7   val5                      1

I'm trying to pivot the values in 'col1', 'col2' and 'col3' into the same set of pivot columns, but so far I can only capture the values from 'col1':

pd.pivot_table(df, values="myvals", index=["Num"], columns="col1", fill_value=0)

    col1    val1    val2    val3    val4    val5    val6    val7
    Num                         
    1         1       0       0       0      0        0       0
    2         0       0       0       0      0        1       0
    3         0       0       1       0      0        0       0
    4         0       0       0       0      0        0       1
    5         0       1       0       0      0        0       0
    6         0       0       0       1      0        0       0
    7         0       0       0       0      1        0       0

Any ideas on how to also bring in the values from 'col2' and 'col3', as shown below, so that the rows where 'Num' = 2 and 'Num' = 3 have multiple 1s?

col1    val1    val2    val3    val4    val5    val6    val7
Num                         
1         1       0       0       0      0        0       0
2         1       0       1       0      0        1       0
3         0       1       1       0      1        0       0
4         0       0       0       0      0        0       1
5         0       1       0       0      0        0       0
6         0       0       0       1      0        0       0
7         0       0       0       0      1        0       0
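For reference, the pivot_table call above only ever reads col1 because that is the only column passed as columns=. A minimal sketch of one way to keep the pivot_table approach, assuming the same df as above: first use DataFrame.melt to stack col1, col2 and col3 into a single long column (named val here, an arbitrary choice), drop the empty strings, and then pivot on that one column.

```python
import pandas as pd

# Same frame as in the question
data = {
    'Num': ['1', '2', '3', '4', '5', '6', '7'],
    'col1': ['val1', 'val6', 'val3', 'val7', 'val2', 'val4', 'val5'],
    'col2': ['', 'val3', 'val5', '', '', '', ''],
    'col3': ['', 'val1', 'val2', '', '', '', ''],
}
df = pd.DataFrame(data)
df["myvals"] = 1

# Reshape col1..col3 into one long column 'val': one row per (Num, source column)
long_df = df.melt(id_vars=["Num", "myvals"], value_vars=["col1", "col2", "col3"],
                  value_name="val")

# Empty strings mean "no value" and must not become a pivot column
long_df = long_df[long_df["val"] != ""]

# Every value, whichever column it came from, now feeds the same set of pivot columns
result = pd.pivot_table(long_df, values="myvals", index="Num", columns="val",
                        aggfunc="sum", fill_value=0)
print(result)
```

Since each Num/value pair occurs only once in this data, aggfunc="sum" gives the same 0/1 table as the desired output. If a value could appear in two columns for the same Num, "sum" would count it twice, whereas aggfunc="max" would keep a 0/1 indicator.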

1 Answer:

Answer 0 (score: 1)

This is more of a get_dummies problem than a pivot problem:

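import numpy as np
# Leave out the helper column 'myvals'; sum(level=0) was removed in pandas 2.0, so group on the index level instead
df.drop(columns='myvals').replace('', np.nan).set_index('Num').stack().str.get_dummies().groupby(level=0).sum()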
Out[1125]: 
     val1  val2  val3  val4  val5  val6  val7
Num                                          
1       1     0     0     0     0     0     0
2       1     0     1     0     0     1     0
3       0     1     1     0     1     0     0
4       0     0     0     0     0     0     1
5       0     1     0     0     0     0     0
6       0     0     0     1     0     0     0
7       0     0     0     0     1     0     0
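The one-liner above can be unpacked step by step. This is a sketch assuming a recent pandas: the level= argument to sum() was deprecated in pandas 1.3 and removed in 2.0, so groupby(level=0).sum() is used instead. The myvals helper column is left out because, if it were stacked alongside the strings, get_dummies would most likely emit an extra column for its value 1. The intermediate names cats, stacked and dummies are only for illustration.

```python
import numpy as np
import pandas as pd

# Same frame as in the question
data = {
    'Num': ['1', '2', '3', '4', '5', '6', '7'],
    'col1': ['val1', 'val6', 'val3', 'val7', 'val2', 'val4', 'val5'],
    'col2': ['', 'val3', 'val5', '', '', '', ''],
    'col3': ['', 'val1', 'val2', '', '', '', ''],
}
df = pd.DataFrame(data)
df["myvals"] = 1

# 1. Keep only the categorical columns and mark empty strings as missing
cats = df[["Num", "col1", "col2", "col3"]].replace("", np.nan).set_index("Num")

# 2. Stack into one long Series indexed by (Num, original column name).
#    Missing entries are dropped here or, depending on the pandas version,
#    kept as NaN and turned into all-zero rows by get_dummies; either way
#    they contribute nothing to the sums in step 4.
stacked = cats.stack()

# 3. One indicator column per distinct value, one row per stacked entry
dummies = stacked.str.get_dummies()

# 4. Collapse back to one row per Num by summing over the first index level
result = dummies.groupby(level=0).sum()
print(result)
```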