Adding missing features to a DataFrame

Time: 2019-01-28 14:23:22

Tags: python pandas dataframe

I have two dataframes, TRAIN and TEST. I want to change TRAIN by adding every column that is present in TEST but missing from TRAIN (here Y2 and Y3).

TRAIN = pd.DataFrame({'X' : [1,1,1,1,1,2,2,2,2,2],
                      'Y1': [1,1,1,1,1,1,0,0,0,0],
                      'Y4': [1,1,0,0,0,0,0,0,0,0]})

TEST  = pd.DataFrame({'X' : [1,1,1,1,1,2,2,2,2],
                      'Y1': [1,1,0,1,0,1,0,0,0],
                      'Y2': [1,0,1,0,1,0,1,0,1],
                      'Y3': [1,1,0,1,1,0,0,0,0],
                      'Y4': [1,1,0,1,1,0,0,0,0]})

I want:

TRAIN = pd.DataFrame({'X' : [1,1,1,1,1,2,2,2,2,2],
                      'Y1': [1,1,1,1,1,1,0,0,0,0],
                      'Y4': [1,1,1,1,1,1,0,0,0,0],
                      'Y2': [0,0,0,0,0,0,0,0,0,0],
                      'Y3': [0,0,0,0,0,0,0,0,0,0]})

I tried:

L_TRAIN = list(TRAIN)
L_TEST  = list(TEST)

def Diff(li1, li2): 
    li_dif = [i for i in li1 + li2 if i not in li1] 
    return li_dif

L_DIFF  = Diff(L_TRAIN, L_TEST)

TRAIN[L_DIFF] = 0

But I got:

KeyError: "['Y2' 'Y3'] not in index"

1 answer:

Answer 0 (score: 2)

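In the pandas version used here, assigning a value to a list of columns only works when those columns already exist, which is why TRAIN[L_DIFF] = 0 raises the KeyError. Add the missing columns one at a time instead: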

import pandas as pd 

TRAIN = pd.DataFrame({'X' : [1,1,1,1,1,2,2,2,2,2],
                      'Y1': [1,1,1,1,1,1,0,0,0,0],
                      'Y4': [1,1,0,0,0,0,0,0,0,0]})

TEST  = pd.DataFrame({'X' : [1,1,1,1,1,2,2,2,2],
                      'Y1': [1,1,0,1,0,1,0,0,0],
                      'Y2': [1,0,1,0,1,0,1,0,1],
                      'Y3': [1,1,0,1,1,0,0,0,0],
                      'Y4': [1,1,0,1,1,0,0,0,0]})


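# Columns present in TEST but not in TRAIN, kept in TEST's column order
# (iterating over a plain set difference would not guarantee the order).
diff_cols = [c for c in TEST.columns if c not in TRAIN.columns]

# Add each missing column to TRAIN, filled with 0.
for col in diff_cols:
    TRAIN[col] = 0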

print(TRAIN)

Output:

   X  Y1  Y4  Y2  Y3
0  1   1   1   0   0
1  1   1   1   0   0
2  1   1   0   0   0
3  1   1   0   0   0
4  1   1   0   0   0
5  2   1   0   0   0
6  2   0   0   0   0
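
As an alternative to the loop, the missing columns can probably be added in a single call. The following is a minimal sketch, assuming that DataFrame.reindex with fill_value suits your case; the name missing is my own and does not come from the question. Note that reindex returns a new frame rather than modifying TRAIN in place.

```python
import pandas as pd

TRAIN = pd.DataFrame({'X' : [1,1,1,1,1,2,2,2,2,2],
                      'Y1': [1,1,1,1,1,1,0,0,0,0],
                      'Y4': [1,1,0,0,0,0,0,0,0,0]})

TEST  = pd.DataFrame({'X' : [1,1,1,1,1,2,2,2,2],
                      'Y1': [1,1,0,1,0,1,0,0,0],
                      'Y2': [1,0,1,0,1,0,1,0,1],
                      'Y3': [1,1,0,1,1,0,0,0,0],
                      'Y4': [1,1,0,1,1,0,0,0,0]})

# Columns that TEST has and TRAIN lacks, in TEST's column order
# ("missing" is an illustrative name, not part of the original post).
missing = [c for c in TEST.columns if c not in TRAIN.columns]

# reindex keeps the existing columns untouched and creates the new ones,
# filling every cell of a newly created column with fill_value.
TRAIN = TRAIN.reindex(columns=list(TRAIN.columns) + missing, fill_value=0)

print(TRAIN.columns.tolist())  # ['X', 'Y1', 'Y4', 'Y2', 'Y3']
```

The existing values of X, Y1 and Y4 are left as they were; only Y2 and Y3 are created and set to 0.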