Can someone walk me through this code?

Asked: 2018-09-09 06:08:12

Tags: python pandas

for col in cols_with_missing:
    imputed_X_train_plus[col + '_was_missing'] = imputed_X_train_plus[col].isnull()
    imputed_X_test_plus[col + '_was_missing'] = imputed_X_test_plus[col].isnull()

What does imputed_X_train_plus[col + '_was_missing'] mean?

1 answer:

Answer 0 (score: 3)

I'll make up some data to illustrate. Consider

import numpy as np
import pandas as pd

imputed_X_train_plus = pd.DataFrame({'joe': [3, np.nan],
                                     'nick': [np.nan, 6],
                                     'fred': [1, 7]})

At this point, imputed_X_train_plus is a DataFrame that contains NaN values.

   joe  nick  fred
0  3.0   NaN     1
1  NaN   6.0     7

Suppose you somehow know which columns have missing values. Their names are in cols_with_missing.

cols_with_missing = ['joe', 'nick']
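
If you don't want to hard-code that list, one way to build it is to check each column for at least one null. This is only a sketch on the made-up data above; the question doesn't show how cols_with_missing was actually produced.

```python
import numpy as np
import pandas as pd

imputed_X_train_plus = pd.DataFrame({'joe': [3, np.nan],
                                     'nick': [np.nan, 6],
                                     'fred': [1, 7]})

# Keep the names of the columns that contain at least one missing value.
cols_with_missing = [col for col in imputed_X_train_plus.columns
                     if imputed_X_train_plus[col].isnull().any()]

print(cols_with_missing)  # ['joe', 'nick']
```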

Now you want to flag where those values are missing. You do

for col in cols_with_missing:
    imputed_X_train_plus[col +'_was_missing'] = imputed_X_train_plus[col].isnull()

Now imputed_X_train_plus looks like

   joe  nick  fred  joe_was_missing  nick_was_missing
0  3.0   NaN     1            False              True
1  NaN   6.0     7             True             False
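
Note that the new columns hold True/False flags, not averages. A quick check on the same made-up data (df is just a short name I'm using here) shows the dtype, and that summing a flag column counts the missing entries:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'joe': [3, np.nan],
                   'nick': [np.nan, 6],
                   'fred': [1, 7]})

for col in ['joe', 'nick']:
    # isnull() returns a boolean Series: True where the value is NaN.
    df[col + '_was_missing'] = df[col].isnull()

print(df['joe_was_missing'].dtype)   # bool
print(df['joe_was_missing'].sum())   # 1, since True counts as 1
```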

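Finally, col + '_was_missing' builds a new str (such as joe_was_missing), which is used as the name of the new column added to imputed_X_train_plus. The column itself holds True where the original value was missing and False otherwise; it is not a mean.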
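
The variable name suggests the NaNs get filled in afterwards. The question doesn't show that step, so the following is only a guess at the workflow: I'm assuming a column-mean fill and using a made-up three-row frame called X. It shows why the flags are created first: after filling, they are the only record of which values were originally missing.

```python
import numpy as np
import pandas as pd

# Made-up data; the third row is added so the column means are easy to see.
X = pd.DataFrame({'joe': [3, np.nan, 5],
                  'nick': [np.nan, 6, 8],
                  'fred': [1, 7, 2]})
cols_with_missing = ['joe', 'nick']

# Step 1: record where values are missing, before anything is filled in.
for col in cols_with_missing:
    X[col + '_was_missing'] = X[col].isnull()

# Step 2 (assumed): fill each NaN with the mean of its own column.
X[cols_with_missing] = X[cols_with_missing].fillna(X[cols_with_missing].mean())

print(X)
```

If the two steps were swapped, isnull() would find nothing left to flag and every _was_missing column would be all False.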