for col in cols_with_missing:
    imputed_X_train_plus[col + '_was_missing'] = imputed_X_train_plus[col].isnull()
    imputed_X_test_plus[col + '_was_missing'] = imputed_X_test_plus[col].isnull()
What does imputed_X_train_plus[col + '_was_missing'] mean?
Answer (score: 3)
I'll make up some data to illustrate. Consider
import numpy as np
import pandas as pd
imputed_X_train_plus = pd.DataFrame({'joe': [3, np.nan],
                                     'nick': [np.nan, 6],
                                     'fred': [1, 7]})
At this point, imputed_X_train_plus is a DataFrame containing NaN values:
   joe  nick  fred
0  3.0   NaN     1
1  NaN   6.0     7
Suppose you somehow know which columns have missing values. They are listed in cols_with_missing:
cols_with_missing = ['joe', 'nick']
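Rather than typing that list by hand, one common way to build it is to keep every column for which isnull().any() is true. This is a minimal sketch on the same example frame; the original question does not say how its list was produced.

```python
import numpy as np
import pandas as pd

imputed_X_train_plus = pd.DataFrame({'joe': [3, np.nan],
                                     'nick': [np.nan, 6],
                                     'fred': [1, 7]})

# Keep the names of the columns that contain at least one NaN
cols_with_missing = [col for col in imputed_X_train_plus.columns
                     if imputed_X_train_plus[col].isnull().any()]

print(cols_with_missing)  # ['joe', 'nick']
```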
Now you want to flag those missing values. You do
for col in cols_with_missing:
    imputed_X_train_plus[col + '_was_missing'] = imputed_X_train_plus[col].isnull()
Now imputed_X_train_plus looks like this:
   joe  nick  fred  joe_was_missing  nick_was_missing
0  3.0   NaN     1            False              True
1  NaN   6.0     7             True             False
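Finally, col + '_was_missing' just concatenates two strings to make a new str (such as joe_was_missing), which is then used as the name of a new column in imputed_X_train_plus. Assigning to a column name that does not exist yet adds that column to the DataFrame.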
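To confirm that nothing special is going on, the concatenation can be checked by itself, and the same flag columns can also be built without a loop using DataFrame.add_suffix. This is an alternative sketch, not what the original code does; the result should match the table above.

```python
import numpy as np
import pandas as pd

imputed_X_train_plus = pd.DataFrame({'joe': [3, np.nan],
                                     'nick': [np.nan, 6],
                                     'fred': [1, 7]})
cols_with_missing = ['joe', 'nick']

# '+' on two str objects is ordinary string concatenation
print('joe' + '_was_missing')  # joe_was_missing

# Loop-free equivalent: compute the NaN mask for the chosen columns,
# rename those mask columns with a suffix, and attach them side by side
flags = imputed_X_train_plus[cols_with_missing].isnull().add_suffix('_was_missing')
with_flags = pd.concat([imputed_X_train_plus, flags], axis=1)
print(with_flags.columns.tolist())
# ['joe', 'nick', 'fred', 'joe_was_missing', 'nick_was_missing']
```

The loop version modifies the existing frame in place, while the concat version returns a new frame and leaves the original untouched.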