train_data.iloc[:, DATA_TYPE].loc[:, FEATURES]
This is my masking example.
FutureWarning:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
col_0 col_1 col_2 col_3 col_4 col_5 col_6 col_7
0 0.791166 0.009661 NaN 0.148213 NaN NaN 0.573262 0.875242
1 0.131313 0.211741 NaN 0.701692 NaN NaN 0.981332 0.854273
2 0.382859 0.489186 NaN 0.461275 NaN NaN 0.290135 0.421597
3 0.871551 0.585270 NaN 0.135620 NaN NaN 0.894486 0.977827
4 0.524309 0.935508 NaN 0.108710 NaN NaN 0.947512 0.226602
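First I take all columns for which DATA_TYPE[col_number] is True, then from those I take the columns whose col_name is in FEATURES.
This produces a warning, and the result contains columns that are entirely NaN.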
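What is the correct way to do this? Thanks!
Edit: the DataFrame should first be masked by DATA_TYPE, and then only the columns whose names are in FEATURES should be selected.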
Answer 0 (score: 0)
First filter the columns by indexing with DATA_TYPE, then use intersection to keep only the filtered columns that are also in FEATURES:
import numpy as np
import pandas as pd

# Reproducible sample data: 5 rows, columns col_0 ... col_9
np.random.seed(456)
train_data = pd.DataFrame(np.random.rand(5, 10)).add_prefix('col_')
print(train_data)
col_0 col_1 col_2 col_3 col_4 col_5 col_6 \
0 0.248756 0.163067 0.783643 0.808523 0.625628 0.604114 0.885702
1 0.435679 0.385273 0.575710 0.146091 0.686593 0.468804 0.569999
2 0.180917 0.118158 0.242734 0.008183 0.360068 0.146042 0.542723
3 0.213594 0.973156 0.858330 0.533785 0.434459 0.187193 0.288276
4 0.556988 0.942390 0.153546 0.896226 0.178035 0.594263 0.042630
col_7 col_8 col_9
0 0.759117 0.181105 0.150169
1 0.645701 0.723341 0.680671
2 0.857103 0.200212 0.134633
3 0.627167 0.355706 0.729455
4 0.653391 0.366720 0.795570
FEATURES = ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7']
DATA_TYPE = [True, True, False, True, False, False, True, True, False, True]
# Columns kept by the boolean mask, restricted to the names listed in FEATURES
cols = train_data.columns[DATA_TYPE].intersection(FEATURES)
print(cols)
Index(['col_0', 'col_1', 'col_3', 'col_6', 'col_7'], dtype='object')
df = train_data[cols]
print(df)
col_0 col_1 col_3 col_6 col_7
0 0.248756 0.163067 0.808523 0.885702 0.759117
1 0.435679 0.385273 0.146091 0.569999 0.645701
2 0.180917 0.118158 0.008183 0.542723 0.857103
3 0.213594 0.973156 0.533785 0.288276 0.627167
4 0.556988 0.942390 0.896226 0.042630 0.653391
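The NaN columns in the question come from asking .loc for labels that the mask has already removed; newer pandas releases raise KeyError in that case instead of warning. As a minimal sketch of the same idea, the selection can be wrapped in a small helper. The function name select_masked_features is hypothetical (it is not part of pandas), and it uses Index.isin rather than intersection, which is an alternative that keeps the DataFrame's own column order:

```python
import numpy as np
import pandas as pd


def select_masked_features(df, mask, features):
    """Keep columns where `mask` is True and whose name is in `features`.

    Hypothetical helper for illustration; not a pandas API.
    """
    mask = np.asarray(mask, dtype=bool)
    if mask.shape[0] != df.shape[1]:
        raise ValueError("mask must have exactly one entry per column")
    masked_cols = df.columns[mask]
    # isin() only ever keeps labels that exist, so no NaN columns can appear
    keep = masked_cols[masked_cols.isin(features)]
    return df.loc[:, keep]


np.random.seed(456)
train_data = pd.DataFrame(np.random.rand(5, 10)).add_prefix('col_')
FEATURES = ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7']
DATA_TYPE = [True, True, False, True, False, False, True, True, False, True]

result = select_masked_features(train_data, DATA_TYPE, FEATURES)
print(result.columns.tolist())
# ['col_0', 'col_1', 'col_3', 'col_6', 'col_7']

# Labels in FEATURES that the mask removed; these are the ones that
# showed up as all-NaN columns in the question's output
masked = train_data.columns[np.asarray(DATA_TYPE, dtype=bool)]
missing = [c for c in FEATURES if c not in masked]
print(missing)
# ['col_2', 'col_4', 'col_5']
```

The length check is an added safeguard, not something the original answer does; it fails early if DATA_TYPE and the DataFrame's columns get out of sync.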