Extracting columns by index and by name at the same time

Time: 2018-04-12 07:04:35

Tags: python-3.x pandas numpy

Here is my masking example. I first select all columns for which DATA_TYPE[col_number] is True, and then select the columns whose names are in FEATURES:

train_data.iloc[:, DATA_TYPE].loc[:, FEATURES]

This produces a warning, and the result contains null (NaN) columns:

FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

      col_0     col_1  col_2     col_3  col_4  col_5     col_6     col_7
0  0.791166  0.009661    NaN  0.148213    NaN    NaN  0.573262  0.875242
1  0.131313  0.211741    NaN  0.701692    NaN    NaN  0.981332  0.854273
2  0.382859  0.489186    NaN  0.461275    NaN    NaN  0.290135  0.421597
3  0.871551  0.585270    NaN  0.135620    NaN    NaN  0.894486  0.977827
4  0.524309  0.935508    NaN  0.108710    NaN    NaN  0.947512  0.226602

What is the correct way to do this? Thanks!

Edit: The DataFrame should first be masked by DATA_TYPE, and then only the columns whose names are in FEATURES should be selected.
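
To see where the NaN columns come from, here is a minimal sketch. It assumes the same FEATURES and DATA_TYPE values that the answer below uses (the question itself does not show them) and random stand-in data. The mask removes col_2, col_4 and col_5, but FEATURES still asks for them by name. Older pandas versions filled such missing labels with NaN and warned; recent versions raise KeyError instead. reindex(), which the warning suggests, keeps the NaN-filling behaviour explicitly.

```python
import numpy as np
import pandas as pd

# Stand-in data: 10 columns named col_0 .. col_9 (values are random).
train_data = pd.DataFrame(np.random.rand(5, 10)).add_prefix("col_")

# Assumed inputs, copied from the answer's example.
FEATURES = ["col_0", "col_1", "col_2", "col_3", "col_4", "col_5", "col_6", "col_7"]
DATA_TYPE = [True, True, False, True, False, False, True, True, False, True]

# Step 1: the positional boolean mask keeps only the columns marked True.
masked = train_data.iloc[:, DATA_TYPE]

# Labels that FEATURES requests but the mask has already removed.
missing = [c for c in FEATURES if c not in masked.columns]
print(missing)  # ['col_2', 'col_4', 'col_5']

# Step 2 as written in the question: label lookup with missing labels.
try:
    masked.loc[:, FEATURES]
except KeyError as exc:
    print("KeyError:", exc)

# reindex() accepts missing labels and fills them with NaN.
reindexed = masked.reindex(columns=FEATURES)
print(reindexed.isna().all())
```

The columns listed in `missing` are exactly the all-NaN columns in the output, so the fix is to request only labels that survived the mask.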

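1 Answer:

Answer 0: (score: 0)

First filter the column labels by DATA_TYPE with boolean indexing, then use intersection to keep only the filtered labels that also appear in FEATURES: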

import numpy as np
import pandas as pd

# Fix the seed so the random sample data is reproducible
np.random.seed(456)

train_data = pd.DataFrame(np.random.rand(5, 10)).add_prefix('col_')
print(train_data)
      col_0     col_1     col_2     col_3     col_4     col_5     col_6  \
0  0.248756  0.163067  0.783643  0.808523  0.625628  0.604114  0.885702   
1  0.435679  0.385273  0.575710  0.146091  0.686593  0.468804  0.569999   
2  0.180917  0.118158  0.242734  0.008183  0.360068  0.146042  0.542723   
3  0.213594  0.973156  0.858330  0.533785  0.434459  0.187193  0.288276   
4  0.556988  0.942390  0.153546  0.896226  0.178035  0.594263  0.042630   

      col_7     col_8     col_9  
0  0.759117  0.181105  0.150169  
1  0.645701  0.723341  0.680671  
2  0.857103  0.200212  0.134633  
3  0.627167  0.355706  0.729455  
4  0.653391  0.366720  0.795570  

FEATURES = ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5', 'col_6', 'col_7']
DATA_TYPE = [True, True, False, True, False, False, True, True, False, True]

# Boolean-index the column labels, then keep only those also listed in FEATURES
cols = train_data.columns[DATA_TYPE].intersection(FEATURES)
print(cols)
Index(['col_0', 'col_1', 'col_3', 'col_6', 'col_7'], dtype='object')

# Select the remaining columns by label
df = train_data[cols]
print(df)
      col_0     col_1     col_3     col_6     col_7
0  0.248756  0.163067  0.808523  0.885702  0.759117
1  0.435679  0.385273  0.146091  0.569999  0.645701
2  0.180917  0.118158  0.008183  0.542723  0.857103
3  0.213594  0.973156  0.533785  0.288276  0.627167
4  0.556988  0.942390  0.896226  0.042630  0.653391
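
The same selection can also be written as a single boolean mask over the columns. This is a sketch of an alternative, not part of the original answer: the helper name `select_columns` is hypothetical, and it combines the positional mask with `Index.isin` rather than `intersection`. It should give the same columns for the example above, in the DataFrame's own column order.

```python
import numpy as np
import pandas as pd


def select_columns(df, mask, names):
    """Keep columns that pass the positional boolean mask AND are listed in names.

    select_columns is a hypothetical helper written for illustration.
    """
    mask = np.asarray(mask, dtype=bool)
    if mask.shape[0] != df.shape[1]:
        raise ValueError("mask needs exactly one entry per column")
    # Elementwise AND of the positional mask and the name-membership mask.
    keep = mask & df.columns.isin(names)
    return df.loc[:, keep]


# Stand-in data with the same shape and column names as the answer's example.
train_data = pd.DataFrame(np.random.rand(5, 10)).add_prefix("col_")
FEATURES = ["col_0", "col_1", "col_2", "col_3", "col_4", "col_5", "col_6", "col_7"]
DATA_TYPE = [True, True, False, True, False, False, True, True, False, True]

result = select_columns(train_data, DATA_TYPE, FEATURES)
print(list(result.columns))  # ['col_0', 'col_1', 'col_3', 'col_6', 'col_7']
```

Because no label is looked up that might be absent, this version cannot produce NaN columns or a KeyError; the length check guards against a mask that does not match the number of columns.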