How to convert selected features from a random forest into a new list

Date: 2019-09-13 23:59:17

Tags: python pandas machine-learning scikit-learn random-forest

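I am working on a regression problem. For my model, I am using a RandomForestClassifier for dimensionality reduction. The output is a space-separated string of booleans in which the good features are marked True. It looks like this: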

[ True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True False  True  True False  True  True  True False  True
  True  True  True  True  True  True  True False  True False False  True
  True False False False False False False False False False False  True
 False False  True False False False False False False  True False False
 False  True False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False  True False False  True False False
 False  True False  True False False False False False False False False
 False False False False False False False False False False False False
 False  True False False False False False False False False  True False
 False False False False False  True False False False  True  True False
 False False False False False False False False False False False False
 False False False False False False  True False False False False False
 False False  True False False  True False  True False  True False False
  True False False False False False False False False False False False
 False False False  True False  True False  True False False False False
 False False False False False  True  True False False False False False
 False False False False  True False  True  True False  True False False
 False False False  True  True  True False False False False False False
 False False False False False False False False False False False False
 False False False False False False  True False False False False False
 False False False False False False False False  True False False False
 False  True False]

So what I want to do is convert it into a comma-separated list like this:

[True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, False, True, True, False, True, True, True, False, True, True, True, True, True, True, True, True, False, True, False, False, True, True, False, False, False, False, False, False, False, False, False, False, True, False, False, True, False, False, False, False, False, False, True, False, False, False, True, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, True, False, False, False, True, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, True, False, False, False, True, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, True, False, False, True, False, True, False, True, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, True, False, True, False, False, False, False, False, False, False, False, False, True, True, False, False, False, False, False, False, False, False, False, True, False, True, True, False, True, False, False, False, False, False, True, True, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, 
False, False, False, False, False, False, False, True, False]

and then iterate over each element and retrieve the corresponding column of test. Here is the full code for this process:

sel = SelectFromModel(RandomForestClassifier(n_estimators = 100), threshold = '1.25*mean')
sel.fit(x_train, y_train)

selected = sel.get_support()
selected_list = list(selected)
columns_list = []

for i in range(len(selected_list)):
    if(selected_list[i] == 'True'):
        columns_list.append(test[i])

print(columns_list)

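However, even though I try to append to columns_list, I still get an empty list. Basically, my goal is to use the columns chosen by the dimensionality reduction in my prediction. I am using linear regression for this problem.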

Update

After changing the code as suggested below, I get the following error:

Traceback (most recent call last):
  File "/opt/anaconda/envs/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2890, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/onur/Documents/Boston-Kaggle/Model.py", line 100, in <module>
    columns_list.append(test[i])
  File "/opt/anaconda/envs/lib/python3.7/site-packages/pandas/core/frame.py", line 2975, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/opt/anaconda/envs/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2892, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

2 Answers:

Answer 0 (score: 1)

Your problem is here:

if(selected_list[i] == 'True'):
    columns_list.append(test[i])

You are comparing a boolean with the string 'True' instead of the boolean True, so the condition is never met and nothing is appended.

A compact and Pythonic solution is:

if selected_list[i]:
    columns_list.append(test[i])

The second error occurs because you are accessing the DataFrame test with []. On a DataFrame, test[i] looks up a column labeled i, and test has no column labeled 0, hence KeyError: 0. You need to use the .iloc indexer instead.

How you use it depends on what test contains.

Edit, a more explicit solution:

# Rows:
test.iloc[0] # first row of data frame - note the Series output type
test.iloc[1] # second row of data frame
test.iloc[-1] # last row of data frame
# Columns:
test.iloc[:,0] # first column of data frame
test.iloc[:,1] # second column of data frame
test.iloc[:,-1] # last column of data frame
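
Putting the two fixes together, the loop from the question might look like the sketch below. The tiny test frame, its column names, and the selected mask are made up for illustration; in the question, selected comes from sel.get_support().

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data (not the real data set)
test = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
selected = np.array([True, False, True])  # same kind of mask as sel.get_support()

columns_list = []
for i, keep in enumerate(selected):
    if keep:  # truthiness test, not a comparison with the string 'True'
        columns_list.append(test.iloc[:, i])  # i-th column, selected by position

# Each appended item is a Series; join them back into one DataFrame
selected_test = pd.concat(columns_list, axis=1)
print(selected_test.columns.tolist())
```

Positional selection like this assumes that test has its columns in the same order as the data the selector was fitted on.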

Answer 1 (score: 1)

I would do something like this:

columns_list = list(x_train.columns[selected_list])
selected_test = test[columns_list]

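This way you retrieve the names of the selected columns from x_train, put them in columns_list, and then look them up in test. This should work even if the training and test data do not have the same number of columns. It obviously won't work if the test data lacks one of the selected features.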
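
A small sketch of that behavior, under assumed data: the two frames and the selected mask below are invented for illustration, with selected standing in for sel.get_support(). The test frame has an extra column and a different column order, yet selecting by name still picks the intended columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frames; only the column names matter for this illustration
x_train = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
test = pd.DataFrame({"extra": [0, 0], "c": [9, 8], "a": [7, 6], "b": [5, 4]})
selected = np.array([True, False, True])  # stand-in for sel.get_support()

# Map the boolean mask to column names, then select those names in test
columns_list = list(x_train.columns[selected])
selected_test = test[columns_list]
print(columns_list)

# If test lacks one of the selected columns, the lookup raises KeyError
missing = False
try:
    test.drop(columns="c")[columns_list]
except KeyError:
    missing = True
```

Because the lookup is by label rather than by position, the column order in test does not matter here.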