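I'm working on a regression problem. For my model, I'm using a RandomForestClassifier for dimensionality reduction. The output is a space-separated sequence of boolean values in which True marks the good features. It looks like this: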
[ True True True True True True True True True True True True
True True True False True True False True True True False True
True True True True True True True False True False False True
True False False False False False False False False False False True
False False True False False False False False False True False False
False True False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False True False False True False False
False True False True False False False False False False False False
False False False False False False False False False False False False
False True False False False False False False False False True False
False False False False False True False False False True True False
False False False False False False False False False False False False
False False False False False False True False False False False False
False False True False False True False True False True False False
True False False False False False False False False False False False
False False False True False True False True False False False False
False False False False False True True False False False False False
False False False False True False True True False True False False
False False False True True True False False False False False False
False False False False False False False False False False False False
False False False False False False True False False False False False
False False False False False False False False True False False False
False True False]
So what I want to do is convert it into a comma-separated list like this:
[True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, False, True, True, False, True, True, True, False, True, True, True, True, True, True, True, True, False, True, False, False, True, True, False, False, False, False, False, False, False, False, False, False, True, False, False, True, False, False, False, False, False, False, True, False, False, False, True, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, True, False, False, False, True, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, True, False, False, False, True, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, True, False, False, True, False, True, False, True, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, True, False, True, False, False, False, False, False, False, False, False, False, True, True, False, False, False, False, False, False, False, False, False, True, False, True, True, False, True, False, False, False, False, False, True, True, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, 
False, False, False, False, False, False, False, True, False]
and then iterate over each element and retrieve the corresponding column of test. Here is the full code for this step:
sel = SelectFromModel(RandomForestClassifier(n_estimators = 100), threshold = '1.25*mean')
sel.fit(x_train, y_train)
selected = sel.get_support()
selected_list = list(selected)
columns_list = []
for i in range(len(selected_list)):
    if(selected_list[i] == 'True'):
        columns_list.append(test[i])
print(columns_list)
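However, even though I try to append to columns_list, I still get an empty list. Basically, my goal is to use the columns chosen by the dimensionality reduction when predicting. I'm using linear regression for this problem.
Update
After changing the code as suggested below, I get the following error: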
Traceback (most recent call last):
File "/opt/anaconda/envs/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2890, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/onur/Documents/Boston-Kaggle/Model.py", line 100, in <module>
columns_list.append(test[i])
File "/opt/anaconda/envs/lib/python3.7/site-packages/pandas/core/frame.py", line 2975, in __getitem__
indexer = self.columns.get_loc(key)
File "/opt/anaconda/envs/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2892, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0
Answer 0 (score: 1)
Your problem is here:
if(selected_list[i] == 'True'):
    columns_list.append(test[i])
You are comparing a boolean value with the string 'True' instead of True, so the condition is never met and nothing is appended.
A compact and Pythonic solution is:
if selected_list[i]:
    columns_list.append(test[i])
The second error occurs because you are accessing the data frame test with []. On a DataFrame, test[i] looks up a column labelled i rather than the column at position i, hence KeyError: 0. You need to use the .iloc method instead.
How to use it depends on what test contains. Edit, to be more explicit:
# Rows:
test.iloc[0] # first row of the data frame (returned as a Series)
test.iloc[1] # second row of the data frame
test.iloc[-1] # last row of the data frame
# Columns:
test.iloc[:,0] # first column of the data frame
test.iloc[:,1] # second column of the data frame
test.iloc[:,-1] # last column of the data frame
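Putting both fixes together, the loop from the question could be sketched roughly as follows. The small test frame and the selected mask are made-up stand-ins for the question's data, with a NumPy boolean array playing the role of the output of sel.get_support().

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data: a 3-column test frame
# and a boolean mask like the one returned by sel.get_support().
test = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
selected = np.array([True, False, True])

# Fix 1: test the boolean directly instead of comparing it with the string 'True'.
# Fix 2: use .iloc[:, i] to pick a column by position rather than test[i].
columns_list = []
for i, keep in enumerate(selected):
    if keep:
        columns_list.append(test.iloc[:, i])

# The same selection without a loop: a boolean mask over the columns.
selected_test = test.loc[:, selected]
print(list(selected_test.columns))
```

Both variants assume that test has the same columns, in the same order, as the data the selector was fitted on; otherwise the positions in the mask no longer line up.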
Answer 1 (score: 1)
I would do something like this:
columns_list = list(x_train.columns[selected_list])
selected_test = test[columns_list]
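This way you retrieve the names of the selected columns from x_train, put them in columns_list, and then look them up in test. This should work even if the training and test data do not have the same number of columns. If the test data lacks one of the selected features, it obviously will not work.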
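A minimal sketch of this name-based approach, using small made-up frames in place of the question's x_train and test and a hard-coded mask in place of sel.get_support(). The explicit check for missing columns is an addition for illustration and is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: test has an extra column and a different column order.
x_train = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
test = pd.DataFrame({"c": [7, 8], "extra": [0, 0], "a": [9, 10]})
selected = np.array([True, False, True])  # stand-in for sel.get_support()

# Names of the selected features, taken from the training columns.
columns_list = list(x_train.columns[selected])

# Illustrative guard: fail with a clear message if test lacks a selected feature.
missing = [col for col in columns_list if col not in test.columns]
if missing:
    raise KeyError(f"test is missing selected features: {missing}")

# Select by name, so column order and extra columns in test do not matter.
selected_test = test[columns_list]
print(selected_test.columns.tolist())
```

Selecting by name rather than by position is what makes this robust to extra or reordered columns in test.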