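I am using scikit-learn to reduce a dataset with about 800 features. As far as I know, Lasso should return the same features for the same dataset. The data is very noisy (market and economic data). However, I am not seeing this across runs. Here is my function: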
```python
import pandas as pd
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

def select_lasso_feat(self, train_data, features, target):
    if len(features) <= 60:
        print('LASSO feature selection step skipped. Too few features on your dataset!')
        return features
    print('Performing LASSO feature selection...')
    X_train = self._standardize(train_data[features])
    y = train_data[target]
    alpha = 0.0003
    feat_len = 0
    # Lower alpha step by step until at least 60 features pass the threshold;
    # also stop before alpha drops to zero or below (recent scikit-learn rejects alpha < 0).
    while feat_len < 60 and alpha > 0:
        estimator = Lasso(alpha=alpha, random_state=23)
        # Keep features whose absolute coefficient is >= 0.1.
        feature_selection = SelectFromModel(estimator, threshold=0.1)
        feature_selection.fit(X_train, y)
        # Column labels picked out by the boolean support mask.
        selected_features = list(pd.DataFrame(X_train).columns[feature_selection.get_support()])
        feat_len = len(selected_features)
        alpha -= 0.00003
    return list(set(selected_features))
```
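To check whether the selection loop itself is deterministic within one session, here is a minimal self-contained sketch of the same decreasing-alpha procedure on synthetic data. The function name `select_by_decreasing_alpha`, the `make_regression` dataset and its sizes are assumptions standing in for the real data and for `self._standardize`; the sketch also returns column indices in ascending order instead of passing them through `set()`. With the default `selection='cyclic'`, Lasso's coordinate descent is deterministic, and `random_state` only has an effect when `selection='random'`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler


def select_by_decreasing_alpha(X, y, min_features=60, alpha=0.0003,
                               step=0.00003, threshold=0.1):
    """Hypothetical helper: lower alpha until >= min_features pass |coef| >= threshold."""
    support = np.zeros(X.shape[1], dtype=bool)
    while support.sum() < min_features and alpha > 0:
        selector = SelectFromModel(Lasso(alpha=alpha, random_state=23),
                                   threshold=threshold)
        selector.fit(X, y)
        support = selector.get_support()
        alpha -= step
    # Ascending column indices; no set() involved, so the order is stable.
    return np.flatnonzero(support).tolist()


# Synthetic stand-in for the real market/economic data (an assumption).
X, y = make_regression(n_samples=1000, n_features=200, n_informative=30,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)

first = select_by_decreasing_alpha(X, y)
second = select_by_decreasing_alpha(X, y)
print(len(first), first == second)
```

If the two calls agree here but the real function still gives different results after a kernel restart, it may be worth looking at what changes between sessions (for example how the data are loaded or ordered) rather than at Lasso itself.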
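As the function shows, I keep refitting Lasso until I reach the desired number of features (60 in this case). I run my experiments in Jupyter. Every time I shut down the server and rerun the code on exactly the same data, Lasso returns a different list of features. What could be causing this?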
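One thing worth ruling out is whether the lists differ in content or only in order. Python randomizes string hashes per interpreter process (unless `PYTHONHASHSEED` is fixed), so iterating over a `set` of strings, as `list(set(selected_features))` does, can yield a different order after a kernel restart. This applies only if the selected labels are strings; integer column labels (e.g. when `_standardize` returns a NumPy array) are not hash-randomized. The sketch below simulates two fresh interpreters with different hash seeds; the feature names and the helpers `run_in_fresh_interpreter` and `compare_selections` are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical feature names; each child process returns them via list(set(...)).
SNIPPET = "names = [f'feat_{i}' for i in range(20)]; print(','.join(list(set(names))))"


def run_in_fresh_interpreter(hash_seed):
    """Run SNIPPET in a new Python process with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    out = subprocess.run([sys.executable, "-c", SNIPPET], env=env,
                         capture_output=True, text=True, check=True)
    return out.stdout.strip().split(",")


def compare_selections(a, b):
    """Report whether two selections differ in content or only in order."""
    if a == b:
        return "identical"
    if set(a) == set(b):
        return "same features, different order"
    return "different features"


run_1 = run_in_fresh_interpreter(1)
run_2 = run_in_fresh_interpreter(2)
print(compare_selections(run_1, run_2))
```

Comparing the real runs with `compare_selections` (or simply with `sorted(...)`) would show whether the feature sets themselves change between sessions.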