Question

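I am trying to select features within a pipeline; the pipeline itself is shown below.

I am considering using GenericUnivariateSelect, a univariate feature selector with configurable strategy.

From the documentation: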

class sklearn.feature_selection.GenericUnivariateSelect(score_func=<function f_classif>, mode='percentile', param=1e-05)

   score_func : callable

    Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). For modes 'percentile' or 'kbest' it can return a single array scores.

I have a custom score function that satisfies these requirements.
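To make the score_func contract quoted above concrete, here is a minimal sketch. The names variance_score, X_demo and y_demo are made up for illustration and are not part of scikit-learn; the function returns a single array of scores, which the documentation says is allowed for the 'percentile' and 'k_best' modes.

```python
import numpy as np
from sklearn.feature_selection import GenericUnivariateSelect


def variance_score(X, y):
    """Hypothetical score function: one score per feature (its variance).

    y is accepted to match the (X, y) signature but is ignored here.
    """
    return np.var(np.asarray(X), axis=0)


# Tiny hand-made data set: column 0 is constant, column 2 varies the most.
X_demo = np.array([[0.0, 1.0, 5.0],
                   [0.0, 2.0, -5.0],
                   [0.0, 3.0, 5.0],
                   [0.0, 4.0, -5.0]])
y_demo = np.array([0, 1, 0, 1])  # dummy labels, unused by variance_score

# Keep the 2 highest-scoring features.
selector = GenericUnivariateSelect(variance_score, mode="k_best", param=2)
X_kept = selector.fit_transform(X_demo, y_demo)
kept_mask = selector.get_support()
```

Here the constant column gets a score of 0 and is the one dropped, so kept_mask marks only the last two columns.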

mode : {'percentile', 'k_best', 'fpr', 'fdr', 'fwe'}

    Feature selection mode.
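As far as I understand it, each of these modes corresponds to one of the dedicated selectors (SelectPercentile, SelectKBest, SelectFpr, SelectFdr, SelectFwe), with param playing the role of that selector's own parameter. A small sketch to check this for 'fpr', assuming synthetic data from make_classification (the variable names are only for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import GenericUnivariateSelect, SelectFpr, f_classif

# Synthetic classification data; random_state fixed for reproducibility.
X_cls, y_cls = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)

# Generic selector in 'fpr' mode versus the dedicated SelectFpr class.
generic = GenericUnivariateSelect(f_classif, mode="fpr", param=0.05).fit(X_cls, y_cls)
dedicated = SelectFpr(f_classif, alpha=0.05).fit(X_cls, y_cls)

# Both should select exactly the same features.
same_mask = np.array_equal(generic.get_support(), dedicated.get_support())
```

This suggests the set of modes is fixed, which is why a plain score threshold does not seem to fit any of them directly.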

But how can I add another mode? Ideally without having to write a class derived from SelectorMixin.

Edit

My pipeline looks like:

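from sklearn.feature_selection import GenericUnivariateSelect
from sklearn.pipeline import Pipeline

# my_score is the custom score function defined further down
custom_filter = GenericUnivariateSelect(my_score)
MyProcessingPipeline = Pipeline(steps=[('filter_step', custom_filter)])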

I use the pipeline in a very simple way:

import numpy as np
import pandas as pd

X = pd.DataFrame(data=np.random.rand(500, 3))
MyProcessingPipeline.fit(X)
MyProcessingPipeline.transform(X)

My importance score function here is:

# Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues).
def my_score(X, y):
    # One random score per feature; the p-values are placeholders (all zeros, 1-D).
    return (np.random.rand(X.shape[1]), np.zeros(X.shape[1]))

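In my case, I want the transform to keep every feature for which my_score returns a score > 0.6. How can I achieve this? I am increasingly convinced that I will have to subclass some native sklearn class, but does anyone know which one I should subclass to minimize the amount of code to write, while still being able to perform this very simple feature selection?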
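For reference, this is roughly the kind of thing I have in mind: a rough sketch, assuming that subclassing SelectorMixin together with BaseEstimator and implementing _get_support_mask is enough. ScoreThresholdSelect and column_mean_score are hypothetical names of my own, not scikit-learn classes, and a deterministic score function replaces my random one so the result is predictable.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.feature_selection import SelectorMixin
from sklearn.pipeline import Pipeline
from sklearn.utils.validation import check_is_fitted


class ScoreThresholdSelect(SelectorMixin, BaseEstimator):
    """Hypothetical selector: keep features whose score exceeds a threshold."""

    def __init__(self, score_func, threshold=0.6):
        self.score_func = score_func
        self.threshold = threshold

    def fit(self, X, y=None):
        X = np.asarray(X)
        result = self.score_func(X, y)
        # Accept either a (scores, pvalues) pair or a single array of scores.
        scores = result[0] if isinstance(result, (list, tuple)) else result
        self.scores_ = np.asarray(scores, dtype=float).ravel()
        self.n_features_in_ = X.shape[1]
        return self

    def _get_support_mask(self):
        # Boolean mask used by SelectorMixin.transform / get_support.
        check_is_fitted(self, "scores_")
        return self.scores_ > self.threshold


def column_mean_score(X, y):
    """Hypothetical deterministic score: the mean of each column."""
    return np.mean(np.asarray(X), axis=0)


# Column means are 0.1, 0.7 and 0.9, so only the first column is at or below 0.6.
X_small = np.array([[0.1, 0.6, 0.8],
                    [0.1, 0.8, 1.0],
                    [0.1, 0.6, 0.8],
                    [0.1, 0.8, 1.0]])

threshold_pipeline = Pipeline(
    steps=[("filter_step", ScoreThresholdSelect(column_mean_score, threshold=0.6))]
)
X_filtered = threshold_pipeline.fit(X_small).transform(X_small)
support = threshold_pipeline.named_steps["filter_step"].get_support()
```

The mixin supplies transform, get_support and inverse_transform, so only fit and _get_support_mask have to be written. I am not sure this is the smallest possible amount of code, hence the question.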


0 Answers: