Generating polynomial features by degree

Time: 2019-03-21 01:06:07

Tags: python

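I am trying to generate polynomial features without using sklearn. Given a numpy array and a degree, I need to generate all polynomial features in order. Example: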

  

For the input [a, b], the degree-2 polynomial features are [a, b, a^2, ab, b^2].

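Below is the partial solution I came up with. My problems are multiplying a and b together for an arbitrary degree, and getting the features in the correct order.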

def polynomialFeatures(X, degree = 1):
    features = []
    while (degree > 0):
        for i in X:
            features.append(i ** degree)
        degree = degree - 1
    features.append(X[0] * X[1])
    return features

I have also tried itertools.combinations_with_replacement, but that did not solve the problem of multiplying a and b together. Any suggestions?

1 answer:

Answer 0: (score: 0)

Here is what I came up with:

import numpy as np

def polynomialFeatures( X, degree = 2, interaction_only = False, include_bias = True ) :
    features = X.copy()
    prev_chunk = X
    indices = list( range( len( X ) ) )

    for d in range( 1, degree ) :
        # Create a new chunk of features for degree d + 1:
        new_chunk = []
        # Multiply each component with the products from the previous lower degree:
        for i, v in enumerate( X[:-d] if interaction_only else X ) :
            # Store the index where to start multiplying with the current component
            # at the next degree up:
            next_index = len( new_chunk )
            for coef in prev_chunk[indices[i+( 1 if interaction_only else 0 )]:] :
                new_chunk.append( v*coef )
            indices[i] = next_index
        # Extend the feature vector with the new chunk of features for degree d + 1:
        features = np.append( features, new_chunk )
        prev_chunk = new_chunk

    if include_bias :
        features = np.insert( features, 0, 1 )

    return features

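It works with X as a list or a 1-D array (so it processes one sample at a time). If there is a need, I will happily extend the function to handle 2-D arrays (several samples at once)!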
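As a rough idea of what handling several samples at once could look like, here is a minimal sketch. It is not the function above: it swaps in itertools to list the index tuple of each term once and then applies that list to every row. The names term_indices and polynomial_features_batch are hypothetical, and math.prod assumes Python 3.8 or later.

```python
from itertools import chain, combinations, combinations_with_replacement
from math import prod


def term_indices(n_features, degree, interaction_only=False, include_bias=True):
    """Return one tuple of column indices per output feature, in order."""
    combos = combinations if interaction_only else combinations_with_replacement
    start = 0 if include_bias else 1  # degree 0 is the bias term
    return list(chain.from_iterable(
        combos(range(n_features), d) for d in range(start, degree + 1)
    ))


def polynomial_features_batch(rows, degree=2, interaction_only=False, include_bias=True):
    """Expand every row of a 2-D sequence using the same list of terms."""
    rows = [list(row) for row in rows]
    if not rows:
        return []
    terms = term_indices(len(rows[0]), degree, interaction_only, include_bias)
    # prod of an empty sequence is 1, so the empty tuple yields the bias column.
    return [[prod(row[i] for i in term) for term in terms] for row in rows]


print(polynomial_features_batch([[2, 3], [1, 4]], degree=2))
# [[1, 2, 3, 4, 6, 9], [1, 1, 4, 1, 4, 16]]
```

Computing the index tuples once keeps the per-row work to plain multiplications, and the term order is the same [1, a, b, a^2, ab, b^2] layout as in the question.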

In every case I tested, it matches the output of sklearn.preprocessing.PolynomialFeatures exactly.

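To see which product ends up at each position of the output, change the line new_chunk.append( v*coef ) to new_chunk.append( v + coef ) in the function and pass in a list of characters, for example: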

polynomialFeatures( [ 'a', 'b', 'c' ], 3, True, True )

which will output:

['1' 'a' 'b' 'c' 'ab' 'ac' 'bc' 'abc']
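If you would rather not edit the function, the labels can also be listed independently. The sketch below is an assumption-laden alternative, not part of the code above: the hypothetical helper term_labels covers only the case without interaction_only and without the bias term, and writes repeated factors with an exponent, as in the question.

```python
from collections import Counter
from itertools import combinations_with_replacement


def term_labels(names, degree):
    """Label every term up to `degree`, e.g. ('a', 'a', 'b') -> 'a^2b'."""
    labels = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(names, d):
            counts = Counter(combo)  # keeps the first-seen order of the names
            labels.append("".join(
                name if power == 1 else f"{name}^{power}"
                for name, power in counts.items()
            ))
    return labels


print(term_labels(["a", "b"], 2))  # ['a', 'b', 'a^2', 'ab', 'b^2']
```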

Bonus:

For those who may eventually need it, I have translated the previous code to C++11:

#include <cstddef>
#include <numeric>
#include <vector>

template <class T>
std::vector<T> polynomialFeatures( const std::vector<T>& input, unsigned int degree, bool interaction_only, bool include_bias )
{
    std::vector<T> features = input;
    std::vector<T> prev_chunk = input;
    std::vector<std::size_t> indices( input.size() );
    std::iota( indices.begin(), indices.end(), 0 );

    const std::size_t n = input.size();

    for ( unsigned int d = 1 ; d < degree ; ++d )
    {
        // Create a new chunk of features for degree d + 1:
        std::vector<T> new_chunk;
        // With interaction_only, only the first n - d components can start a product.
        // Clamp at zero so the unsigned subtraction cannot wrap around when d >= n:
        const std::size_t limit = interaction_only ? ( d < n ? n - d : 0 ) : n;
        // Multiply each component with the products from the previous lower degree:
        for ( std::size_t i = 0 ; i < limit ; ++i )
        {
            // Store the index where to start multiplying with the current component at the next degree up:
            const std::size_t next_index = new_chunk.size();
            for ( auto coef_it = prev_chunk.begin() + indices[i + ( interaction_only ? 1 : 0 )] ; coef_it != prev_chunk.end() ; ++coef_it )
            {
                new_chunk.push_back( input[i] * ( *coef_it ) );
            }
            indices[i] = next_index;
        }
        // Extend the feature vector with the new chunk of features:
        features.reserve( features.size() + new_chunk.size() );
        features.insert( features.end(), new_chunk.begin(), new_chunk.end() );

        prev_chunk = new_chunk;
    }
    if ( include_bias )
        features.insert( features.begin(), T( 1 ) );

    return features;
}

Its output is fully compatible with that of sklearn.preprocessing.PolynomialFeatures, so you can train the weights with scikit-learn and then import them into a C++ program for prediction.
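For illustration, the prediction step for a linear model then reduces to a dot product between the expanded features and the trained weights. The sketch below is in Python to stay with the rest of the thread. The helper predict and the weight values are hypothetical; with scikit-learn they would typically come from a fitted linear model's coef_ and intercept_ attributes, and they must be in the same order as the features.

```python
def predict(features, weights, intercept=0.0):
    """Linear model: intercept + sum of weights[i] * features[i]."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return intercept + sum(w * f for w, f in zip(weights, features))


# Hypothetical weights for the features [1, a, b, a^2, ab, b^2].
weights = [0.5, 1.0, -2.0, 0.25, 0.0, 1.5]
features = [1, 2, 3, 4, 6, 9]  # expansion of [a, b] = [2, 3] at degree 2
print(predict(features, weights))  # 11.0
```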