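I am trying to generate polynomial features without using sklearn. Given a numpy array and a degree, I need to generate all the polynomial features in order. Example:
The degree-2 polynomial features of the input [a, b] are [a, b, a^2, ab, b^2].
Below is the partial solution I came up with. My problem is multiplying a by b for an arbitrary degree; the terms also come out in the wrong order.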
def polynomialFeatures(X, degree = 1):
    features = []
    while (degree > 0):
        for i in X:
            features.append(i ** degree)
        degree = degree - 1
    features.append(X[0] * X[1])
    return features
I have also tried itertools.combinations_with_replacement, but that does not solve the problem of multiplying a by b. Any suggestions?
Answer 0 (score: 0)
Here is what I came up with:
import numpy as np

def polynomialFeatures( X, degree = 2, interaction_only = False, include_bias = True ) :
    features = X.copy()
    prev_chunk = X
    indices = list( range( len( X ) ) )

    for d in range( 1, degree ) :
        # Create a new chunk of features for the degree d + 1:
        new_chunk = []
        # Multiply each component with the products from the previous lower degree:
        for i, v in enumerate( X[:-d] if interaction_only else X ) :
            # Store the index where to start multiplying with the current component
            # at the next degree up:
            next_index = len( new_chunk )
            for coef in prev_chunk[indices[i+( 1 if interaction_only else 0 )]:] :
                new_chunk.append( v*coef )
            indices[i] = next_index
        # Extend the feature vector with the new chunk of features from the degree d + 1:
        features = np.append( features, new_chunk )
        prev_chunk = new_chunk

    if include_bias :
        # Prepend the constant (degree 0) term:
        features = np.insert( features, 0, 1 )

    return features
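It works with X given as a list or a one-dimensional array, so it processes one sample at a time. If needed, I would be glad to extend the function to two-dimensional arrays (several samples at a time)!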
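For the two-dimensional case mentioned above, one possible sketch is to run the same chunk-by-chunk scheme on whole columns instead of scalars, so every sample is handled in one pass. The name polynomial_features_2d and the (n_samples, n_features) layout are assumptions made for illustration; this is not code from the answer's author.

```python
import numpy as np

def polynomial_features_2d(X, degree=2, interaction_only=False, include_bias=True):
    """Hypothetical 2-D variant: X has shape (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    columns = [X[:, j] for j in range(n_features)]  # the degree-1 chunk
    features = list(columns)
    prev_chunk = columns
    indices = list(range(n_features))
    offset = 1 if interaction_only else 0

    for d in range(1, degree):
        new_chunk = []
        # With interaction_only, the last d columns cannot start a new product.
        last = max(n_features - d, 0) if interaction_only else n_features
        for i in range(last):
            # Remember where this column's products start in the new chunk.
            next_index = len(new_chunk)
            for col in prev_chunk[indices[i + offset]:]:
                new_chunk.append(columns[i] * col)  # element-wise, all samples at once
            indices[i] = next_index
        features.extend(new_chunk)
        prev_chunk = new_chunk

    if include_bias:
        features.insert(0, np.ones(n_samples))
    return np.column_stack(features)

# Two samples, [2, 3] and [1, 4]; each output row is [1, a, b, a^2, ab, b^2].
print(polynomial_features_2d([[2, 3], [1, 4]], degree=2))
```

Keeping the products as column vectors means the index bookkeeping is identical to the one-sample version; only the multiplication becomes element-wise.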
I have tested it in all the possible cases, and its output matches that of sklearn.preprocessing.PolynomialFeatures exactly.
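If scikit-learn is not installed, a rough way to cross-check the ordering is to enumerate the terms with itertools. The sketch below assumes that the expected order is, degree by degree, the order of itertools.combinations_with_replacement over the feature indices (or itertools.combinations when interaction_only is set), which to my knowledge is also how scikit-learn enumerates its terms. The name reference_features is made up for this example.

```python
from itertools import combinations, combinations_with_replacement

import numpy as np

def reference_features(x, degree=2, interaction_only=False, include_bias=True):
    """Hypothetical reference: one product per index tuple, degree by degree."""
    x = np.asarray(x, dtype=float)
    comb = combinations if interaction_only else combinations_with_replacement
    features = [1.0] if include_bias else []
    for d in range(1, degree + 1):
        for idx in comb(range(len(x)), d):
            # idx such as (0, 1) stands for the term x[0] * x[1].
            features.append(float(np.prod(x[list(idx)])))
    return np.array(features)

# For a = 2, b = 3 the order is 1, a, b, a^2, ab, b^2.
print(reference_features([2.0, 3.0], degree=2))
```

This version is slower than the chunk-based function because it recomputes every product from scratch, but it is short enough to serve as a test oracle.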
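To see which products end up in the output, replace this line in the function:
new_chunk.append( v*coef )
with:
new_chunk.append( v + coef )
and pass in a list of characters. For example:
polynomialFeatures( [ 'a', 'b', 'c' ], 3, True, True )
will output:
['1' 'a' 'b' 'c' 'ab' 'ac' 'bc' 'abc']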
Bonus:
For those who may end up needing it, I have translated the previous code into C++11:
#include <cstddef>
#include <numeric>
#include <vector>

template <class T>
std::vector<T> polynomialFeatures( const std::vector<T>& input, unsigned int degree, bool interaction_only, bool include_bias )
{
    std::vector<T> features = input;
    std::vector<T> prev_chunk = input;
    std::vector<size_t> indices( input.size() );
    std::iota( indices.begin(), indices.end(), 0 );

    for ( unsigned int d = 1 ; d < degree ; ++d )
    {
        // Create a new chunk of features for the degree d + 1:
        std::vector<T> new_chunk;
        // With interaction_only, the last d components cannot start a new product.
        // Clamp at zero so the unsigned subtraction cannot wrap around:
        const size_t skipped = interaction_only ? d : 0;
        const size_t count = input.size() > skipped ? input.size() - skipped : 0;
        // Multiply each component with the products from the previous lower degree:
        for ( size_t i = 0 ; i < count ; ++i )
        {
            // Store the index where to start multiplying with the current component at the next degree up:
            size_t next_index = new_chunk.size();
            for ( auto coef_it = prev_chunk.begin() + indices[i + ( interaction_only ? 1 : 0 )] ; coef_it != prev_chunk.end() ; ++coef_it )
            {
                new_chunk.push_back( input[i] * *coef_it );
            }
            indices[i] = next_index;
        }
        // Extend the feature vector with the new chunk of features:
        features.reserve( features.size() + new_chunk.size() );
        features.insert( features.end(), new_chunk.begin(), new_chunk.end() );
        prev_chunk = new_chunk;
    }

    if ( include_bias )
        features.insert( features.begin(), T( 1 ) );

    return features;
}
Its output is fully compatible with that of sklearn.preprocessing.PolynomialFeatures, so you can train the weights with Scikit-learn and then import them into a C++ program for prediction.
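The prediction step itself is just a weighted sum over the feature vector, which a C++ program would reproduce with a simple loop. The Python sketch below assumes a linear model with one weight per polynomial feature plus an optional intercept; the predict helper and the weight values are made up for illustration and do not come from a trained model.

```python
import numpy as np

def predict(features, weights, intercept=0.0):
    """Hypothetical helper: linear prediction from polynomial features."""
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if features.shape != weights.shape:
        raise ValueError("expected exactly one weight per polynomial feature")
    return intercept + float(np.dot(weights, features))

# Degree-2 features of the sample [2, 3] with the bias term: [1, a, b, a^2, ab, b^2].
features = [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
# Made-up weights, purely for illustration.
weights = [0.5, 1.0, -1.0, 0.25, 0.0, 0.5]
print(predict(features, weights))  # 0.5 + 2 - 3 + 1 + 0 + 4.5 = 5.0
```

The weights only line up if the features are generated in the same order at training time and at prediction time, which is why the matching order matters.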