How can I see the effect of sklearn.preprocessing.PolynomialFeatures?

Time: 2016-01-19 06:25:00

Tags: python scikit-learn preprocessor sympy

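If I have a modest number of base features and generate polynomial features of moderate order from them, it can be confusing to work out which column of the feature array preprocess_XX corresponds to which transformation of the base features.

I used to do something like the following with an older version of sklearn (maybe 0.14?):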

import numpy as np
from sympy import Symbol
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(4)
x1 = Symbol('x1')
x2 = Symbol('x2')
x3 = Symbol('x3')
XX = np.random.rand(1000, 3)  # replace with the actual data array
preprocess_symXX = poly.fit_transform([x1, x2, x3])
preprocess_XX = poly.fit_transform(XX)
print(preprocess_symXX)

This was great. It produced output like [1, x1, x2, x3, x1**2, ... ], which told me which polynomial functions my preprocess_XX columns actually came from.

But now when I do this, it complains with TypeError: can't convert expression to float. The exception is raised because a function in sklearn.utils.validation called check_array() tries to convert the input of poly.fit_transform() to dtype=float.
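The failure can be reproduced without scikit-learn at all. The sketch below assumes that the validation step amounts to a plain NumPy conversion to a float array. That is a simplification of what check_array() does, and the helper name try_float_conversion is made up for illustration.

```python
import numpy as np
from sympy import Symbol

x1, x2, x3 = Symbol('x1'), Symbol('x2'), Symbol('x3')

def try_float_conversion(values):
    """Return the error message if `values` cannot become a float array, else None."""
    try:
        # Simplified stand-in for the dtype=float conversion done during validation.
        np.asarray(values, dtype=float)
    except (TypeError, ValueError) as exc:
        return str(exc)
    return None

# Symbols have no numeric value, so the conversion fails.
symbolic_error = try_float_conversion([x1, x2, x3])
# Ordinary numbers convert without complaint.
numeric_error = try_float_conversion([0.1, 0.2, 0.3])

print(symbolic_error)
print(numeric_error)
```

The exact wording of the message depends on the sympy version, so only the fact that an error is raised should be relied on.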

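Do you have any suggestions for how to see which polynomial of the base features corresponds to which column in the output of fit_transform(), now that it no longer seems to work with sympy?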

1 Answer:

Answer 0 (score: 4)

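Use poly.powers_ to get the powers. Each row of that array holds the exponent of every input feature for one output column. You can then convert it into something human-readable: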

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.rand(1000, 3)

poly = PolynomialFeatures(4)
Y = poly.fit_transform(X)

features = ['X1','X2','X3']

print(poly.powers_)

for entry in poly.powers_:
    newFeature = []
    for feat, coef in zip(features, entry):
        if coef > 0:
            newFeature.append(feat+'**'+str(coef))
    if not newFeature:
        print(1) # If all powers are 0
    else:
        print(' * '.join(newFeature)) # Each output column is a product of powers

This prints (after the poly.powers_ output):

1
X1**1
X2**1
X3**1
X1**2
X1**1 * X2**1
X1**1 * X3**1
X2**2
X2**1 * X3**1
X3**2
X1**3
X1**2 * X2**1
X1**2 * X3**1
X1**1 * X2**2
X1**1 * X2**1 * X3**1
X1**1 * X3**2
X2**3
X2**2 * X3**1
X2**1 * X3**2
X3**3
X1**4
X1**3 * X2**1
X1**3 * X3**1
X1**2 * X2**2
X1**2 * X2**1 * X3**1
X1**2 * X3**2
X1**1 * X2**3
X1**1 * X2**2 * X3**1
X1**1 * X2**1 * X3**2
X1**1 * X3**3
X2**4
X2**3 * X3**1
X2**2 * X3**2
X2**1 * X3**3
X3**4