Question

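Suppose we have a data matrix of $N$ data points $\mathbf{x} = [x_1, x_2]^T$, and we want to map these data points into a higher-dimensional feature space. We can do this with a polynomial of degree d. For a sequence of $N$ data points, the new data matrix is then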

$\begin{bmatrix} x_1 & x_2 & x_1^2 & x_1x_2 & x_2^2 & x_1^3 & x_1^2 x_2 & x_1 x_2^2 & x_2^3 & \dots\\[0.3em] x_1 & x_2 & x_1^2 & x_1x_2 & x_2^2 & x_1^3 & x_1^2 x_2 & x_1 x_2^2 & x_2^3 & \dots\\[0.3em] \vdots & \vdots & & & & & & & & \vdots \\[0.3em] x_1 & x_2 & x_1^2 & x_1x_2 & x_2^2 & x_1^3 & x_1^2 x_2 & x_1 x_2^2 & x_2^3 & \dots\\[0.3em] \end{bmatrix}\in \mathbf{R}^{N \times 2^{d} + 1 }$

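I studied a related script (from Andrew Ng's online course) that transforms two-dimensional data points into a higher-dimensional feature space. However, I could not work out how to generalize it to samples of arbitrary dimension, $\mathbf{x} = [x_1, x_2, \dots, x_D]$. Here is the code: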

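d = 6;                 % maximum polynomial degree
m = size(D, 1);        % number of data points (rows of D)
x1 = D(:, 1);          % first feature
x2 = D(:, 2);          % second feature
new = ones(m, 1);      % start with the constant column
for k = 1:d
    for l = 0:k
        new(:, end+1) = (x1.^(k-l)).*(x2.^l);
    end
end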

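Can this code be vectorized? Also, given a data matrix $\mathbf{D} \in \mathbf{R}^{N \times 2}$, can you suggest how to transform data points of arbitrary dimension into a higher-dimensional space using a polynomial of degree d?

PS: A generalization to d-dimensional data points would be very useful.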

Answer 1

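This solution handles k variables and generates all terms of a degree-d polynomial, where k and d are non-negative integers. Most of the code length is due to the combinatorial complexity of generating all the terms of a degree-d polynomial in k variables.

It takes an n_obs-by-k data matrix X, where n_obs is the number of observations and k is the number of variables (n_vars in the code).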
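
To make the combinatorial growth concrete, here is a small sketch that counts the terms before generating them. It relies on the standard stars-and-bars count: there are nchoosek(k + j - 1, j) monomials of exact degree j in k variables, and nchoosek(k + d, d) - 1 monomials of degree 1 through d. The variable names per_degree and n_terms are illustrative only and are not part of the answer's code.

```matlab
k = 3;   % number of variables
d = 3;   % maximum polynomial degree

% Number of monomials of exact degree j in k variables, for j = 1..d
per_degree = arrayfun(@(j) nchoosek(k + j - 1, j), 1:d);

% Total number of non-constant monomials of degree <= d
n_terms = nchoosek(k + d, d) - 1;

% The question's two-variable, degree-6 mapping, including its constant column
n_terms_question = nchoosek(2 + 6, 6);
```

With k = 3 and d = 3 this gives 19 terms, which matches the 19 columns of the [2, 3, 5] example in this answer.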

Helper function

This function generates all possible rows such that each entry is a non-negative integer and the row sums to a given positive integer d. Each row is the exponent vector of one polynomial term. For example:

the row [0, 1, 3, 0, 1]  corresponds to (x1^0)*(x2^1)*(x3^3)*(x4^0)*(x5^1)

The function (which almost certainly could be written more efficiently) is:

function result = mg_sums(n_numbers, d)
% Returns every row of n_numbers non-negative integers that sums to d
if(n_numbers<=1)
    result = d;
else
    result = zeros(0, n_numbers);
    for(i = d:-1:0)
        % fix the first entry to i, then recurse on the remaining entries
        rc = mg_sums(n_numbers - 1, d - i);
        result = [result; i * ones(size(rc,1), 1), rc];
    end
end

Initialization code

n_obs  = 1000;  % number observations
n_vars = 3;     % number of variables
max_degree  = 4;     % order of polynomial

X = rand(n_obs, n_vars);  % generate random, strictly positive data

stacked = zeros(0, n_vars); % this will collect all the exponent rows...
for(d = 1:max_degree)          % for degree 1 polynomial to degree 'max_degree'
    stacked = [stacked; mg_sums(n_vars, d)];
end

Use either method 1 or method 2 to compute newX.

Final step: method 1

newX = zeros(size(X,1), size(stacked,1));
for(i = 1:size(stacked,1))
    accumulator = ones(n_obs, 1);
    for(j = 1:n_vars)
        accumulator = accumulator .* X(:,j).^stacked(i,j);
    end
    newX(:,i) = accumulator;
end

Final step: method 2 (requires all data in the data matrix X to be strictly positive)

The problem is that if X has 0 elements, log(0) is -inf, and the -inf does not propagate properly when you call the matrix algebra routines (for example, -inf times a zero exponent gives NaN).

newX = real(exp(log(X) * stacked'));  % multiplying log of data matrix by the
                                % matrix of all possible exponent combinations
                                % effectively raises terms to powers and multiplies them!

Example run

X = [2, 3, 5];
max_degree = 3;

The stacked matrix and the polynomial term each row represents are:

stacked      term            value
1  0  0      x1              2
0  1  0      x2              3
0  0  1      x3              5
2  0  0      x1.^2           4
1  1  0      x1.*x2          6
1  0  1      x1.*x3          10
0  2  0      x2.^2           9
0  1  1      x2.*x3          15
0  0  2      x3.^2           25
3  0  0      x1.^3           8
2  1  0      x1.^2.*x2       12
2  0  1      x1.^2.*x3       20
1  2  0      x1.*x2.^2       18
1  1  1      x1.*x2.*x3      30
1  0  2      x1.*x3.^2       50
0  3  0      x2.^3           27
0  2  1      x2.^2.*x3       45
0  1  2      x2.*x3.^2       75
0  0  3      x3.^3           125

If the data matrix X is [2, 3, 5], this correctly generates:

newX = [2, 3, 5, 4, 6, 10, 9, 15, 25, 8, 12, 20, 18, 30, 50, 27, 45, 75, 125];

Here the 1st column is x1, the 2nd is x2, the 3rd is x3, the 4th is x1.^2, the 5th is x1.*x2, etc.
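
For data that may contain zeros or negative values, where the log-based shortcut is not safe, one possible loop-free alternative is sketched below. It is not part of the original answer: it swaps in bsxfun with @power and takes a product along a third dimension. The names E, Xp, and Ep are hypothetical, and E is a small hard-coded exponent matrix (two variables, degree up to 2) standing in for the stacked matrix.

```matlab
% Illustrative data with a zero and a negative entry, where log(X) would misbehave
X = [2, 3; 0, -1; 1.5, 4];

% Exponent rows for two variables up to degree 2 (hard-coded for this sketch)
E = [1 0; 0 1; 2 0; 1 1; 0 2];

[n_obs, n_vars] = size(X);
n_terms = size(E, 1);

% Lay out X as n_obs x 1 x n_vars and E as 1 x n_terms x n_vars
Xp = reshape(X, n_obs, 1, n_vars);
Ep = reshape(E, 1, n_terms, n_vars);

% bsxfun expands both to n_obs x n_terms x n_vars;
% the product over dimension 3 multiplies the per-variable powers of each term
newX = prod(bsxfun(@power, Xp, Ep), 3);
```

The trade-off is memory: the intermediate array has n_obs * n_terms * n_vars elements, so for large problems the explicit loop may still be the more practical choice.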

Feature mapping using multivariate polynomials
