Question

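I'm having a problem using sklearn's FastICA. I am trying to predict what the "measured" variables (X in the code) would be if one of the estimated "sources" were changed in a given way. I'm adapting this example.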

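I think the problem is that FastICA only approximates the "mixing" matrix, and ica.mixing_ is very different from the one I used to generate the data. I know the mixing matrix is not uniquely determined, because only the product np.dot(S, A.T) matters: changing S to S * a and A to A / a gives the same result for any a != 0.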
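The scaling ambiguity described above can be checked numerically. The following is a minimal sketch, assuming random Laplace-distributed sources (not the signals from the question) and the same mixing matrix `A`. It also shows that reordering the sources together with the columns of `A` leaves `X` unchanged, which is the other standard ambiguity of a linear mixing model. The names `a`, `perm`, `S_scaled`, `A_scaled` and `X_perm` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.laplace(size=(200, 3))  # stand-in sources, one per column
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])  # mixing matrix from the question
X = S @ A.T  # observations

# Scaling: multiply source column j by a[j] and divide column j of A by a[j].
a = np.array([2.0, -0.5, 10.0])
S_scaled = S * a
A_scaled = A / a  # broadcasting divides column j of A by a[j]
X_scaled = S_scaled @ A_scaled.T

# Permutation: reorder the sources and the columns of A in the same way.
perm = [2, 0, 1]
X_perm = S[:, perm] @ A[:, perm].T

print(np.allclose(X, X_scaled), np.allclose(X, X_perm))
```

Both products reproduce `X`, so the observations alone cannot fix the scale, sign, or order of the columns of the mixing matrix.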

Any ideas? Thanks for reading and for any help.

Here is my code.

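    import numpy as np
    import numpy.testing as np_testing
    from scipy import signal
    from sklearn.decomposition import FastICA

    # This is exactly how the example starts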
    np.random.seed(0)
    n_samples = 200
    time = np.linspace(0, 8, n_samples)

    s1 = np.sin(2 * time)  # Signal 1 : sinusoidal signal
    s2 = np.sign(np.sin(3 * time))  # Signal 2 : square signal
    s3 = signal.sawtooth(2 * np.pi * time)  # Signal 3: saw tooth signal

    S = np.c_[s1, s2, s3]
    S += 0.2 * np.random.normal(size=S.shape)  # Add noise

    S /= S.std(axis=0)  # Standardize data
    # Here I'm changing the example. I'm modifying the 'mixing' array
    # so that s1 is mixed with neither s2 nor s3
    A = np.array([[1, 0, 0], [0, 2, 1.0], [0, 1.0, 2.0]])  # Mixing matrix
    # Mix data
    X = np.dot(S, A.T)  # Generate observations

    # Compute ICA
    ica = FastICA()
    S_ = ica.fit_transform(X)  # Reconstruct signals
    A_ = ica.mixing_  # Get estimated mixing matrix

    # We can `prove` that the ICA model applies by reverting the unmixing.
    assert np.allclose(X, np.dot(S_, A_.T) + ica.mean_)

    # Here is where my real code starts,
    # Now modify source s1
    s1 *= 1.1
    S = np.c_[s1, s2, s3]
    S /= S.std(axis=0)  # Standardize data

    # Regenerate observations.
    # Note that the original code in the example uses np.dot(S, A.T)
    # (that doesn't work either). I'm using ica.inverse_transform
    # because it is what is documented, but also because there is a
    # FastICA.mean_ that is not documented and I'm hoping
    # inverse_transform uses it in the right way.
    # modified_X =  np.dot(S, A.T)   # does not work either
    modified_X = ica.inverse_transform(S)

    # Check that the last 2 observations are not changed.
    # The original 'mixing' array was defined to mix s2 and s3 but not s1.
    # The next two assertions fail.
    np_testing.assert_array_almost_equal(X[:, 1], modified_X[:, 1])
    np_testing.assert_array_almost_equal(X[:, 2], modified_X[:, 2])

Answer 1

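I'm posting my findings in case they help anyone. I think the code I posted has two problems:

1. When fitting the ICA, the exact "mixing" matrix is not recovered, and the solution leaks source 1 into all of the measured outputs. The effect should be small with a lot of data, but it should still be there. However, I see no change in behavior when I increase the amount of fake data or change FastICA's max_iter or tol parameters.
2. The order of the sources is unpredictable. In my code I assumed that the recovered S_ is in the same order as S, which is wrong. Looping over all the sources (after fit_transform) and changing one at a time, I see results close to what I expected. Two of the sources (1 and 2 for me) mostly affect measured variables 2 and 3, while the third source mostly affects measured variable 1 and has a smaller effect on variables 2 and 3.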
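The second point can be sketched as follows. This is an illustration under assumptions, not the exact code I ran: it matches each true source to the estimated column with the highest absolute correlation, then scales only the estimated source matched to s1 and maps it back with `inverse_transform`. The names `match`, `k`, `S_mod` and `change`, the use of `random_state=0`, and the larger `n_samples` are all choices made for this sketch; the sawtooth is built with plain NumPy as a stand-in for `signal.sawtooth`.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.RandomState(0)
n_samples = 2000
time = np.linspace(0, 8, n_samples)

s1 = np.sin(2 * time)            # sinusoidal signal
s2 = np.sign(np.sin(3 * time))   # square signal
s3 = 2 * (time % 1) - 1          # sawtooth, stand-in for signal.sawtooth(2 * np.pi * time)

S = np.c_[s1, s2, s3]
S += 0.2 * rng.normal(size=S.shape)  # add noise
S /= S.std(axis=0)                   # standardize

A = np.array([[1, 0, 0], [0, 2, 1.0], [0, 1.0, 2.0]])  # s1 is not mixed with s2, s3
X = S @ A.T

ica = FastICA(n_components=3, random_state=0)
S_ = ica.fit_transform(X)  # estimated sources, in arbitrary order, sign and scale

# Cross-correlation between true sources (rows) and estimated sources (columns).
corr = np.corrcoef(S.T, S_.T)[:3, 3:]
match = np.abs(corr).argmax(axis=1)  # match[i]: estimated column closest to true source i

# Modify only the estimated source matched to s1, then map back to observations.
k = match[0]
S_mod = S_.copy()
S_mod[:, k] *= 1.1
X_mod = ica.inverse_transform(S_mod)

# Largest absolute change in each measured variable.
change = np.abs(X_mod - X).max(axis=0)
print(match, change)
```

Because the model is linear, the change in `X` is `0.1 * S_[:, k]` times column `k` of `ica.mixing_`, so any nonzero entries in that column for variables 2 and 3 show up directly as the leakage described in the first point.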
