Problem with FastICA: modifying one independent source changes all outputs

Time: 2018-08-30 19:00:54

Tags: python scikit-learn

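I am having a problem using sklearn's FastICA. I am trying to predict what the "measured" variables (X in the code) would be if one of the estimated "sources" changed in a given way. I am modifying this example.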

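I think the problem is that FastICA only approximates the "mixing" matrix, and ica.mixing_ is very different from the matrix I used to generate the data. I know the mixing matrix is not uniquely determined: only the product np.dot(S, A.T) matters, so replacing S with S * a and A with A / a gives the same result for any a != 0.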
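The scaling ambiguity described above can be checked with plain NumPy. This is only a minimal sketch: the Laplace-distributed stand-in sources and the per-source scale factors in `a` are arbitrary choices made for illustration and are not part of the original code.

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in sources (assumption: any non-degenerate data works for this check)
S = rng.laplace(size=(200, 3))
A = np.array([[1, 0, 0], [0, 2, 1.0], [0, 1.0, 2.0]])  # mixing matrix from the question
X = np.dot(S, A.T)

# Arbitrary non-zero scale per source; a negative value is a sign flip
a = np.array([2.0, -0.5, 3.0])

S_alt = S * a   # scale column j of S by a[j]
A_alt = A / a   # scale column j of A by 1 / a[j]
X_alt = np.dot(S_alt, A_alt.T)

# Both (S, A) and (S_alt, A_alt) explain the observations equally well
same_observations = np.allclose(X, X_alt)
```

Because both factorizations reproduce X, comparing ica.mixing_ entry by entry with the generating matrix A is not expected to match, even when the fit is good.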

Any ideas? Thanks for reading and for your help.

Here is my code.

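    import numpy as np
    import numpy.testing as np_testing
    from scipy import signal
    from sklearn.decomposition import FastICA

    # this is exactly how the example starts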
    np.random.seed(0)
    n_samples = 200
    time = np.linspace(0, 8, n_samples)

    s1 = np.sin(2 * time)  # Signal 1 : sinusoidal signal
    s2 = np.sign(np.sin(3 * time))  # Signal 2 : square signal
    s3 = signal.sawtooth(2 * np.pi * time)  # Signal 3: saw tooth signal

    S = np.c_[s1, s2, s3]
    S += 0.2 * np.random.normal(size=S.shape)  # Add noise

    S /= S.std(axis=0)  # Standardize data
    # Here I'm changing the example. I'm modifying the 'mixing' array
    # such that s1 is mixed with neither s2 nor s3
    A = np.array([[1, 0, 0], [0, 2, 1.0], [0, 1.0, 2.0]])  # Mixing matrix
    # Mix data
    X = np.dot(S, A.T)  # Generate observations

    # Compute ICA
    ica = FastICA()
    S_ = ica.fit_transform(X)  # Reconstruct signals
    A_ = ica.mixing_  # Get estimated mixing matrix

    # We can `prove` that the ICA model applies by reverting the unmixing.
    assert np.allclose(X, np.dot(S_, A_.T) + ica.mean_)

    # Here is where my real code starts,
    # Now modify source s1
    s1 *= 1.1
    S = np.c_[s1, s2, s3]
    S /= S.std(axis=0)  # Standardize data

    # regenerate observations. 
    # Note that original code in the example uses np.dot(S, A.T) 
    # (that doesn't work either). I'm using ica.inverse_transform 
    # because it is what is documented but also because there is a
    # FastICA.mean_ that is not documented and I'm hoping 
    # inverse_transform uses it in the right way.
    # modified_X =  np.dot(S, A.T)   # does not work either
    modified_X = ica.inverse_transform(S)

    # check that last 2 observations are not changed
    # The original 'mixing' array was defined to mix s2 and s3 but not s1
    # Next tests fail
    np_testing.assert_array_almost_equal(X[:, 1], modified_X[:, 1])
    np_testing.assert_array_almost_equal(X[:, 2], modified_X[:, 2])
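One detail in the snippet above is worth checking separately from ICA itself, and plain NumPy is enough to do it. `np.c_` copies its inputs, so the noise added to `S` never reaches `s1`, `s2` or `s3`. Rebuilding `S` from them therefore drops the noise, and re-standardizing each column cancels a constant factor such as 1.1. The sketch below uses only two of the three signals to stay short; the variable names `S_rebuilt`, `S_plain`, `scaling_cancelled` and `noise_lost` are introduced here for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
time = np.linspace(0, 8, 200)
s1 = np.sin(2 * time)
s2 = np.sign(np.sin(3 * time))

S = np.c_[s1, s2]                     # np.c_ copies s1 and s2
S += 0.2 * rng.normal(size=S.shape)   # noise lands in S only, not in s1 / s2
S /= S.std(axis=0)

# Rebuild the sources as in the question: scale s1, then standardize again
S_rebuilt = np.c_[1.1 * s1, s2]
S_rebuilt /= S_rebuilt.std(axis=0)

# Same construction without the 1.1 factor, for comparison
S_plain = np.c_[s1, s2]
S_plain /= S_plain.std(axis=0)

# Dividing by the column std undoes the constant factor
scaling_cancelled = np.allclose(S_rebuilt, S_plain)

# The untouched source s2 still differs from the original S, because the noise is gone
noise_lost = not np.allclose(S_rebuilt[:, 1], S[:, 1])
```

If both flags are true, even the commented-out `np.dot(S, A.T)` variant would fail the two assertions, independently of how well FastICA estimated the mixing matrix.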

1 answer:

Answer 0 (score: 0)

I will post what I found in case it helps anyone. I think the code I posted has 2 problems:

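  1. When ICA is fitted, it does not find the exact "mixing" matrix, so the solution leaks source 1 into all of the measured outputs. The leak should become small with a lot of data, but it should still be there. However, I do not see any change in behavior when I increase the amount of fake data or change FastICA's max_iter or tol parameters.

  2. The order of the sources is unpredictable. In my code I assumed that the recovered S_ has the same order as S, which is wrong. Looping over all the sources (after fit_transform) and changing one at a time, I see results close to what I expected. Two of the sources (1 and 2 in my run) have the largest effect on measured variables 2 and 3, while the third source has the largest effect on measured variable 1 and a smaller effect on variables 2 and 3.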
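The loop described in point 2 could look roughly like the sketch below. It is not the exact code I ran: the larger `n_samples`, the fixed `random_state`, and matching estimated components to true sources by absolute correlation are assumptions added here to make the example reproducible.

```python
import numpy as np
from scipy import signal
from sklearn.decomposition import FastICA

rng = np.random.RandomState(0)
n_samples = 2000  # assumption: more samples than in the question
time = np.linspace(0, 8, n_samples)

S = np.c_[np.sin(2 * time),
          np.sign(np.sin(3 * time)),
          signal.sawtooth(2 * np.pi * time)]
S += 0.2 * rng.normal(size=S.shape)
S /= S.std(axis=0)

A = np.array([[1, 0, 0], [0, 2, 1.0], [0, 1.0, 2.0]])  # s1 is not mixed with s2 or s3
X = np.dot(S, A.T)

ica = FastICA(n_components=3, random_state=0)
S_ = ica.fit_transform(X)
X_rec = ica.inverse_transform(S_)  # reconstruction from the unmodified components

# Cross-correlation block: rows are true sources, columns are estimated components
corr = np.corrcoef(S.T, S_.T)[:3, 3:]
# order[i] is the estimated component that best matches true source i
order = np.abs(corr).argmax(axis=1)

# effects[k, j]: largest change in measured variable j when component k is scaled by 1.1
effects = np.zeros((3, 3))
for k in range(3):
    S_mod = S_.copy()
    S_mod[:, k] *= 1.1
    X_mod = ica.inverse_transform(S_mod)
    effects[k] = np.abs(X_mod - X_rec).max(axis=0)

print("component matching each true source:", order)
print(np.round(effects, 3))
```

The row of `effects` for the component matched to s1 shows how much of that source leaks into measured variables 2 and 3 (point 1), and `order` makes the permutation explicit instead of assuming that S_ and S share the same column order.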