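I am having trouble using sklearn's FastICA. I am trying to predict what the "measured" variables (X in the code) would be if one of the estimated "sources" changed in a given way. I am modifying this example.
I think the problem is that FastICA only approximates the "mixing" matrix, and ica.mixing_ is very different from the matrix I used to generate the data. I know the mixing matrix is not uniquely determined, because only the product np.dot(S, A.T) matters: changing S to S * a and A to A / a gives the same result for every a != 0.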
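To make the scaling ambiguity concrete, here is a minimal sketch using NumPy only. The random sources and the `scales` vector are illustrative assumptions and are not part of the original code; only the matrix A is taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(200, 3))        # hypothetical sources, one per column
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])      # mixing matrix used in the question

X = np.dot(S, A.T)                   # observations

# Rescale each source by a nonzero factor and divide the matching
# column of A by the same factor (illustrative values).
scales = np.array([2.0, -0.5, 10.0])
S_scaled = S * scales                # multiplies column j of S by scales[j]
A_scaled = A / scales                # divides column j of A by scales[j]
X_scaled = np.dot(S_scaled, A_scaled.T)

# The observations are identical although A_scaled differs from A.
print(np.allclose(X, X_scaled))
print(np.allclose(A, A_scaled))
```

So comparing ica.mixing_ entry by entry with A is not meaningful on its own; the observations cannot distinguish between the two factorizations.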
Any ideas? Thanks for reading and for your help.
Here is my code.
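import numpy as np
from numpy import testing as np_testing
from scipy import signal
from sklearn.decomposition import FastICA
# This is exactly how the example starts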
np.random.seed(0)
n_samples = 200
time = np.linspace(0, 8, n_samples)
s1 = np.sin(2 * time) # Signal 1 : sinusoidal signal
s2 = np.sign(np.sin(3 * time)) # Signal 2 : square signal
s3 = signal.sawtooth(2 * np.pi * time) # Signal 3: saw tooth signal
S = np.c_[s1, s2, s3]
S += 0.2 * np.random.normal(size=S.shape) # Add noise
S /= S.std(axis=0) # Standardize data
# Here I'm changing the example. I'm modifying the 'mixing' array
# such that s1 is mixed with neither s2 nor s3
A = np.array([[1, 0, 0], [0, 2, 1.0], [0, 1.0, 2.0]]) # Mixing matrix
# Mix data,
X = np.dot(S, A.T) # Generate observations
# Compute ICA
ica = FastICA()
S_ = ica.fit_transform(X) # Reconstruct signals
A_ = ica.mixing_ # Get estimated mixing matrix
# We can `prove` that the ICA model applies by reverting the unmixing.
assert np.allclose(X, np.dot(S_, A_.T) + ica.mean_)
# Here is where my real code starts,
# Now modify source s1
s1 *= 1.1
S = np.c_[s1, s2, s3]
S /= S.std(axis=0) # Standardize data
# regenerate observations.
# Note that original code in the example uses np.dot(S, A.T)
# (that doesn't work either). I'm using ica.inverse_transform
# because it is what is documented but also because there is an
# FastICA.mean_ that is not documented and I'm hoping
# inverse_transform uses it in the right way.
# modified_X = np.dot(S, A.T) # does not work either
modified_X = ica.inverse_transform(S)
# check that last 2 observations are not changed
# The original 'mixing' array was defined to mix s2 and s3 but not s1
# Next tests fail
np_testing.assert_array_almost_equal(X[:, 1], modified_X[:, 1])
np_testing.assert_array_almost_equal(X[:, 2], modified_X[:, 2])
Answer 0 (score: 0)
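I am posting my findings in case they help anyone. I think there were two problems with the code I posted.
First, when the ICA is fitted, the exact "mixing" matrix is not recovered, so the solution leaks source 1 into all of the measured outputs. The effect should be small with plenty of data, but it should still be there. However, I see no change in behaviour when I increase the amount of fake data or change FastICA's max_iter or tol parameters.
Second, the order of the sources is not predictable. In my code I assumed that the estimated S_ had the same order as S, which is wrong. Looping over all the sources after fit_transform and changing one at a time, I get results close to what I expected. Two of the sources (1 and 2 in my run) affect measured variables 2 and 3 the most, while the third source has the largest effect on measured variable 1 and only small effects on variables 2 and 3.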
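The loop described above could look roughly like the sketch below. It is a reconstruction under assumptions, not the exact code I ran: n_samples is raised to 2000, random_state is fixed for repeatability, and the names `best_match` and `effects` are made up for illustration. Estimated sources are matched to the true ones by absolute correlation, since both their order and their sign are arbitrary.

```python
import numpy as np
from scipy import signal
from sklearn.decomposition import FastICA

np.random.seed(0)
n_samples = 2000                      # assumption: more samples than the question
time = np.linspace(0, 8, n_samples)
S = np.c_[np.sin(2 * time),
          np.sign(np.sin(3 * time)),
          signal.sawtooth(2 * np.pi * time)]
S += 0.2 * np.random.normal(size=S.shape)   # add noise
S /= S.std(axis=0)                          # standardize
A = np.array([[1, 0, 0], [0, 2, 1.0], [0, 1.0, 2.0]])
X = np.dot(S, A.T)

ica = FastICA(n_components=3, random_state=0)
S_ = ica.fit_transform(X)             # estimated sources, arbitrary order and scale

# Cross-correlation between true sources (rows) and estimated ones (columns).
corr = np.corrcoef(S.T, S_.T)[:3, 3:]
best_match = np.abs(corr).argmax(axis=0)    # true source index per estimated source

# Perturb one estimated source at a time and record the mean absolute
# change in each measured variable.
effects = np.zeros((3, 3))
for j in range(S_.shape[1]):
    S_mod = S_.copy()
    S_mod[:, j] *= 1.1
    X_mod = ica.inverse_transform(S_mod)
    effects[j] = np.abs(X_mod - X).mean(axis=0)
    print("estimated source", j, "matches true source", best_match[j],
          "effect on X columns:", effects[j])
```

Row j of `effects` is proportional to the absolute values of column j of ica.mixing_, so any leakage of a source into a measured variable shows up as a small but nonzero entry in that row.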