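I am not sure whether it is better to ask this here or on StackExchange, but since it is a programming question as well as possibly a math question, here goes.
The question is about FastICA.
Given input time series ("observations" below), where each time series is a linear mixture of n_components signals, ICA returns the signals and the mixing matrix. From Section 3 of http://www.cs.jhu.edu/~ayuille/courses/Stat161-261-Spring14/HyvO00-icatut.pdf, I understand that at most one of the signals may be Gaussian noise. Yet below I seem to demonstrate that FastICA recovers two signals even when both are noise (shown here as a function of the length of the time series, from one time step up to 10000 time steps, with 16 time series):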
# Snippet below adapted from http://scikit-learn.org/stable/auto_examples/decomposition/plot_ica_blind_source_separation.html
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from sklearn.decomposition import FastICA, PCA
for i in [1, 2, 3, 4, 5, 10, 20, 100, 1000, 10000]:  # number of timepoints
    # Generate sample data
    np.random.seed(0)
    n_samples = i
    time = np.linspace(0, 8, n_samples)
    #
    # Both sources are pure Gaussian noise
    s1 = np.array([np.random.normal() for q in range(i)])
    s2 = np.array([np.random.normal() for q in range(i)])
    #
    S = np.c_[s1, s2]
    S += 0.2 * np.random.normal(size=S.shape)  # Add extra noise, just to muddy the signals
    #
    S /= S.std(axis=0)  # Standardize data
    # Mix data
    A = np.array([[np.random.normal(), np.random.normal()] for j in range(16)])  # Mixing matrix (16 x 2)
    X = np.dot(S, A.T)  # Generate observations (n_samples x 16)
    #
    # Compute ICA
    ica = FastICA(n_components=2)
    print(i, end="\t")
    try:
        S_ = ica.fit_transform(X)  # Reconstruct signals
    except ValueError:
        print("ValueError: ICA does not run")
        continue
    A_ = ica.mixing_  # Get estimated mixing matrix
    #
    # We can `prove` that the ICA model applies by reverting the unmixing.
    print(np.allclose(X, np.dot(S_, A_.T) + ica.mean_))  # X - AS ~ 0
Output:
1 ValueError: ICA does not run
2 False
3 True
4 True
5 True
10 True
20 True
100 True
1000 True
10000 True
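Why does this work? That is, why is X - AS ~ 0 (the allclose() condition above)? Note that the result still holds if we generate far more observed series than used here (for example, it still works with 1,000 time series).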
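One angle that may be relevant, sketched below with NumPy only: the observations are built as the product of an n_samples x 2 matrix and a 2 x 16 matrix, so X has rank at most 2. If that is what matters here, then any two-component linear factorization of X should pass the same allclose() test, whether or not it has anything to do with the original sources. The sketch uses a truncated SVD in place of FastICA; the names rng, S_hat and A_hat are illustrative and not part of the snippet above.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_series = 1000, 16

# Two Gaussian "sources" and a random 16 x 2 mixing matrix, as in the question.
S = rng.normal(size=(n_samples, 2))
A = rng.normal(size=(n_series, 2))
X = np.dot(S, A.T)  # observations, shape (n_samples, n_series)

# X is a product of (n_samples x 2) and (2 x n_series) factors: rank <= 2.
rank = np.linalg.matrix_rank(X)

# Factor the centered data with an SVD truncated to two components (no ICA).
mean = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
S_hat = U[:, :2] * s[:2]  # two "components", not necessarily the sources
A_hat = Vt[:2].T          # a 16 x 2 "mixing" matrix

# Same style of check as in the question: X - (S_hat A_hat^T + mean) ~ 0
reconstructs = np.allclose(X, np.dot(S_hat, A_hat.T) + mean)
print(rank, reconstructs)
```

If this check passes for the SVD factors as well, the allclose() condition would only be showing that the data lie in a two-dimensional subspace, not that the individual Gaussian sources were identified.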