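I am trying to solve the "cocktail party problem".
Here is a video with a nice explanation and a solution.
In the video, he claims the problem can be solved with a single line of code. So I downloaded the audio files he uses in the video (from here) and included the line of code he uses in the video (line 5), but my results are much worse. My code basically just outputs the same original mixed audio files at a lower volume.
Here is my code in Octave: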
[x1, Fs1] = audioread('mixed1.wav');
[x2, Fs2] = audioread('mixed2.wav');
xx = [x1, x2]';
yy = sqrtm(inv(cov(xx')))*(xx-repmat(mean(xx,2),1,size(xx,2)));
[W,s,v] = svd((repmat(sum(yy.*yy,1),size(yy,1),1).*yy)*yy');
a = W*xx;
audiowrite('refined1.wav', a(1,:), Fs1);
audiowrite('refined2.wav', a(2,:), Fs1);
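I don't understand why this doesn't work. In the video he actually shows it working, maybe not 100% accurately, but definitely well.
Does anyone know what I am doing wrong and how to fix it?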
Answer 0 (score: 1)
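The Octave code below demonstrates how to mix two original signals with a known mixing matrix and then separate them again.
It can easily be modified to work with files that are already mixed. I tested it with the files shown in Andrew Ng's video.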
# Read original (unmixed) signals.
[o1, Fs1] = audioread('original1.wav');
[o2, Fs2] = audioread('original2.wav');
# Sampling rates Fs1, Fs2 should be equal!
# o Nx2 contains original signals
o = [o1, o2];
# A is a mixing matrix to make a linear combination of the input sounds.
# It can be arbitrarily changed (must be invertible).
A = [.8,.5 ; .1,.4];
# m Nx2 contains mixed signals
m = o * A;
# Save mixed files
audiowrite('mixed1.wav', m(:, 1), Fs1);
audiowrite('mixed2.wav', m(:, 2), Fs1);
# Uncomment to read your own mixed files.
#[m1, Fs1] = audioread('mymix1.wav');
#[m2, Fs2] = audioread('mymix2.wav');
#m = [m1, m2];
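if 0
  # Exact solution, only possible when the mixing matrix A is known.
  # W1 is the ideal unmixing matrix
  W1 = inv(A);
  # s Nx2 contains separated signals
  s = m * W1;
else
  # Estimate the unmixing from the mixed signals alone
  # See https://cs.nyu.edu/~roweis/kica.html
  # xx 2xN contains mixed signals, one per row
  xx = m';
  # yy 2xN is xx centered and whitened (identity covariance)
  yy = sqrtm(inv(cov(xx')))*(xx-repmat(mean(xx,2),1,size(xx,2)));
  [W,s,v] = svd((repmat(sum(yy.*yy,1),size(yy,1),1).*yy)*yy');
  # The columns of W are the estimated directions, so project the
  # whitened data with W' (identical to W whenever W is symmetric).
  ss = W' * yy;
  # Scale down by an empiric value to keep the output within [-1, 1]
  s = ss * 0.5;
  # s Nx2 contains separated signals
  s = s';
end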
audiowrite('separated1.wav', s(:, 1), Fs1);
audiowrite('separated2.wav', s(:, 2), Fs1);
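One visible difference between the question's code and the code above is which matrix W is applied to: the question multiplies W with the raw mixtures xx, while the code above multiplies it with the whitened data yy. The sketch below illustrates why that matters. It is only a sketch under stated assumptions: the two sources are synthetic test signals rather than the real recordings, the mixing matrix is borrowed from the code above, abscorr is a hypothetical helper name, and the projection uses W' because svd returns the directions as columns of W. The last line swaps the empiric 0.5 factor for peak normalization, which is a substitute technique and not part of the original code.

```octave
# Synthetic stand-ins for the two sources (assumption: not the real audio files).
Fs = 8000;
t = (0:Fs-1) / Fs;                          # one second of samples
src = [sin(2*pi*5*t); sin(2*pi*7*t).^7];    # 2xN, two zero-mean signals with different kurtosis
A = [.8, .5; .1, .4];                       # mixing matrix borrowed from the code above
xx = A * src;                               # 2xN mixed signals, one per row

# Whitening and SVD step, same expressions as in the question.
yy = sqrtm(inv(cov(xx'))) * (xx - repmat(mean(xx, 2), 1, size(xx, 2)));
[W, s, v] = svd((repmat(sum(yy .* yy, 1), size(yy, 1), 1) .* yy) * yy');

est_white = W' * yy;    # directions applied to the whitened data
est_raw   = W' * xx;    # directions applied to the raw mixtures

# Hypothetical helper: absolute correlation of every estimate (rows of e)
# with every source (rows of r); all signals here are zero-mean.
abscorr = @(e, r) abs((e * r') ./ (sqrt(sum(e.^2, 2)) * sqrt(sum(r.^2, 2))'));

c_white = abscorr(est_white, src);   # rows: estimates, columns: sources
c_raw   = abscorr(est_raw, src);
disp(c_white);
disp(c_raw);

# Peak normalization instead of an empiric factor: largest sample becomes 0.9.
out = 0.9 * est_white / max(abs(est_white(:)));
```

Each row of c_white should contain one entry very close to 1, meaning every estimate matches a single source up to sign and order. The rows of c_raw should not, because an orthogonal W cannot undo a mixture that has not been whitened first.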