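I am using the harmonic product spectrum (HPS) to find the fundamental note when multiple harmonics are present. This is the code I have implemented: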
[song,FS] = wavread('C major.wav');
%sound(song,FS);
P = 20000;
N=length(song); % length of song
t=0:1/FS:(N-1)/FS; % define time period
song = sum(song,2);
song=abs(song);
%----------------------Finding the envelope of the signal-----------------%
% Gaussian Filter
w = linspace( -1, 1, P); % create a vector of P values between -1 and 1 inclusive
sigma = 0.335; % standard deviation used in Gaussian formula
myFilter = -w .* exp( -(w.^2)/(2*sigma.^2)); % compute first derivative, but leave constants out
myFilter = myFilter / sum( abs( myFilter ) ); % normalize
% fft convolution
myFilter = myFilter(:); % create a column vector
song(length(song)+length(myFilter)-1) = 0; %zero pad song
myFilter(length(song)) = 0; %zero pad myFilter
edges =ifft(fft(song).*fft(myFilter));
tedges=edges(P:N+P-1); % shift by P/2 so peaks line up w/ edges
tedges=tedges/max(abs(tedges)); % normalize
%---------------------------Onset Detection-------------------------------%
% Finding peaks
maxtab = [];
mintab = [];
x = (1:length(tedges));
min1 = Inf;
max1 = -Inf;
min_pos = NaN;
max_pos = NaN;
lookformax = 1;
for i=1:length(tedges)
    peak = tedges(i:i);
    if peak > max1,
        max1 = peak;
        max_pos = x(i);
    end
    if peak < min1,
        min1 = peak;
        min_pos = x(i);
    end
    if lookformax
        if peak < max1-0.07
            maxtab = [maxtab ; max_pos max1];
            min1 = peak;
            min_pos = x(i);
            lookformax = 0;
        end
    else
        if peak > min1+0.08
            mintab = [mintab ; min_pos min1];
            max1 = peak;
            max_pos = x(i);
            lookformax = 1;
        end
    end
end
max_col = maxtab(:,1);
peaks_det = max_col/FS;
No_of_peaks = length(peaks_det);
[song,FS] = wavread('C major.wav');
song = sum(song,2);
%---------------------------Performing STFT--------------------------------%
h = 1;
%for i = 2:No_of_peaks
song_seg = song(max_col(7-1):max_col(7)-1);
L = length(song_seg);
NFFT = 2^nextpow2(L); % Next power of 2 from length of y
seg_fft = fft(song_seg,NFFT);%/L;
f = FS/2*linspace(0,1,NFFT/2+1);
seg_fft_2 = 2*abs(seg_fft(1:NFFT/2+1));
L5 = length(song_seg);
figure(6)
plot(f,seg_fft_2)
%plot(1:L/2,seg_fft(1:L/2))
title('Frequency spectrum of signal (seg_fft)')
xlabel('Frequency (Hz)')
xlim([0 2500])
ylabel('|Y(f)|')
ylim([0 500])
%----------------Performing Harmonic Product Spectrum---------------------%
% In the harmonic product spectrum, you downsample the FFT data several times and multiply all of those with the original FFT data to get the maximum peak.
%HPS
seg_fft = seg_fft(1 : size(seg_fft,1)/2 );
seg_fft = abs(seg_fft);
a = length(seg_fft);
seg_fft2 = ones(size(seg_fft));
seg_fft3 = ones(size(seg_fft));
seg_fft4 = ones(size(seg_fft));
seg_fft5 = ones(size(seg_fft));
for i = 1:((length(seg_fft)-1)/2)
    seg_fft2(i,1) = seg_fft(2*i,1);%(seg_fft(2*i,1) + seg_fft((2*i)+1,1))/2;
end
%b= size(seg_fft2)
L1 = length(seg_fft2);
NFFT1 = 2^nextpow2(L1); % Next power of 2 from length of y
f1 = FS/2*linspace(0,1,NFFT1/2+1);
seg_fft12 = 2*abs(seg_fft2(1:NFFT1/2+1));
figure(7);
plot(f1,seg_fft12)
title('Frequency spectrum of signal (seg_fft2)')
xlabel('Frequency (Hz)')
xlim([0 2500])
ylabel('|Y(f)|')
ylim([0 500])
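[Figure 6: frequency spectrum of the signal (seg_fft)]
So in practice, once I perform HPS (downsampling by 2), the peak at 440.1 Hz should move down to about 220 Hz, and the peak at 881 Hz should move down to around 440 Hz. But that is not what I get when I plot the result. This is the plot I get:
[Figure 7: frequency spectrum of the signal (seg_fft2)]
Why am I not getting the correct plot? I don't understand what I am doing wrong here. Could someone take a look and let me know? Thanks.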
Answer 0 (score: 1)
The problem with your downsampling is that you trim the vector by a factor of 2 before downsampling rather than after. You did
seg_fft = seg_fft(1 : size(seg_fft,1)/2 );
% [... other stuff ...]
for i = 1:((length(seg_fft)-1)/2)
    seg_fft2(i,1) = seg_fft(2*i,1);%(seg_fft(2*i,1) + seg_fft((2*i)+1,1))/2;
end
Instead, you need to downsample first and then trim:
for i = 1:((length(seg_fft)-1)/2)
    seg_fft2(i,1) = seg_fft(2*i,1);%(seg_fft(2*i,1) + seg_fft((2*i)+1,1))/2;
end
seg_fft = seg_fft(1 : size(seg_fft,1)/2 );
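As a rough sanity check of what downsampling by 2 should do, here is a minimal sketch on a synthetic one-sided magnitude spectrum. Everything in it is an assumption made for illustration rather than taken from the question: the values of FS and NFFT, the names spec, spec2, f2 and hps, and the use of spec(1:2:end), which keeps bins 0, 2, 4, ... and so differs by one bin from the 2*i indexing above. The decimated spectrum is plotted against the original bin spacing FS/NFFT, which is what moves a peak from 440 Hz to 220 Hz.

```matlab
FS   = 4096;                      % hypothetical sample rate (Hz)
NFFT = 4096;                      % hypothetical FFT length, so df = 1 Hz
df   = FS/NFFT;                   % bin spacing of the original spectrum

% Synthetic one-sided magnitude spectrum: flat floor plus two partials.
spec = 0.1*ones(NFFT/2, 1);
spec(round(440/df) + 1) = 10;     % fundamental at 440 Hz (bin 0 is index 1)
spec(round(880/df) + 1) = 6;      % second harmonic at 880 Hz

% Downsample by 2: keep bins 0, 2, 4, ...
spec2 = spec(1:2:end);
f2 = (0:length(spec2)-1).' * df;  % same bin spacing as the original axis

[~, k2] = max(spec2);
peak2 = f2(k2);                   % the 440 Hz peak now sits at 220 Hz

% Two-term harmonic product spectrum over the bins both vectors share.
M = length(spec2);
hps = spec(1:M) .* spec2;
[~, k0] = max(hps);
f0 = (k0 - 1) * df;               % estimated fundamental, 440 Hz here
```

In this toy case the 880 Hz partial lands on 440 Hz after downsampling and reinforces the fundamental, so the product peaks at 440 Hz. With real data the peaks are several bins wide, so the estimate is only as good as the bin spacing.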
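Update: you asked why this does not preserve the peaks. The short answer is that you may not be "looking at" the peak. If you want to keep the (nearest) peak while downsampling by a factor of n, you can do the following: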
n = 3; % degree of decimation or downsampling we want to do
N = size(seg_fft, 1); % number of samples in original FFT
Nn = n * floor(N/n); % largest multiple of n that does not exceed N
fftBlock = reshape(seg_fft(1:Nn, 1), n, Nn/n); % one column per block of n samples
fftResampled = max(fftBlock);
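How does this work? Let's use a simple example with 10 x 1 points:
seg_fft = [0 1 10 5 4 3 6 12 4 3].'; % column vector
We want "every third" sample. The naive approach, seg_fft(1:3:end), would give
fftResampled = [0 5 6 3];
but we would like to keep the peaks 10 and 12; unfortunately they are not at the sampled positions.
After reshaping the array (which drops the last element; if it might hold an interesting value, we could append it and pad with zeros), we get:
fftBlock = [0 5 6;
1 4 12;
10 3 4];
(Remember that MATLAB stores matrices in column-major order, so reshape fills each column from top to bottom.)
Now take max (which operates along the first dimension unless told otherwise) and you get
fftResampled = [10 5 12];
i.e. always the highest value in each block of three. While this preserves the peaks, it does mean that your "valleys" get filled in a bit.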
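The block-maximum idea can be sketched end to end, including the zero-padding option mentioned for the leftover samples. This is only an illustration under assumptions: the variable names x, xp, blocks, peaks and naive are hypothetical, and padding with zeros is safe only because a magnitude spectrum is non-negative.

```matlab
x = [0 1 10 5 4 3 6 12 4 3].';    % example magnitude spectrum (column vector)
n = 3;                            % downsampling factor

% Naive decimation: keep every n-th sample, which can miss the peaks.
naive = x(1:n:end);               % [0; 5; 6; 3]

% Pad with zeros up to the next multiple of n so no sample is dropped.
pad = mod(-length(x), n);         % 2 zeros are needed here
xp  = [x; zeros(pad, 1)];

% One column per block of n consecutive samples, then the maximum of each.
blocks = reshape(xp, n, []);
peaks  = max(blocks, [], 1).';    % [10; 5; 12; 3]
```

Passing the dimension to max explicitly keeps the result a per-block maximum even when n is 1, where max(blocks) on a single-row matrix would collapse to a scalar.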
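Conclusion: there is no way to downsample without destroying "some" information; after all, you are throwing away half of the samples. What you keep, and how you account for the information in the data you discard, is something only you can decide, because it depends on your application and on what matters to you.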