Question

I am working on speaker recognition and found this post on stackoverflow.com very useful.

The code works fine, but I have one small question.

The code in the answer:

fRate = 0.010 * fs; 
....
writehtk(featureFilename, mfc', 100000, 9);

The writehtk function from Voicebox:

function writehtk(file,d,fp,tc)
%WRITEHTK write data in HTK format []=(FILE,D,FP,TC)
%
% Inputs:
%    FILE = name of file to write (no default extension)
%       D = data to write: one row per frame
%      FP = frame period in seconds
%      TC = type code = the sum of a data type and (optionally) one or more of the listed modifiers

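The writehtk function expects the frame period in seconds, but the value passed in the code is in some other unit.

Could someone explain how this value is obtained?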

Answer 1

There is some confusion in the post you linked to.

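The use of fRate as an argument to melcepst suggests that the author wanted fRate to represent the 10 ms interval between frames, converted to a number of samples (rather than a frame rate). This is also consistent with the author's use of 100000 as the FP argument, if that argument were given in units of 100 ns. That, however, is an incorrect attempt at performing a conversion that the writehtk function already does internally.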
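The unit mismatch can be made concrete with a little arithmetic. The sketch below assumes a 16 kHz sampling rate (the original post does not state one) and assumes, as described above, that the FP argument in seconds is multiplied by 1e7 to obtain the 100 ns units stored in the HTK header. The variable names framePeriodSec, unitsCorrect and unitsWrong are illustrative only.

```matlab
fs = 16000;                    % assumed sampling rate in Hz (hypothetical value)
fRate = 0.010 * fs;            % 10 ms frame interval expressed in samples
framePeriodSec = fRate / fs;   % the same interval expressed in seconds

% HTK stores the frame period as an integer count of 100 ns units.
unitsCorrect = round(framePeriodSec * 1e7);  % FP given in seconds, as writehtk expects
unitsWrong   = 100000 * 1e7;                 % FP = 100000 interpreted as seconds

fprintf('frame interval: %d samples = %.3f s\n', fRate, framePeriodSec);
fprintf('header value with FP = 0.01:   %d\n', unitsCorrect);
fprintf('header value with FP = 100000: %g (does not fit in a 32-bit integer)\n', unitsWrong);
```

In other words, 100000 is already the value in 100 ns units, so passing it as FP applies the conversion twice.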

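Personally, I would rename the variable fRate to fInterval to avoid confusing a rate (usually given in Hz) with a time interval (usually given in seconds, or as a number of samples when the sampling rate is also specified):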

fInterval = 0.010 * fs; % in samples
...
mfc = melcepst(s, fs, '0dD', nCeps, nChan, fSize, fInterval, fL, fH);

The frame period in seconds is then simply 0.01 or, in terms of the previously defined variables, fInterval/fs:

writehtk(featureFilename, mfc', fInterval/fs, 9);
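One way to convince yourself that the units are right is to look at the frame period actually stored in a file header. The following is a minimal sketch, not Voicebox code: it writes a tiny HTK-style file by hand (big-endian header with frame count, period in 100 ns units, bytes per frame and type code) and reads the header back. The 16 kHz sampling rate, the 5 x 13 placeholder feature matrix and the temporary file name are all assumptions made for illustration.

```matlab
fs = 16000;                          % assumed sampling rate in Hz
fInterval = 0.010 * fs;              % frame interval in samples
mfc = zeros(5, 13);                  % placeholder features: 5 frames x 13 coefficients

fname = [tempname '.htk'];           % hypothetical temporary file
fid = fopen(fname, 'w', 'ieee-be');  % HTK files are big-endian by default
fwrite(fid, size(mfc, 1), 'int32');                  % number of frames
fwrite(fid, round(fInterval / fs * 1e7), 'int32');   % frame period in 100 ns units
fwrite(fid, 4 * size(mfc, 2), 'int16');              % bytes per frame (4-byte floats)
fwrite(fid, 9, 'int16');                             % type code, as in the post
fwrite(fid, mfc', 'float32');                        % frame data, one frame after another
fclose(fid);

fid = fopen(fname, 'r', 'ieee-be');
hdr = fread(fid, 2, 'int32');        % [number of frames; period in 100 ns units]
fclose(fid);
delete(fname);

periodSec = hdr(2) * 1e-7;           % convert the stored value back to seconds
fprintf('%d frames, frame period %.4f s\n', hdr(1), periodSec);
```

With Voicebox installed, readhtk is documented as returning the frame period along with the data, which would be a more direct check on a real feature file.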

Feature extraction using writehtk (speaker recognition)
