Question

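I am trying to determine the perceived pitch of an audio sample (speech only, no background noise or music) and then classify the voice as bass, tenor, alto, mezzo-soprano, or soprano.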

To do this, I use aubio, which returns a list of timecodes and the corresponding frequencies for any given audio file.

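I am struggling to find the best way to use this data to determine the pitch. My initial idea is either fundamentally flawed or poorly executed: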

I take the list of frequencies returned by aubio and compute the median like this:

$freqs = [];
exec('aubiopitch /pathtomp3file/audio.mp3', $output);

// iterate through the time/frequencies returned by aubio
// $output is a list of number pairs (one pair per line):
// the timecode, followed by whitespace, followed by the frequency
// at that timecode in hertz.

foreach ($output as $sample) {

    // split the line into timecode and frequency
    $parts = preg_split('/\s+/', trim($sample));
    if (count($parts) < 2) {
        continue; // skip lines that are not a timecode/frequency pair
    }

    // add frequency (rounded down to whole hertz) to array
    $freqs[] = (int) floor((float) $parts[1]);

}

// to calculate median frequency: sort array with frequencies
// and fetch the element in the middle (null if aubio returned nothing)

sort($freqs);
$median = $freqs ? $freqs[intdiv(count($freqs), 2)] : null;

I then map the median frequency I found to "bass", "baritone", "tenor", "alto", and so on.
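To make the mapping step concrete, here is a minimal sketch of what such a lookup could look like. The function name `classifyVoice` and the constant `VOICE_UPPER_BOUNDS` are hypothetical, and the boundary values in hertz are made-up placeholders for illustration only, not established ranges for speaking voices.

```php
<?php
// Hypothetical upper bounds (in hertz) for each label, in ascending order.
// ASSUMPTION: these numbers are placeholders, not measured or standard values.
const VOICE_UPPER_BOUNDS = [
    'bass'          => 100.0,
    'baritone'      => 120.0,
    'tenor'         => 150.0,
    'alto'          => 190.0,
    'mezzo-soprano' => 230.0,
];

/**
 * Return the first label whose upper bound lies above the given frequency.
 * Anything at or above the last bound is labelled "soprano".
 */
function classifyVoice(float $medianHz, array $bounds = VOICE_UPPER_BOUNDS): string
{
    foreach ($bounds as $label => $upperHz) {
        if ($medianHz < $upperHz) {
            return $label;
        }
    }
    return 'soprano';
}

echo classifyVoice(135.0), PHP_EOL; // falls in the placeholder "tenor" band
```

Keeping the boundaries in one table makes it easy to adjust them once better values are known.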

Unfortunately, the results are inconsistent. For example, far too often the median frequency comes out too high for the voice.

I believe my approach to determining the fundamental frequency is flawed, but I have not been able to come up with a better one.

For example, the following questions arise:

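Should I discard any frequencies above 400 Hz, since they probably come from sounds such as "s"?
When people perceive the pitch of a voice, what are we actually listening to? The fundamental frequency? The energy at certain frequencies?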
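For reference, the filtering idea from the first question could be sketched as follows. This is only an illustration under assumptions: the helper `medianInRange` is hypothetical, the 50 Hz lower bound is an assumed value for dropping implausibly low readings, the 400 Hz cutoff is the unverified guess from the question, and the sample frequencies are made up. Unlike the snippet above, it averages the two middle values when the count is even.

```php
<?php
/**
 * Keep only frequencies inside [$minHz, $maxHz] and return their median,
 * or null if no value survives the filter.
 */
function medianInRange(array $freqs, float $minHz, float $maxHz): ?float
{
    // Drop everything outside the accepted band and reindex the array.
    $kept = array_values(array_filter(
        $freqs,
        fn ($f) => $f >= $minHz && $f <= $maxHz
    ));

    $n = count($kept);
    if ($n === 0) {
        return null;
    }

    sort($kept);
    $mid = intdiv($n, 2);

    // Odd count: middle element. Even count: mean of the two middle elements.
    return $n % 2 === 1
        ? (float) $kept[$mid]
        : ($kept[$mid - 1] + $kept[$mid]) / 2.0;
}

// Made-up sample data: two zero readings and two high outliers.
$sampleFreqs = [0, 0, 110, 115, 120, 125, 900, 1200];
$filteredMedian = medianInRange($sampleFreqs, 50.0, 400.0);
var_dump($filteredMedian); // float(117.5)
```

Without the filter, the same eight values would give 120 with the upper-middle rule used earlier, so the effect of the band limits depends heavily on how many outliers fall on each side.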

To sum up, the overall question is:

"Using the data from aubio, what is the correct programmatic way to calculate the perceived pitch of a voice recording (speaking, not singing)?"

EDIT - How I am using AUBIO

exec('aubiopitch /pathtomp3file/audio.mp3',$output);

Determining the fundamental frequency of speech

0 answers: