Determining frequencies from a JS AudioContext.analyserNode

Date: 2017-06-12 14:56:07

Tags: javascript audio html5-audio web-audio audiocontext

Background

My goal is to create a JavaScript-based web application that analyses and displays frequency information from audio sources, both in-page sources (<audio> tags) and signals streamed from the client's microphone. So far, so good :)

As a keen saxophonist, one of my goals is to compare the information inherent in the tone of different saxophonists and instruments by examining the distribution of the upper partials relative to the fundamental pitch. In short, I would like to derive a representation of why different players and instrument brands sound distinct even when playing the same pitch. In addition, I want to compare the tuning and frequency distribution of various 'alternate fingerings' against the conventional or standard fingerings on the same player/instrument.

Accessing and displaying the frequency information with the JS AudioContext.analyserNode has been a fairly minor problem: I combine an HTML5 Canvas element with the fData array to create a frequency plot, or 'winamp-style bargraph', similar to the one found in 'Visualizations with Web Audio API' @ MDN.

The problem

To achieve my goals I need to identify some specific information from the audio source, notably the fundamental frequency in Hertz, for direct comparison between players/instruments, as well as the frequency range of the source, to establish the spectrum of the sounds I am interested in. This information can be derived from the variable fData in the example below...

    // example...
    var APP = function() {
        // ...select source and initialise etc..
        var aCTX = new AudioContext(),
            ANAL = aCTX.createAnalyser(),
            rANF = requestAnimationFrame,
            ucID = null;

        ANAL.fftSize = 2048;

        function audioSourceStream(stream) {
            var source = aCTX.createMediaStreamSource(stream);
            source.connect(ANAL);
            var fData = new Uint8Array(ANAL.frequencyBinCount);
            (function updateCanvas() {
                ANAL.getByteFrequencyData(fData);
                // using 'fData' to paint HTML5 Canvas
                ucID = rANF(updateCanvas);
            }());
        }
    };

The questions

While I can easily represent fData as a bar graph or line plot etc. via the <canvas> API, with the fundamental and upper partials of the sound source clearly visible, so far I have been unable to determine...

  • The frequency range of fData (min-max Hz)
  • The frequency (Hz) of each value in fData

Without this I cannot begin to identify the dominant frequencies of the source (in order to compare tuning variations against conventional musical pitch names), and/or to highlight or exclude regions of the represented spectrum (zooming in or out etc.) for more detailed examination.

My intention is to highlight the dominant frequency by pitch (note name) and frequency (Hz), and to display the frequency of any individual bar in the graph on mouseover. N.B. I already have a data object storing all the frequencies (Hz) between C0 and B8.
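(As an illustration of the note-name lookup described above: a minimal sketch of mapping a frequency in Hz to its nearest equal-tempered note name, assuming an A4 = 440 Hz reference. The helper name nearestNote is hypothetical, not from the question's code.)

```javascript
// Map a frequency in Hz to the nearest 12-TET note name (A4 = 440 Hz).
var NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];

function nearestNote(hz) {
    // MIDI note number: 69 is A4; 12 semitones per octave.
    var midi = Math.round(69 + 12 * Math.log2(hz / 440));
    var name = NOTE_NAMES[midi % 12];
    var octave = Math.floor(midi / 12) - 1; // MIDI 60 is C4
    return name + octave;
}

nearestNote(440);    // "A4"
nearestNote(261.63); // "C4"
```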

Despite several readings of the AudioContext.analyserNode specification, and just about every page on the subject here and at MDN, I still have no idea how to complete this part of the task.

Essentially: how do I convert the values in the Uint8Array() fData into a representation of amplitude per frequency in Hertz, as reflected by the array elements?

Any advice, suggestions or encouragement would be greatly appreciated.

BP

1 Answer:

Answer 0 (score: 11)

So first, understand that the output of an FFT will give you an array of relative strength in frequency RANGES, not precise frequencies.

These ranges are spread out in the spectrum [0,Nyquist frequency]. The Nyquist frequency is one-half of the sample rate. So if your AudioContext.sampleRate is 48000 (Hertz), your frequency bins will range across [0,24000] (also in Hz).

If you are using the default value of 2048 for fftSize in your AnalyserNode, then frequencyBinCount will be 1024 (it's always half the FFT size). This means each frequency bin will represent approximately 23.4Hz of range (24000/1024 = 23.4375) - so the bins will look something like this (off-the-cuff; rounding errors may occur here):

fData[0] is the strength of frequencies from 0 to 23.4Hz.
fData[1] is the strength of frequencies from 23.4Hz to 46.8Hz.
fData[2] is the strength of frequencies from 46.8Hz to 70.2Hz.
fData[3] is the strength of frequencies from 70.2Hz to 93.6Hz.
...
fData[511] is the strength of frequencies from 11976.6Hz to 12000Hz.
fData[512] is the strength of frequencies from 12000Hz to 12023.4Hz.
...
fData[1023] is the strength of frequencies from 23976.6Hz to 24000Hz.

Make sense so far?
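The bin arithmetic above can be sketched as a small helper (a sketch assuming the 48000 Hz sample rate and fftSize of 2048 used in this answer; the name binRange is my own, not part of the Web Audio API):

```javascript
// Map an fData index to the frequency range (in Hz) that bin covers.
// Bin width = Nyquist / frequencyBinCount, which simplifies to sampleRate / fftSize.
function binRange(index, sampleRate, fftSize) {
    var binWidth = sampleRate / fftSize;
    return { low: index * binWidth, high: (index + 1) * binWidth };
}

binRange(1, 48000, 2048);    // { low: 23.4375, high: 46.875 }
binRange(1023, 48000, 2048); // { low: 23976.5625, high: 24000 }
```

To label a bar on mouseover, you would pass its array index through a helper like this and display the midpoint of the returned range.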

The next comment that usually comes up is "Wait a second - this is less precise, musically speaking, in the bass registers (where 23.4 Hz can cover a whole OCTAVE) than the treble registers (where there are hundreds of Hz between notes)." To that I say: Yes, yes it is. That's just how FFTs work. In the upper registers, it's easier to see tuning differences.

The NEXT next comment is usually "wow, I need a MASSIVE fftSize to be precise in the bass registers." Usually, the answer is "no, you probably shouldn't do it that way" - at some point, auto-correlation is more efficient than FFTs, and it's a lot more precise.
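To make the auto-correlation suggestion concrete, here is a minimal sketch (my own, not the answer's code, and far simpler than a production pitch detector): estimate a signal's period by finding the lag at which the signal best correlates with itself, then convert that period back to Hz. In practice you would run this on time-domain samples from AnalyserNode.getFloatTimeDomainData().

```javascript
// Naive autocorrelation: return the lag (in samples) with the highest
// correlation, searched over [minLag, maxLag].
function estimatePeriod(samples, minLag, maxLag) {
    var bestLag = minLag, bestCorr = -Infinity;
    for (var lag = minLag; lag <= maxLag; lag++) {
        var corr = 0;
        for (var i = 0; i + lag < samples.length; i++) {
            corr += samples[i] * samples[i + lag];
        }
        if (corr > bestCorr) { bestCorr = corr; bestLag = lag; }
    }
    return bestLag; // frequency = sampleRate / bestLag
}

// Example: a 440 Hz sine at 48 kHz has a period near 48000/440, i.e. about 109 samples.
var rate = 48000, freq = 440, buf = [];
for (var n = 0; n < 2048; n++) buf.push(Math.sin(2 * Math.PI * freq * n / rate));
var period = estimatePeriod(buf, 60, 200);
```

Note this resolves the period to the nearest whole sample, which at 48 kHz is already far finer than a 23.4 Hz FFT bin in the bass register.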

Hope this helps point you in the right direction, add a comment if there's a followup.