audio - Detecting multiple voices without speech recognition

Is there a way to detect in real time whether more than one person is speaking? Do I need a speech recognition API for that?

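I don't want to separate the audio, and I don't want to transcribe it either. My approach is to record frequently with a single microphone (-> mono) and then analyze those recordings. But how can I detect and distinguish the voices? I narrowed it down by looking at the relevant frequencies, but then...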

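I understand this is not a trivial task. That's why I'm hoping there is an API out there that does this out of the box, preferably one that is mobile/web friendly.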

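This may sound like a Christmas wish list, but as mentioned above, I don't need to know anything about the content. So my guess is that full-blown speech recognition would be a big performance hit.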

Most similar problems (adult/child classification, speech/music classification, single voice / voice mixture classification) are standard machine learning problems. You can solve them with a classifier such as a GMM (Gaussian mixture model). You only need to build training data for your task:

1. Take some clean single-speaker recordings; for example, you can download audiobooks.
2. Prepare mixed data by mixing the clean recordings together.
3. Train one GMM on the clean recordings and another on the mixed recordings.
4. Compare the likelihoods from the clean-speech GMM and the mixed-speech GMM, and decide whether a mixture is present from the ratio of the two.
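The recipe above (mix clean recordings, train one GMM per class, compare likelihoods) can be sketched roughly as follows. This is a minimal illustration, not the code from the Conceptor repository, and it assumes Python with NumPy and scikit-learn (`sklearn.mixture.GaussianMixture`), 16 kHz mono float audio, and crude log band-energy features in place of the MFCCs a real system would more likely use. The helper names (`frame_features`, `mix`, `train_gmm`, `is_mixture`) and all parameter values are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Assumption: 16 kHz mono audio, so 400/160 samples = 25 ms frames with a 10 ms hop.
FRAME_LEN = 400
HOP = 160
N_BANDS = 20


def frame_features(signal, frame_len=FRAME_LEN, hop=HOP, n_bands=N_BANDS):
    """Log band energies per frame: a crude, hypothetical stand-in for MFCCs."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    feats = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Sum the power spectrum into n_bands contiguous bands, then take logs.
        bands = np.array_split(power, n_bands)
        feats[i] = np.log(np.array([b.sum() for b in bands]) + 1e-10)
    return feats


def mix(a, b, gain=1.0):
    """Overlay two clean recordings to make a synthetic multi-speaker example."""
    n = min(len(a), len(b))
    return np.asarray(a[:n], dtype=float) + gain * np.asarray(b[:n], dtype=float)


def train_gmm(feature_list, n_components=8, seed=0):
    """Fit one diagonal-covariance GMM on frames pooled from many recordings."""
    X = np.vstack(feature_list)
    gmm = GaussianMixture(
        n_components=n_components, covariance_type="diag", random_state=seed
    )
    return gmm.fit(X)


def is_mixture(signal, clean_gmm, mixed_gmm, threshold=0.0):
    """Log-likelihood ratio test: a positive ratio means 'more like a mixture'."""
    X = frame_features(signal)
    # score() returns the mean per-frame log-likelihood, so the probability
    # ratio between the two models becomes a difference of logs.
    llr = float(mixed_gmm.score(X) - clean_gmm.score(X))
    return bool(llr > threshold), llr
```

To use it, train `clean_gmm` on `frame_features` of single-speaker recordings and `mixed_gmm` on features of `mix(a, b)` outputs, then call `is_mixture(recording, clean_gmm, mixed_gmm)`. The threshold of 0.0 is only a starting point; it would need tuning on held-out recordings, and for real-time use you would score short sliding windows rather than whole files.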

You can find some code samples here:

https://github.com/littleowen/Conceptor

For example, you can try

https://github.com/littleowen/Conceptor/blob/master/Gender.ipynb
