Question

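Much of the research on automatic speech recognition (ASR) converts speech to text, and these tools use deep learning to do so.

From what I have seen, they work at the level of English words. Given audio of the word "Phonics", the output could in principle be "Foniks", but the recognizer returns the closest English word, "Phonics".

The Google API gives us the final, word-level ASR result. Is there any tool or open-source software that gives us the phonetic pronunciation instead? Something like "ˈfəʊnɪks" rather than "Phonics".

Thanks.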

Answer 1

There are several open-source tools for ASR. Kaldi, CMU Sphinx and HTK are the most popular and best documented. Kaldi is probably the best choice if you want to use DNNs for ASR.

However, the form of the recognition result depends on your vocabulary. If you want the recognizer to output ˈfəʊnɪks instead of Phonics, you have to define it as a word in the vocabulary. For instance:

!SIL sil
<UNK> spn
eight ey t
five f ay v
...
f_ey_ow_n_i_k_s f ey ow n i k s
....

Using Unicode symbols for the word representation is not possible (as far as I remember), so I replaced them with X-SAMPA notation.
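As a rough illustration of the idea above, the following Python sketch builds a lexicon line whose "word" is just the phone sequence joined with underscores, and recovers the phones from a recognized word. The helper names (`lexicon_entry`, `word_to_phones`, `ipa_to_xsampa`) are hypothetical and not part of Kaldi, CMU Sphinx or HTK, and the IPA-to-X-SAMPA table covers only the few symbols that appear in "ˈfəʊnɪks"; a real setup would need a complete mapping and the phone set of its own acoustic model.

```python
# Minimal sketch: ASCII-only "phone words" for a pronunciation lexicon.
# All function names here are hypothetical helpers, not toolkit APIs.

# Partial IPA -> X-SAMPA mapping (assumption: only the symbols needed
# for this example; extend it for real data).
IPA_TO_XSAMPA = {
    "ə": "@",   # schwa
    "ʊ": "U",   # near-close near-back rounded vowel
    "ɪ": "I",   # near-close near-front unrounded vowel
    "ˈ": '"',   # primary stress mark
}


def ipa_to_xsampa(ipa):
    """Replace each known IPA character with its X-SAMPA equivalent."""
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa)


def lexicon_entry(phones):
    """Build one lexicon line: '<phones joined by _> <phones joined by space>'."""
    word = "_".join(phones)
    return f"{word} {' '.join(phones)}"


def word_to_phones(word):
    """Split a recognized 'phone word' back into its phone sequence."""
    return word.split("_")


if __name__ == "__main__":
    phones = ["f", "ey", "ow", "n", "i", "k", "s"]
    print(lexicon_entry(phones))
    print(word_to_phones("f_ey_ow_n_i_k_s"))
    print(ipa_to_xsampa("ˈfəʊnɪks"))
```

With entries generated this way, the decoder's "word" output is itself the phone string, so no Unicode is needed in the vocabulary file.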

Follow this tutorial for a detailed explanation.
