Question

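I am processing an mp3 file to get its speech as text using the speech_recognition Python module. I need to get the text from the mp3 file every 10 seconds, but I cannot get accurate results this way. So my idea is to measure the frequency of the audio at each 10-second mark and, if the frequency is very low, convert the audio up to that point into text. (I do not want to use numpy, scipy, or matplotlib.)

Any suggestions would be appreciated.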

Answer 1

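To detect low frequencies you would need the STFT (short-time Fourier transform) algorithm. A better approach may be to detect amplitude (loudness) and silence instead.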
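To illustrate what a single STFT frame involves without numpy or scipy, here is a minimal sketch that runs a naive DFT over one short window and reports the strongest bin. The function name dominant_frequency and the test signal are my own assumptions for illustration, and the naive DFT is far too slow for whole files; it only shows the idea.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin in one window."""
    n = len(samples)
    best_bin, best_power = 0, 0.0
    # Skip bin 0 (the DC offset) and stop at the Nyquist bin.
    for k in range(1, n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        power = abs(acc) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    # Each bin is sample_rate / n Hz wide.
    return best_bin * sample_rate / float(n)


# Hypothetical test signal: 50 ms of a 200 Hz sine sampled at 8000 Hz.
sample_rate = 8000
window = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(400)]
print(dominant_frequency(window, sample_rate))
```

The window length sets the resolution: 400 samples at 8000 Hz gives bins 20 Hz apart, so a longer window resolves frequency better but localizes time worse. For speech, the dominant frequency of a window says little about where a pause is, which is why loudness is usually the more useful signal here.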

pydub makes loudness detection much easier: it can report dBFS, maximum volume, and RMS volume.

You can install pydub with pip install pydub
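To make the loudness measures concrete, here is a small standard-library sketch of RMS and dBFS computed per 10-second chunk of raw PCM samples. As far as I understand, this is the same quantity pydub exposes as dBFS, but the helper names (rms, dbfs, chunk_levels) and the demo signal are hypothetical and not part of pydub.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of integer PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / float(len(samples)))


def dbfs(samples, sample_width_bytes=2):
    """Loudness in dB relative to full scale (0 dBFS is the loudest possible)."""
    full_scale = float(2 ** (8 * sample_width_bytes - 1))  # 32768 for 16-bit
    level = rms(samples)
    if level == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(level / full_scale)


def chunk_levels(samples, sample_rate, chunk_seconds=10):
    """Yield (start_second, dBFS) for each consecutive chunk."""
    step = sample_rate * chunk_seconds
    for start in range(0, len(samples), step):
        yield start // sample_rate, dbfs(samples[start:start + step])


# Hypothetical demo signal; the sample rate is tiny so the lists stay small.
sample_rate = 100
loud = [16384, -16384] * 500   # 10 s at half of full scale
quiet = [0] * 1000             # 10 s of digital silence
for start, level in chunk_levels(loud + quiet, sample_rate):
    print("%3d s: %.1f dBFS" % (start, level))
```

A chunk whose level falls well below the rest of the recording is a candidate pause. The right threshold depends on the recording, so treat any fixed value as a starting point to tune.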

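As for splitting the audio at 10-second intervals and feeding each piece to the speech_recognition module in Python, I have put together a rough program. It has a few issues and is by no means comprehensive, but it offers some insight in the direction you are looking for and can serve as a proof of concept. The program works with WAV files, but you can replace the WAV format with MP3 to make it work with MP3 files.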

Setup

Basically, I downloaded free/open-source pre-recorded WAV files from this website and concatenated them using pydub.

[https://evolution.voxeo.com/library/audio/prompts/numbers/index.jsp]

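When I tested the individual files, only the Google recognizer (Google Speech Recognition) worked, so I removed the other recognizers to keep the code clean.

The sample Python code for speech recognition was downloaded from here: https://github.com/Uberi/speech_recognition/blob/master/examples/wav_transcribe.py

So the program uses pydub to read an audio file that speaks the numbers from 0 to 100 and slices it at 10-second intervals. Because of the nature of the pre-recorded files, and because this program does not slice dynamically, the recognized text does not line up with the slice boundaries, as you will see in the output.

I believe a better program could be developed that detects silence dynamically and slices the audio accordingly.

This was developed on a Windows system using Python 2.7.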
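As a rough sketch of what such dynamic slicing could look like, the function below takes one loudness value (in dBFS) per fixed-length frame and returns the stretches that are not silent. The name nonsilent_ranges, the -40 dBFS threshold, and the 100 ms frame length are assumptions chosen for illustration. pydub also ships a pydub.silence module with a split_on_silence function for this purpose, which is probably the more practical route.

```python
def nonsilent_ranges(levels_db, silence_thresh=-40.0, min_silence_frames=5):
    """Return (start, end) frame index pairs of the non-silent stretches.

    levels_db holds one loudness value (dBFS) per fixed-length frame.
    A pause only counts as a split point if it lasts at least
    min_silence_frames consecutive frames. End indices are exclusive.
    """
    ranges = []
    start = None     # first frame of the current spoken stretch
    quiet_run = 0    # consecutive quiet frames seen so far
    for i, level in enumerate(levels_db):
        if level > silence_thresh:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence_frames:
                # Close the stretch at the first quiet frame of this pause.
                ranges.append((start, i - quiet_run + 1))
                start, quiet_run = None, 0
    if start is not None:
        # The audio ended mid-stretch; drop any trailing quiet frames.
        ranges.append((start, len(levels_db) - quiet_run))
    return ranges


# Hypothetical levels for 100 ms frames: speech, short pause, speech, long pause, speech.
frame_ms = 100
levels = [-20.0] * 3 + [-60.0] * 2 + [-20.0] * 2 + [-60.0] * 6 + [-20.0] * 4
for first, last in nonsilent_ranges(levels):
    print("speech from %d ms to %d ms" % (first * frame_ms, last * frame_ms))
```

The millisecond boundaries can be used directly to slice a pydub AudioSegment in place of the fixed 10-second steps, so each exported slice ends in a pause rather than in the middle of a word.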

Program

############################### Declarations ##############################################

import os
from pydub import AudioSegment
import speech_recognition as sr



#Read main audio file to be processed. Assuming in the same folder as this script
#For an MP3 source, use AudioSegment.from_mp3(...) instead (requires ffmpeg or libav)
sound = AudioSegment.from_wav("0-100.wav")

#Slice length in milliseconds (pydub measures time in milliseconds)
tenSecSlice = 10 * 1000 

#Total audio length in milliseconds
audioLength = len(sound)

#Get quotient and remainder 
q, r = divmod(audioLength, tenSecSlice)

#Get total segments and rounds to next greater integer 
totalSegments= q + int(bool(r)) 


####################################################
#Function for Speech Recognition  
#downloaded & modified  from above mentioned site  
####################################################  


def processAudio(WAV_FILE):
    r = sr.Recognizer()
    # sr.AudioFile is the current name of the class formerly called sr.WavFile
    with sr.AudioFile(WAV_FILE) as source:
        audio = r.record(source) # read the entire WAV file

    # recognize speech using Google Speech Recognition
    try:
        # for testing purposes, we're just using the default API key
        # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
        # instead of `r.recognize_google(audio)`
        print("Google Speech Recognition thinks you said " + r.recognize_google(audio))
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))

############################### Slice Audio and Process ################################

#Folder for the exported slices, relative to this script; create it if missing
exportPath = "tempDir"
if not os.path.isdir(exportPath):
    os.makedirs(exportPath)

n=0

#Iterate through slices and feed each one to the speech recognition function
while n < totalSegments:
    #Slice boundaries in milliseconds
    firstPart = (tenSecSlice * n)
    secondPart =  (tenSecSlice * (n + 1))

    print ("Making slice  from %d to %d  (sec)" % (firstPart /1000 , secondPart /1000))
    print ("Recognizing words from  %d to %d " % (firstPart /1000 , secondPart /1000))
    #pydub slices by milliseconds; a slice running past the end is simply cut short
    tempObject = sound[ firstPart :secondPart ]
    myAudioFile = os.path.join(exportPath, "slice" + str(n) +".wav")
    tempObject.export(myAudioFile , format="wav")
    n += 1
    processAudio(myAudioFile)
    print ("")

############################### End Program ##############################################

Output

    Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32  
Type "copyright", "credits" or "license()" for more information.  
================================ RESTART ================================  

Making slice  from 0 to 10 (sec)  
 Recognizing words from  0 to 10  
Google Speech Recognition thinks you said 0 1 2 3 4 5 6 7 8 9 10 11  

Making slice  from 10 to 20 (sec)  
 Recognizing words from  10 to 20  
Google Speech Recognition thinks you said 12 13 14 15 16 17 18 19 20 21  

Making slice  from 20 to 30 (sec)  
 Recognizing words from  20 to 30  
Google Speech Recognition thinks you said 21 22 23 24 25 26 27 28 29  

Making slice  from 30 to 40 (sec)  
 Recognizing words from  30 to 40  
Google Speech Recognition thinks you said 30 31 32 33 34 35 36 37 38  

Making slice  from 40 to 50 (sec)  
 Recognizing words from  40 to 50  
Google Speech Recognition thinks you said 39 40 41 42 43 44 45 46 47  

Making slice  from 50 to 60 (sec)  
 Recognizing words from  50 to 60  
Google Speech Recognition thinks you said 48 49 50 51 52 53 54 55 56  

Making slice  from 60 to 70 (sec)  
 Recognizing words from  60 to 70  
Google Speech Recognition thinks you said 57 58 59 60 61 62 63 64 65  

Making slice  from 70 to 80 (sec)  
 Recognizing words from  70 to 80  
Google Speech Recognition thinks you said 66 67 68 69 70 71 72 73 74  

Making slice  from 80 to 90 (sec)  
 Recognizing words from  80 to 90  
Google Speech Recognition thinks you said 75 76 77 78 79 80 81 82 83  

Making slice  from 90 to 100 (sec)  
 Recognizing words from  90 to 100  
Google Speech Recognition thinks you said 84 85 86 87 88 89 90 91 92  

Making slice  from 100 to 110 (sec)  
 Recognizing words from  100 to 110  
Google Speech Recognition thinks you said 93 94 95 96 97 98 99 100

How do I get the frequency of audio at a specific time in Python?
