Is there software that can automatically scramble/obscure part of an audio file?

Time: 2016-04-07 18:31:36

Tags: audio speech-recognition scramble

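This is only the second question I have posted here, so please let me know if I am doing something wrong.

I ran into an interesting problem today. I work at a call center, and one of my company's clients, which handles information verification, wants to collect bank account numbers from customers and have our customer service agents enter them into the client's external website.

These bank account numbers will not be saved anywhere in our local database, but the audio of our CSRs collecting them will be saved in our system. The plain text will not be available, but the sound files will. My question is: is there a way to automatically scramble a certain part of a recording using a program? I know this is a serious shot in the dark. Thanks.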

1 Answer:

Answer 0 (score: 1)

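Although your question does not ask about a specific programming problem, I will try to answer it, since I am working on something similar.

Can we automatically scramble a certain part of a recording using a program? We certainly can. It depends on how sophisticated you want to make it.

There are sophisticated approaches, but from a very basic conceptual standpoint, we need to take the recorded audio file and process it in the following stages.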

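  1. Split the audio file into words. This requires detecting the
    silence between words.
  2. Pass each word through a speech recognition system.
  3. Decide on a scrambling method: do you want to silence the word, jumble it, fill it with white noise, or encode it?
  4. Compare each recognized word with the words you want to scramble and, if there is a match, scramble it using the chosen method.
  5. Combine (concatenate) all the words in the correct order and store the result.

I have put together a basic prototype of the above, except for (4). The program makes heavy use of pydub, which provides a simpler way to manipulate audio. A tutorial on it can be found here.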
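
Stage 1 in the list hinges on silence detection, so here is a rough, dependency-free sketch of that idea. It assumes the audio is already available as a sequence of signed mono PCM amplitudes; the function name `find_word_spans` and the parameters `silence_thresh` and `min_silence_ms` are illustrative, not part of any library. With pydub, `pydub.silence.split_on_silence` covers the same ground and is probably the more practical choice.

```python
def find_word_spans(samples, frame_rate, silence_thresh=500, min_silence_ms=200):
    """Return (start_ms, end_ms) spans of non-silent audio.

    A sample counts as silent when abs(sample) < silence_thresh. A quiet gap
    only separates two words if it lasts at least min_silence_ms.
    """
    min_gap = frame_rate * min_silence_ms // 1000  # gap length in samples
    spans = []
    start = None      # index where the current word began
    last_loud = None  # index of the most recent non-silent sample
    for i, sample in enumerate(samples):
        if abs(sample) >= silence_thresh:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_gap:
            # The gap is long enough: close the current word
            spans.append((start, last_loud + 1))
            start = None
    if start is not None:
        spans.append((start, last_loud + 1))
    # Convert sample indices to milliseconds
    return [(b * 1000 // frame_rate, e * 1000 // frame_rate) for b, e in spans]


# Synthetic example at 1000 samples per second (1 sample = 1 ms):
# 100 ms silence, a 300 ms "word", 400 ms silence, a 200 ms "word", 50 ms silence
demo = [0] * 100 + [1000] * 300 + [0] * 400 + [1000] * 200 + [0] * 50
print(find_word_spans(demo, 1000))  # [(100, 400), (800, 1000)]
```

The threshold and minimum gap would need tuning for real call recordings, where background noise and fast speech make the gaps between digits far less clean than in this synthetic example.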

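The program basically does the following:

1) I downloaded open-source pre-recorded wav files of the numbers 0 to 10 from this site and concatenated them using pydub.
The program slices the given audio file into one-second chunks. I used Audacity to space out the words so that each one fits in a one-second window. In real life, that will not be the case.

2) It then passes each word through the Google speech recognition engine and shows the recognized word. As you will see, the word six is not recognized correctly. For that, you would need a robust speech recognition engine.

3) The program offers three different scramble methods:

• a) Reverse the word
• b) Replace the word with white noise of the same length
• c) Replace the word with silence

4) It then picks three words (9, 4, and 2), applies the scramble methods above, and replaces the corresponding word files.

5) It then concatenates all the words, including the scrambled ones, in the correct order and creates the output file.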

Note: I did not have enough time to add the comparison between the words to scramble and the recognized words.
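
For what it is worth, the missing comparison could look roughly like the sketch below. It is not part of the prototype. It assumes the recognition step hands back each slice's result as a string, or `None` when nothing was understood, instead of only printing it. The names `WORD_TO_DIGIT`, `normalize`, `slices_to_scramble`, and `include_unrecognized` are all hypothetical.

```python
# Spoken forms a recognizer might return, mapped to digit characters.
WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def normalize(word):
    """Map a recognized word to a single digit character, or None."""
    word = word.strip().lower()
    if len(word) == 1 and word.isdigit():
        return word
    return WORD_TO_DIGIT.get(word)


def slices_to_scramble(recognized, sensitive_digits, include_unrecognized=False):
    """Return the indices of the slices that should be scrambled.

    recognized: one entry per slice, a string or None if recognition failed.
    sensitive_digits: the digits to hide, e.g. "942".
    include_unrecognized: also scramble slices the recognizer could not read.
    """
    targets = set(sensitive_digits)
    picked = []
    for i, word in enumerate(recognized):
        if word is None:
            if include_unrecognized:
                picked.append(i)
        elif normalize(word) in targets:
            picked.append(i)
    return picked


# Recognition results as they appear in the demo run's output
recognized = ["0", "1", None, "3", "4", "5", "sex", "7", "8", "9", "10"]
print(slices_to_scramble(recognized, "942"))        # [4, 9]
print(slices_to_scramble(recognized, "942", True))  # [2, 4, 9]
```

With the demo's recognition results, slice 2 is only caught when unrecognized slices are scrambled too, which shows how much this step depends on the quality of the recognizer. For something as sensitive as account numbers, erring on the side of scrambling unrecognized slices is probably the safer default.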

Let me know if you have any questions.

**Demo code:**

    # Imports
    import speech_recognition as sr
    from pydub import AudioSegment
    from pydub.generators import WhiteNoise
    
    
    
    # Run speech recognition on a single WAV file and print the result
    def processAudio(WAV_FILE):
        r = sr.Recognizer()
        # sr.AudioFile is the current name of the class formerly called sr.WavFile
        with sr.AudioFile(WAV_FILE) as source:
            audio = r.record(source)  # read the entire WAV file

        # Recognize speech using Google Speech Recognition
        try:
            print("recognizedWord=" + r.recognize_google(audio))
        except sr.UnknownValueError:
            print("Could not understand audio")
        except sr.RequestError as e:
            print("Could not request results from GSR; {0}".format(e))
    
    # Scramble one exported word slice with the chosen method.
    # Relies on the module-level export_path set in the main block.
    def scramble_audio(aWord, option):
        scramble_file = export_path + "slice" + str(aWord) + ".wav"
        scramble_audioseg = AudioSegment.from_wav(scramble_file)
        aWord_length = len(scramble_audioseg)  # length of the word segment in millisec

        if option == "reverse":        # play the word backwards
            scrambled_word = scramble_audioseg.reverse()

        elif option == "whiteNoise":   # replace the word with white noise of the same length
            wn = WhiteNoise()
            scrambled_word = wn.to_audio_segment(duration=aWord_length)

        elif option == "silence":      # replace the word with silence of the same length
            scrambled_word = AudioSegment.silent(duration=aWord_length)

        else:
            raise ValueError("Unknown scramble option: %r" % option)

        print("Scrambling and Exporting %s" % scramble_file)
        scrambled_word.export(scramble_file, format="wav")  # overwrite the slice file
    
    
    if __name__ == "__main__":

        export_path = ".//splitAudio//"  # this folder must already exist
        in_audio_file = "0-10.wav"
        out_audio_file = export_path + "scrambledAudio.wav"

        # Read the main audio file to be processed (assumed to be in the same folder as this script)
        sound = AudioSegment.from_wav(in_audio_file)

        slice_ms = 1000  # slice length in millisec (one second)

        audio_length = len(sound)  # total audio length in millisec

        # Number of slices, rounded up so that a trailing partial second is kept
        q, r = divmod(audio_length, slice_ms)
        total_segments = q + int(bool(r))

        # Slice the audio every second, export each slice, and try to recognize it
        print("")
        for n in range(total_segments):
            print("Making slice  from %d to %d  (sec)" % (n, n + 1))
            temp_object = sound[n * slice_ms:(n + 1) * slice_ms]  # slicing is done in millisec
            myaudio_file = export_path + "slice" + str(n) + ".wav"
            temp_object.export(myaudio_file, format="wav")
            print("Trying to recognize %d " % n)
            processAudio(myaudio_file)

        # Scramble the desired audio slices
        print("")
        scramble_audio(9, "reverse")
        scramble_audio(4, "whiteNoise")
        scramble_audio(2, "silence")

        # Combine the slices, including the scrambled ones, in their original order
        final_audio = AudioSegment.empty()
        print("")
        for i in range(total_segments):
            temp_audio_file = export_path + "slice" + str(i) + ".wav"
            temp_audio_seg = AudioSegment.from_wav(temp_audio_file)
            print("Combining %s" % temp_audio_file)
            final_audio = final_audio.append(temp_audio_seg, crossfade=0)

        print("Exporting final audio %s" % out_audio_file)
        final_audio.export(out_audio_file, format="wav")
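
Since real speech will not line up with one-second windows, it may be more practical to scramble an arbitrary millisecond range directly. The sketch below applies the same three methods to raw samples using only the standard library. It assumes a list of signed 16-bit mono PCM values; `scramble_range` and its `peak` parameter are hypothetical names, not pydub API.

```python
import random


def scramble_range(samples, frame_rate, start_ms, end_ms, method, peak=32767):
    """Return a copy of samples with the range [start_ms, end_ms) scrambled.

    method is one of "reverse", "whiteNoise" or "silence".
    """
    # Convert milliseconds to sample indices and clamp them to the audio
    start = max(0, start_ms * frame_rate // 1000)
    end = min(len(samples), end_ms * frame_rate // 1000)
    if start >= end:
        return list(samples)  # nothing to scramble

    word = list(samples[start:end])
    if method == "reverse":
        replacement = word[::-1]
    elif method == "whiteNoise":
        replacement = [random.randint(-peak, peak) for _ in word]
    elif method == "silence":
        replacement = [0] * len(word)
    else:
        raise ValueError("Unknown scramble method: %r" % method)

    return list(samples[:start]) + replacement + list(samples[end:])


# Tiny example at 1000 samples per second, so 1 sample = 1 ms
samples = list(range(10))
print(scramble_range(samples, 1000, 2, 5, "silence"))  # [0, 1, 0, 0, 0, 5, 6, 7, 8, 9]
print(scramble_range(samples, 1000, 2, 5, "reverse"))  # [0, 1, 4, 3, 2, 5, 6, 7, 8, 9]
```

With pydub, the samples could presumably come from `AudioSegment.get_array_of_samples()`. Note that a reversed word can be recovered simply by reversing it again, so for account numbers, silence or noise replacement is the safer choice.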
    

Output:

    Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license()" for more information.
    >>> ================================ RESTART ================================
    >>> 
    
    Making slice  from 0 to 1  (sec)
    Trying to recognize 0 
    recognizedWord=0
    Making slice  from 1 to 2  (sec)
    Trying to recognize 1 
    recognizedWord=1
    Making slice  from 2 to 3  (sec)
    Trying to recognize 2 
    Could not understand audio
    Making slice  from 3 to 4  (sec)
    Trying to recognize 3 
    recognizedWord=3
    Making slice  from 4 to 5  (sec)
    Trying to recognize 4 
    recognizedWord=4
    Making slice  from 5 to 6  (sec)
    Trying to recognize 5 
    recognizedWord=5
    Making slice  from 6 to 7  (sec)
    Trying to recognize 6 
    recognizedWord=sex
    Making slice  from 7 to 8  (sec)
    Trying to recognize 7 
    recognizedWord=7
    Making slice  from 8 to 9  (sec)
    Trying to recognize 8 
    recognizedWord=8
    Making slice  from 9 to 10  (sec)
    Trying to recognize 9 
    recognizedWord=9
    Making slice  from 10 to 11  (sec)
    Trying to recognize 10 
    recognizedWord=10
    
    Scrambling and Exporting .//splitAudio//slice9.wav
    Scrambling and Exporting .//splitAudio//slice4.wav
    Scrambling and Exporting .//splitAudio//slice2.wav
    
    Combining .//splitAudio//slice0.wav
    Combining .//splitAudio//slice1.wav
    Combining .//splitAudio//slice2.wav
    Combining .//splitAudio//slice3.wav
    Combining .//splitAudio//slice4.wav
    Combining .//splitAudio//slice5.wav
    Combining .//splitAudio//slice6.wav
    Combining .//splitAudio//slice7.wav
    Combining .//splitAudio//slice8.wav
    Combining .//splitAudio//slice9.wav
    Combining .//splitAudio//slice10.wav
    Exporting final audio .//splitAudio//scrambledAudio.wav
    >>>