Convert mp4 audio to text in Python

Time: 2017-01-07 18:50:59

Tags: python audio wav mp4

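I want to convert a voice recording from Facebook Messenger to text. Here is an example of the .mp4 file that the Facebook API sends: https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833

This file contains only audio (no video), and I want to convert it to text.

I also want to do this as fast as possible, because I will use the resulting text in a near-real-time application (i.e. the user sends an .mp4 file, the script converts it to text and shows it back).

I found this example https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py and this is the code I am using: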

import requests
import speech_recognition as sr

url = 'https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833'
r = requests.get(url)

with open("test.mp4", "wb") as handle:
    for data in r.iter_content():
        handle.write(data)

r = sr.Recognizer()
with sr.AudioFile('test.mp4') as source:
    audio = r.record(source)

command = r.recognize_google(audio)
print command

But I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Asterios\Anaconda2\lib\site-packages\speech_recognition\__init__.py", line 200, in __enter__
    self.audio_reader = aifc.open(aiff_file, "rb")
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 952, in open
    return Aifc_read(f)
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 347, in __init__
    self.initfp(f)
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 298, in initfp
    chunk = Chunk(file)
  File "C:\Users\Asterios\Anaconda2\lib\chunk.py", line 63, in __init__
    raise EOFError
EOFError

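Any ideas?

Edit: I want to run the script on the free plan of pythonanywhere.com, so I am not sure how I could install tools like ffmpeg there.

Edit 2: If you run the script above, replacing the url with "http://www.wavsource.com/snds_2017-01-08_2348563217987237/people/men/about_time.wav" and changing 'mp4' to 'wav', it works. So it is definitely a file format issue.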
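As far as I know, sr.AudioFile only reads WAV, AIFF and FLAC, which would fit the traceback above: the file is handed to aifc and parsing fails. A minimal sketch (Python 3, standard library only) for checking the container before passing a file to the recognizer; the function names are hypothetical and the magic-byte checks cover only these four formats:

```python
def sniff_audio_container(header: bytes) -> str:
    """Guess the container format from the first 12 bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header[:4] == b"fLaC":
        return "flac"
    # MP4-family files carry an 'ftyp' box tag at byte offset 4
    if header[4:8] == b"ftyp":
        return "mp4"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 12 bytes of path and classify them."""
    with open(path, "rb") as handle:
        return sniff_audio_container(handle.read(12))
```

If sniff_file("test.mp4") returns "mp4", the file needs to be converted to one of the other three formats first.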

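2 Answers:

Answer 0: (score: 9)

I finally found a solution. I am posting it here in case it helps someone in the future.

Fortunately, pythonanywhere.com comes with avconv preinstalled (avconv is similar to ffmpeg).

So here is some code that works: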


import urllib2
import speech_recognition as sr
import subprocess
import os

url = 'https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833'

# Download the audio-only .mp4 clip
mp4file = urllib2.urlopen(url)
with open("test.mp4", "wb") as handle:
    handle.write(mp4file.read())

# Convert to WAV with avconv; -vn ignores any video stream
cmdline = ['avconv', '-i', 'test.mp4', '-vn', '-f', 'wav', 'test.wav']
subprocess.call(cmdline)

r = sr.Recognizer()
with sr.AudioFile('test.wav') as source:
    audio = r.record(source)

command = r.recognize_google(audio)
print command

# Clean up the temporary files
os.remove("test.mp4")
os.remove("test.wav")

On the free plan, cdn.fbsbx.com was not on the pythonanywhere whitelist, so I could not download the content. I contacted them and they added the domain to the whitelist within 1-2 hours!

Many thanks and congratulations to them for the excellent service, even though I am on the free tier.
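For anyone on Python 3, the conversion step in this answer might be sketched as below. This is not the answer's exact code: the helper names are hypothetical, and the -ac 1 / -ar 16000 options (mono, 16 kHz) are my own assumption for keeping the file small in a near-real-time setup. The sketch picks avconv or ffmpeg, whichever is on PATH.

```python
import shutil
import subprocess


def build_convert_command(src, dst, tool="avconv"):
    """Return an argument list that extracts mono 16 kHz WAV audio from src."""
    return [
        tool,
        "-y",            # overwrite dst if it already exists
        "-i", src,       # input file
        "-vn",           # ignore any video stream
        "-ac", "1",      # downmix to one channel (assumption)
        "-ar", "16000",  # resample to 16 kHz (assumption, not from the answer)
        "-f", "wav",     # force WAV output
        dst,
    ]


def convert_to_wav(src, dst):
    """Run whichever converter is installed; raise if none is found or it fails."""
    tool = shutil.which("avconv") or shutil.which("ffmpeg")
    if tool is None:
        raise RuntimeError("neither avconv nor ffmpeg found on PATH")
    subprocess.run(build_convert_command(src, dst, tool), check=True)
```

Calling convert_to_wav("test.mp4", "test.wav") would replace the subprocess.call line; check=True makes a failed conversion raise instead of silently leaving no test.wav behind.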

Answer 1: (score: 1)

Use Python Video Converter https://github.com/senko/python-video-converter

import requests
import speech_recognition as sr
from converter import Converter

url = 'https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833'
r = requests.get(url)
c = Converter()

with open("/tmp/test.mp4", "wb") as handle:
    for data in r.iter_content():
        handle.write(data)

conv = c.convert('/tmp/test.mp4', '/tmp/test.wav', {
    'format': 'wav',
    'audio': {
        'codec': 'pcm',
        'samplerate': 44100,
        'channels': 2
    },
})

for timecode in conv:
    pass

r = sr.Recognizer()
with sr.AudioFile('/tmp/test.wav') as source:
    audio = r.record(source)

command = r.recognize_google(audio)
print command
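To check that the conversion really produced what was asked for (44100 Hz, 2 channels in the options above), the output file can be inspected with the standard-library wave module before it is sent to the recognizer. A small Python 3 sketch; describe_wav is a hypothetical helper name, and it only handles uncompressed PCM WAV files:

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / float(rate)
    return channels, rate, duration
```

For example, describe_wav('/tmp/test.wav') on a correctly converted file should report 2 channels and a rate of 44100; a wave.Error here suggests the converter did not write a valid PCM WAV file.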