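I'm trying to find a simple way to send an MP3 to Google for speech recognition. Currently I call SoX through a subprocess to convert the MP3 to WAV, and then SpeechRecognition converts it again, this time to FLAC. Ideally, I'd like a more portable (not OS-specific) way to decode the MP3 and send it, without saving any intermediate files.
Here is what I have so far: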
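import subprocess
import requests
import speech_recognition as sr
# Download the MP3 and save it to a temporary file
audio = requests.get('http://somesite.com/some.mp3')
with open('/tmp/audio.mp3', 'wb') as file:
    file.write(audio.content)
# Convert MP3 to WAV with SoX (must be installed and on the PATH)
subprocess.run(['sox', '/tmp/audio.mp3', '/tmp/audio.wav'])
# Load the WAV file and send it to Google for recognition
r = sr.Recognizer()
with sr.WavFile('/tmp/audio.wav') as source:  # newer SpeechRecognition releases call this sr.AudioFile
    audio = r.record(source)
result = r.recognize_google(audio)
del r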
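I have tried using the FLAC binary bundled with SpeechRecognition directly, but the output is just static. I'm not keen on publishing binaries on Git, but I will if that's the only way.
Some important links: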
SR's code for speech recognition
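Edit
I'm considering distributing SoX the same way the FLAC binary is distributed, one per operating system, if SoX's license allows it...
On second thought, software licenses are confusing and I don't want to get it wrong.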
Answer 0 (score: 0)
I decided to do this:
public static void Main (string[] args)
{
    // Note: This crashes if non numeric characters are entered!
    Console.WriteLine ("Please enter 3 numbers:");
    int num1 = Convert.ToInt32(Console.ReadLine());
    int num2 = Convert.ToInt32(Console.ReadLine());
    int divisor = Convert.ToInt32(Console.ReadLine());
    // Find the lowest and highest in case they are entered in the wrong order
    int lowerNum = Math.Min(num1, num2);
    int upperNum = Math.Max(num1, num2);
    // Find the first multiple of the divisor at or above the lower bound
    // E.g. for lowerNum = 10, upperNum = 20, divisor = 3, we have remainder = 1,
    // so factor = 10 + (3 - 1)
    //           = 12
    int remainder = lowerNum % divisor;
    int factor = (remainder == 0)
        ? lowerNum
        : lowerNum + (divisor - remainder);
    // Calculate all other multiples up to the upper bound by simple addition
    while (factor <= upperNum) {
        Console.WriteLine(factor);
        factor += divisor;
    }
}
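This is more of a middle ground, and I borrowed a few things from the SR module. It requires the user to install SoX, but it should work on all operating systems and does not use any intermediate files. I have only tested it on Linux.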
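For readers who want to see what a "SoX required, no intermediate files" approach might look like in Python, here is a minimal sketch. It is not the answer author's code. It assumes SoX is on the PATH and was built with MP3 support, and it pipes the MP3 through SoX's stdin/stdout as raw 16-bit mono PCM, which is then wrapped in `sr.AudioData`. The function names (`build_sox_command`, `mp3_bytes_to_pcm`, `recognize_mp3_url`) and the 16 kHz sample rate are assumptions chosen for illustration.

```python
import subprocess


def build_sox_command(sox_path="sox", rate=16000):
    """Build a SoX command that reads MP3 on stdin and writes raw PCM on stdout.

    Hypothetical helper: the rate and layout are illustrative choices.
    """
    return [
        sox_path,
        "-t", "mp3", "-",          # input: MP3 data read from stdin
        "-t", "raw",               # output: headerless PCM (no WAV header to fix up)
        "-e", "signed-integer",
        "-b", "16",                # 16-bit samples (2 bytes per sample)
        "-L",                      # little-endian byte order
        "-r", str(rate),
        "-c", "1",                 # mono
        "-",                       # output written to stdout
    ]


def mp3_bytes_to_pcm(mp3_bytes, rate=16000, sox_path="sox"):
    """Decode MP3 bytes to raw PCM bytes by piping them through SoX."""
    completed = subprocess.run(
        build_sox_command(sox_path, rate),
        input=mp3_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,                # raise CalledProcessError if SoX fails
    )
    return completed.stdout


def recognize_mp3_url(url, rate=16000):
    """Download an MP3 and send the decoded audio to Google (hypothetical helper)."""
    # Third-party imports are kept local so the helpers above work without them.
    import requests
    import speech_recognition as sr

    response = requests.get(url)
    response.raise_for_status()
    pcm = mp3_bytes_to_pcm(response.content, rate)
    audio = sr.AudioData(pcm, rate, 2)  # sample_width=2 matches the 16-bit output
    return sr.Recognizer().recognize_google(audio)
```

Calling `recognize_mp3_url('http://somesite.com/some.mp3')` would then replace the download, convert, and load steps from the question. Raw output is used instead of WAV because SoX cannot seek back to fix the WAV header length when writing to a pipe.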