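I am trying to write a script that calls the Watson Speech to Text (STT) API to continuously transcribe, word by word and in real time, speech being recorded through a microphone. I have read that this should be possible with the WebSockets version of the API.
I have a Python script that should do this on Linux (assuming the dependencies are installed); however, it does not work on Mac OS X.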
# Python 2 script (print statements, raw_input); needs ws4py and the arecord command.
from ws4py.client.threadedclient import WebSocketClient
import base64, json, ssl, subprocess, threading, time

class SpeechToTextClient(WebSocketClient):
    def __init__(self):
        ws_url = "wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize"
        username = "your username"
        password = "your password"
        # Build the HTTP Basic auth header from the service credentials.
        auth_string = "%s:%s" % (username, password)
        base64string = base64.encodestring(auth_string).replace("\n", "")
        self.listening = False
        try:
            WebSocketClient.__init__(self, ws_url,
                headers=[("Authorization", "Basic %s" % base64string)])
            self.connect()
        except: print "Failed to open WebSocket."

    def opened(self):
        # Ask the service to start recognizing raw 16 kHz, 16-bit PCM audio.
        self.send('{"action": "start", "content-type": "audio/l16;rate=16000"}')
        self.stream_audio_thread = threading.Thread(target=self.stream_audio)
        self.stream_audio_thread.start()

    def received_message(self, message):
        message = json.loads(str(message))
        if "state" in message:
            if message["state"] == "listening":
                self.listening = True
        print "Message received: " + str(message)

    def stream_audio(self):
        # Wait until the service reports that it is listening.
        while not self.listening:
            time.sleep(0.1)
        # Record raw audio from the microphone and forward it in 1024-byte chunks.
        reccmd = ["arecord", "-f", "S16_LE", "-r", "16000", "-t", "raw"]
        p = subprocess.Popen(reccmd, stdout=subprocess.PIPE)
        while self.listening:
            data = p.stdout.read(1024)
            try: self.send(bytearray(data), binary=True)
            except ssl.SSLError: pass
        p.kill()

    def close(self):
        # Stop the audio thread before closing the socket.
        self.listening = False
        self.stream_audio_thread.join()
        WebSocketClient.close(self)

try:
    stt_client = SpeechToTextClient()
    raw_input()  # keep streaming until Enter is pressed
finally:
    stt_client.close()
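For readers on Python 3 or outside Linux, here is a minimal sketch of the three parts of the script above that do not depend on ws4py: the Basic auth header, the "start" message, and the recorder command. The helper names are hypothetical, and the non-Linux branch is an assumption: arecord is an ALSA tool, so the sketch falls back to SoX's rec command, which would have to be installed separately. It has not been tested against the service.

```python
import base64
import json
import sys


def basic_auth_header(username, password):
    """Return the ("Authorization", "Basic ...") header tuple the script sends."""
    token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    return ("Authorization", "Basic " + token.decode("ascii"))


def start_message(rate=16000):
    """JSON text frame that opens a recognition request for raw 16-bit PCM."""
    return json.dumps({"action": "start",
                       "content-type": "audio/l16;rate=%d" % rate})


def record_command(platform=sys.platform, rate=16000):
    """Pick a command that writes raw signed 16-bit mono PCM to stdout."""
    if platform.startswith("linux"):
        # Same command as in the script above.
        return ["arecord", "-f", "S16_LE", "-r", str(rate), "-t", "raw"]
    # Assumption: SoX is installed and provides `rec` (for example on macOS).
    return ["rec", "-q", "-t", "raw", "-r", str(rate),
            "-e", "signed-integer", "-b", "16", "-c", "1", "-"]
```

In the script, record_command() would take the place of the hard-coded reccmd list, and basic_auth_header() would replace the base64.encodestring call, which is no longer available in recent Python 3 releases.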
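Ideally I would not be doing this in Python at all but in R, which is my native language and where I will have to transfer the results for processing anyway.
Can anyone offer a solution for getting a streaming transcription?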
Answer 0 (score: 0)
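I am not sure this answer is exactly what you are looking for, but it sounds like an issue with the continuous parameter.
As you can see, Watson-developer-cloud provides a Python SDK library.
You can install it with: pip install watson-developer-cloud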
import json
from os.path import join, dirname
from watson_developer_cloud import SpeechToTextV1

speech_to_text = SpeechToTextV1(
    username='YOUR SERVICE USERNAME',
    password='YOUR SERVICE PASSWORD',
    x_watson_learning_opt_out=False
)

print(json.dumps(speech_to_text.models(), indent=2))
print(json.dumps(speech_to_text.get_model('en-US_BroadbandModel'), indent=2))

with open(join(dirname(__file__), '../resources/speech.wav'),
          'rb') as audio_file:
    data = json.dumps(speech_to_text.recognize(
        audio_file, content_type='audio/wav', timestamps=False,
        word_confidence=False, continuous=True), indent=2)
    print(data)
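Note: the service returns an array of results, one per utterance.
At line #L44 you can see the params, so for continuous transcription you need to use the continuous parameter and set it to true, as in the example above.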
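Because the service returns an array of results, one per utterance, the transcript text still has to be pulled out of the response. A minimal sketch follows, assuming the response is a dict with the fields results, alternatives, transcript and final, as I understand the service's JSON format; the helper name and the sample data are made up for illustration.

```python
def final_transcripts(response):
    """Collect the top transcript of every final result in a response dict."""
    transcripts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        # Interim (non-final) results are skipped here.
        if result.get("final") and alternatives:
            transcripts.append(alternatives[0]["transcript"].strip())
    return transcripts


# Hypothetical response, shaped like the service's JSON output.
sample = {
    "results": [
        {"final": True, "alternatives": [{"transcript": "hello world "}]},
        {"final": False, "alternatives": [{"transcript": "this is "}]},
        {"final": True, "alternatives": [{"transcript": "a second utterance "}]},
    ]
}

print(final_transcripts(sample))
```

The same helper could be called on each message received over the WebSocket connection, since those messages are parsed with json.loads into the same kind of dict.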
Answer 1 (score: -1)
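For some good examples of how to do this with R, check out these great blog posts by Ryan Anderson.
Ryan has done a lot of work with R and the Watson APIs, and he shares much of what he knows on his blog.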