How do I enable streaming speech-to-text through the Google API?

Time: 2018-06-12 14:02:20

Tags: python-3.x speech-recognition google-speech-api

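I wrote a Python script that calls the Google Speech-to-Text API to convert recorded audio to text. It works the way I want. Now I would like to convert live, streaming speech to text, the way the Google app does on Android devices or Siri does on iOS.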

The code is below. Please tell me which libraries I need to use, or how I can modify this code to accomplish the task.

This is the script:

import os
import speech_recognition as sr
from tqdm import tqdm

with open("api-key.json") as f:
    GOOGLE_CLOUD_SPEECH_CREDENTIALS = f.read()

r = sr.Recognizer()
files = sorted(os.listdir('parts/'))

all_texts = []

for f in tqdm(files):
    name = "parts/" + f

    # Load audio file
    with sr.AudioFile(name) as source:
        audio = r.record(source)

    # Transcribe audio file
    text = r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS)
    all_texts.append(text)

transcript = ""
for i, t in enumerate(all_texts):
    total_seconds = i * 30
    # Cool shortcut from:
    # https://stackoverflow.com/questions/775049/python-time-seconds-to-hms
    # to get hours, minutes and seconds
    m, s = divmod(total_seconds, 60)
    h, m = divmod(m, 60)

    # Format time as h:m:s - 30 seconds of text
    transcript = transcript + "{:0>2d}:{:0>2d}:{:0>2d} {}\n".format(h, m, s, t)

print(transcript)

with open("transcript.txt", "w") as f:
    f.write(transcript)

I could not find any Google API documentation that covers this requirement.

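I know I need code that captures the speaker's voice from the microphone, but I don't know how to do that, and I'm fairly sure there are many other prerequisites as well.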
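
For the microphone side, one pattern that is often used is a thread-safe queue: a capture callback pushes small audio chunks into it, and a generator yields those chunks to whatever sends them to the recognizer. The sketch below shows only that buffering pattern using the standard library. `ChunkStream`, `fake_microphone`, and `run_demo` are hypothetical names introduced for illustration; the fake producer pushes silence instead of real audio, and the 3200-byte chunk size assumes 100 ms of 16 kHz, 16-bit mono audio. Real capture (for example via `sr.Microphone` from the speech_recognition package) and the actual streaming request to Google are not shown here.

```python
import queue
import threading
import time


class ChunkStream:
    """Buffers audio chunks from a producer and yields them to a consumer."""

    def __init__(self):
        self._buffer = queue.Queue()

    def push(self, chunk):
        """Called by the capture side (e.g. an audio callback) for each chunk."""
        self._buffer.put(chunk)

    def close(self):
        """Signal the end of the stream with a None sentinel."""
        self._buffer.put(None)

    def chunks(self):
        """Yield chunks as they arrive; stop when the sentinel is seen."""
        while True:
            chunk = self._buffer.get()  # blocks until a chunk is available
            if chunk is None:
                return
            yield chunk


def fake_microphone(stream, n_chunks=5, chunk_bytes=3200):
    """Hypothetical stand-in for a real microphone: pushes silent chunks."""
    for _ in range(n_chunks):
        stream.push(b"\x00" * chunk_bytes)
        time.sleep(0.01)  # simulate audio arriving over time
    stream.close()


def run_demo():
    """Run producer and consumer together; return the size of each chunk."""
    stream = ChunkStream()
    producer = threading.Thread(target=fake_microphone, args=(stream,))
    producer.start()
    # In a real program, each chunk would be forwarded to the recognizer here.
    received = [len(chunk) for chunk in stream.chunks()]
    producer.join()
    return received


if __name__ == "__main__":
    print(run_demo())
```

The consumer never waits for the whole recording, which is the main difference from the file-based script above: chunks are handed on as soon as they are captured.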

Please help.

0 Answers:

No answers