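I wrote a Python script that calls the Google Speech-to-Text API to convert recorded audio to text. It works the way I want. However, I would now like to convert real-time streaming speech to text, the way the Google app does on Android devices or Siri does on iOS.
The code is below. Which libraries do I need to use, or how can I modify the following code to accomplish this?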
Here is the script:
import os
import speech_recognition as sr
from tqdm import tqdm

with open("api-key.json") as f:
    GOOGLE_CLOUD_SPEECH_CREDENTIALS = f.read()

r = sr.Recognizer()
files = sorted(os.listdir('parts/'))

all_texts = []
for f in tqdm(files):
    name = "parts/" + f
    # Load audio file
    with sr.AudioFile(name) as source:
        audio = r.record(source)
    # Transcribe audio file
    text = r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS)
    all_texts.append(text)

transcript = ""
for i, t in enumerate(all_texts):
    total_seconds = i * 30
    # Cool shortcut from:
    # https://stackoverflow.com/questions/775049/python-time-seconds-to-hms
    # to get hours, minutes and seconds
    m, s = divmod(total_seconds, 60)
    h, m = divmod(m, 60)
    # Format time as h:m:s - 30 seconds of text
    transcript = transcript + "{:0>2d}:{:0>2d}:{:0>2d} {}\n".format(h, m, s, t)

print(transcript)

with open("transcript.txt", "w") as f:
    f.write(transcript)
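For clarity, the timestamp arithmetic in the loop above can be pulled out into a small helper. This is only a sketch, assuming every file in parts/ is a 30-second chunk as in the script; the names format_timestamp, build_transcript and chunk_seconds are mine, for illustration, and are not part of any library.

```python
def format_timestamp(chunk_index, chunk_seconds=30):
    """Return the start time of a chunk as an hh:mm:ss string.

    chunk_seconds is an assumption: the script above uses 30-second parts.
    """
    total_seconds = chunk_index * chunk_seconds
    # Split the total into minutes and leftover seconds, then hours and minutes
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return "{:0>2d}:{:0>2d}:{:0>2d}".format(hours, minutes, seconds)


def build_transcript(texts, chunk_seconds=30):
    """Prefix each chunk's text with its start time, one chunk per line."""
    return "".join(
        "{} {}\n".format(format_timestamp(i, chunk_seconds), text)
        for i, text in enumerate(texts)
    )
```

With this, the second loop in the script would reduce to transcript = build_transcript(all_texts).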
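I could not find any Google API documentation that covers this requirement.
I know I need to write code that lets the speaker talk through the mic, but I don't know how to do that, and I'm fairly sure there are a lot of other prerequisites as well.
Please help.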