I am using the Speech API v2 with PHP. Here is my code:
$file_to_upload = array('myfile'=>'@'.$filename.'.flac');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.google.com/speech-api/v2/recognize?output=json&lang=ru-RU&key=___my_api_key___");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: audio/x-flac; rate=8000"));
curl_setopt($ch, CURLOPT_POSTFIELDS, $file_to_upload);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result=curl_exec ($ch);
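Google responds with two JSON objects: the first is empty, and the second is the valid response I expected. This makes parsing and further processing difficult. See the HTTP dump below.
My POST request: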
POST /speech-api/v2/recognize?output=json&lang=ru-RU&key=___my_api_key___ HTTP/1.1
Host: www.google.com
Accept: */*
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36
Content-Length: 13123
Expect: 100-continue
Content-Type: audio/x-flac; rate=8000; boundary=----------------------------9641e899ac92
------------------------------9641e899ac92
Content-Disposition: form-data; name="myfile"; filename="/tmp/voice/1400157667.6440-in.wav.flac"
Content-Type: application/octet-stream
fLaC..."......e..\......! ..{..!y>..7..............................( ...reference libFLAC 1.2.1 20070917.
...encoded binary data...
------------------------------9641e899ac92--
The response with the duplicated recognition result:
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Disposition: attachment
Cache-Control: no-transform
X-Content-Type-Options: nosniff
Pragma: no-cache
Date: Thu, 15 May 2014 12:41:09 GMT
Server: S3 v1.0
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 80:quic
Transfer-Encoding: chunked
e
{"result":[]} <--- first one
f8
{"result":[{"alternative":[{"transcript":"............","confidence":0.73531097},{"transcript":"................"},{"transcript":".............."},{"transcript":"................"},{"transcript":"............ .."}],"final":true}],"result_index":0} <--- second one
0
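Why does this happen? When I used API v1, there was only a single response. Other v2 examples on the internet also show only one.
Thanks a lot.
Answer 0 (score: 0)
First, make sure the language you are using supports speaker diarization. For example, Google does not offer speaker diarization for Colombian Spanish, but it does for Spanish as spoken in Spain.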
In addition, the audio sometimes needs slight adjustments, which can be done with ffmpeg:
ffmpeg -i input.wav -ac 1 -ab 128k -filter:a "volume=0.9,equalizer=f=4000:t=h:w=200:g=-2" output.wav
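As for the parsing difficulty raised in the question, one possible workaround is to treat the body as a sequence of JSON objects, one per line, and keep the first whose "result" array is not empty. The dump suggests each object ends with a line feed (the first chunk is 0xe = 14 bytes for a 13-character object), but that is an inference from this one response, not documented behaviour. The function name firstNonEmptyResult and the sample transcript "hello" are hypothetical; this is a minimal sketch, not a definitive implementation.

```php
<?php
/**
 * Return the first decoded JSON object whose "result" array is non-empty,
 * or null if none is found.
 * Assumption: the body holds one JSON object per line, as in the dump above.
 */
function firstNonEmptyResult(string $body): ?array
{
    // Split on any line break and drop empty lines.
    foreach (preg_split('/\R/', $body, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        $decoded = json_decode($line, true);
        if (is_array($decoded) && !empty($decoded['result'])) {
            return $decoded;
        }
    }
    return null;
}

// Sample body shaped like the dump; the transcript text is a placeholder.
$body = "{\"result\":[]}\n"
      . "{\"result\":[{\"alternative\":[{\"transcript\":\"hello\","
      . "\"confidence\":0.73531097}],\"final\":true}],\"result_index\":0}\n";

$parsed = firstNonEmptyResult($body);
echo $parsed['result'][0]['alternative'][0]['transcript'], PHP_EOL; // prints "hello"
```

With the code from the question, the string returned by curl_exec (already de-chunked by cURL) could be passed in as $body; a null return would mean no recognition result was present in the response.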