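I am using IBM Bluemix to transcribe some audio, and I would like to use the API's speaker identification (speaker labels).
I set up the recognizer like this: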
private RecognizeOptions getRecognizeOptions() {
return new RecognizeOptions.Builder()
.continuous(true)
.contentType(ContentType.OPUS.toString())
//.model("en-US")
.model("en-US_BroadbandModel")
.timestamps(true)
.smartFormatting(true)
.interimResults(true)
.speakerLabels(true)
.build();
}
However, the returned JSON does not contain any speaker labels. How can I get speaker labels back using the Bluemix Java API?
My recorder in Android looks like this:
private void recordMessage() {
//mic.setEnabled(false);
speechService = new SpeechToText();
speechService.setUsernameAndPassword("usr", "pwd");
if(listening != true) {
capture = new MicrophoneInputStream(true);
new Thread(new Runnable() {
@Override public void run() {
try {
speechService.recognizeUsingWebSocket(capture, getRecognizeOptions(), new MicrophoneRecognizeDelegate());
} catch (Exception e) {
showError(e);
}
}
}).start();
Log.v("TAG",getRecognizeOptions().toString());
listening = true;
Toast.makeText(MainActivity.this,"Listening....Click to Stop", Toast.LENGTH_LONG).show();
} else {
try {
capture.close();
listening = false;
Toast.makeText(MainActivity.this,"Stopped Listening....Click to Start", Toast.LENGTH_LONG).show();
} catch (Exception e) {
e.printStackTrace();
}
}
}
Answer 0 (score: 0)
Based on your example, I wrote a sample app and got speaker labels working.
Make sure you are using version 4.2.1 of the Java SDK. In build.gradle, add:
compile 'com.ibm.watson.developer_cloud:java-sdk:4.2.1'
The following snippet uses WebSockets, interim results, and speaker labels to recognize a WAV file in the assets folder.
RecognizeOptions options = new RecognizeOptions.Builder()
.contentType("audio/wav")
.model(SpeechModel.EN_US_NARROWBANDMODEL.getName())
.interimResults(true)
.speakerLabels(true)
.build();
SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("SPEECH-TO-TEXT-USERNAME", "SPEECH-TO-TEXT-PASSWORD");
InputStream audio = loadInputStreamFromAssetFile("speaker_label.wav");
service.recognizeUsingWebSocket(audio, options, new BaseRecognizeCallback() {
@Override
public void onTranscription(SpeechResults speechResults) {
Assert.assertNotNull(speechResults);
System.out.println(speechResults.getResults().get(0).getAlternatives().get(0).getTranscript());
System.out.println(speechResults.getSpeakerLabels());
}
});
where loadInputStreamFromAssetFile() is defined as:
// Not static: getAssets() is an instance method of the enclosing Context (e.g. an Activity).
public InputStream loadInputStreamFromAssetFile(String fileName) {
    AssetManager assetManager = getAssets(); // From Context
    try {
        return assetManager.open(fileName);
    } catch (IOException e) {
        e.printStackTrace();
        return null; // The caller must handle a missing or unreadable asset
    }
}
Application log:
I/System.out: so how are you doing these days
I/System.out: so how are you doing these days things are going very well glad to hear
I/System.out: so how are you doing these days things are going very well glad to hear I think I mentioned
I/System.out: so how are you doing these days things are going very well glad to hear I think I mentioned before that there's a company now that I'm
I/System.out: so how are you doing these days things are going very well glad to hear I think I mentioned before that there's a company now that I'm working with which is very much
I/System.out: so how are you doing these days things are going very well glad to hear I think I mentioned before that there's a company now that I'm working with which is very much just just myself and Chris now
I/System.out: so how are you doing these days things are going very well glad to hear I think I mentioned before that there's a company now that I'm working with which is very much just just myself and Chris now you had mentioned that %HESITATION okay
I/System.out: so how are you doing these days things are going very well glad to hear I think I mentioned before that there's a company now that I'm working with which is very much just just myself and Chris now you had mentioned that %HESITATION okay
I/System.out: [{
I/System.out: "confidence": 0.487,
I/System.out: "final": false,
I/System.out: "from": 0.03,
I/System.out: "speaker": 0,
I/System.out: "to": 0.34
I/System.out: }, {
I/System.out: "confidence": 0.487,
I/System.out: "final": false,
I/System.out: "from": 0.34,
I/System.out: "speaker": 0,
I/System.out: "to": 0.54
I/System.out: }, {
I/System.out: "confidence": 0.487,
I/System.out: "final": false,
I/System.out: "from": 0.54,
I/System.out: "speaker": 0,
I/System.out: "to": 0.63
I/System.out: }, {
...... blah blah blah
I/System.out: }, {
I/System.out: "confidence": 0.343,
I/System.out: "final": false,
I/System.out: "from": 13.39,
I/System.out: "speaker": 1,
I/System.out: "to": 13.84
I/System.out: }]
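The log shows one speaker-label entry per short time span, so consecutive entries with the same "speaker" value usually need to be merged into turns before they are useful. The following is a minimal sketch of that step, not part of the SDK: the Segment class and mergeTurns method are hypothetical names that only mirror the "from", "to", and "speaker" fields seen in the log, and the sample values are the four entries printed above (the omitted middle entries are not reconstructed).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SpeakerTurns {

    /** Hypothetical value type mirroring one entry of the logged speaker-label JSON. */
    static final class Segment {
        final double from;
        final double to;
        final int speaker;

        Segment(double from, double to, int speaker) {
            this.from = from;
            this.to = to;
            this.speaker = speaker;
        }
    }

    /** Collapses consecutive segments from the same speaker into a single turn. */
    static List<Segment> mergeTurns(List<Segment> segments) {
        List<Segment> turns = new ArrayList<>();
        for (Segment s : segments) {
            int last = turns.size() - 1;
            if (last >= 0 && turns.get(last).speaker == s.speaker) {
                // Same speaker as the previous turn: extend its end time.
                Segment prev = turns.get(last);
                turns.set(last, new Segment(prev.from, s.to, s.speaker));
            } else {
                // Speaker changed (or first segment): start a new turn.
                turns.add(s);
            }
        }
        return turns;
    }

    public static void main(String[] args) {
        // Values copied from the visible entries of the application log.
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment(0.03, 0.34, 0));
        segments.add(new Segment(0.34, 0.54, 0));
        segments.add(new Segment(0.54, 0.63, 0));
        segments.add(new Segment(13.39, 13.84, 1));

        for (Segment t : mergeTurns(segments)) {
            System.out.printf(Locale.ROOT, "speaker %d: %.2f-%.2f%n", t.speaker, t.from, t.to);
        }
        // Prints:
        // speaker 0: 0.03-0.63
        // speaker 1: 13.39-13.84
    }
}
```

In a real app the Segment list would be filled from whatever speechResults.getSpeakerLabels() returns. Note that the logged entries have "final": false, so labels for the same time span may be revised by later messages.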