I am completely new to using PocketSphinx. I followed the integration of the demo application described in Android offline voice recognition using PocketSphinx.
After integrating PocketSphinx into my application as a library, it works, but the output is not as accurate as expected. It even drops spoken words that are present in the supplied dictionary.
I want to understand how to improve the accuracy of word detection. I initially used a .lm file; then, instead of it, I simply created a .jsgf text file and used that, but the accuracy still did not improve. So, after creating the .jsgf file, do I need to compile it, or is it enough to just copy the .jsgf text file into the assets folder?
Building pocketsphinx-android is described at http://cmusphinx.sourceforge.net/wiki/tutorialandroid. I did not do that; I only integrated it as a library project.
The code:
import android.app.Activity;
import android.content.Intent;
import android.os.AsyncTask;
import android.os.Bundle;
import android.util.Log;
import android.view.View;
import android.widget.ImageView;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import edu.cmu.pocketsphinx.Assets;
import edu.cmu.pocketsphinx.Hypothesis;
import edu.cmu.pocketsphinx.RecognitionListener;
import edu.cmu.pocketsphinx.SpeechRecognizer;
import static edu.cmu.pocketsphinx.SpeechRecognizerSetup.defaultSetup;
public class SphinxSpeechRecognizerActivity extends Activity implements RecognitionListener {
private static String TAG = SphinxSpeechRecognizerActivity.class.getSimpleName();
private SpeechRecognizer mRecognizer;
private HashMap<String, Integer> mCaptions;
// private static final String KWS_SEARCH = "wakeup";
// private static final String KEYPHRASE = "phone";
private static final String COMMANDS = "command";
private boolean mErrorFlag = false;
private static boolean isRecognizerInProgress = false;
@Override
public void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.fragment);
initViews();
}
@Override
public void onResume() {
super.onResume();
}
@Override
public void onPause() {
super.onPause();
}
@Override
public void onDestroy() {
super.onDestroy();
Log.d(TAG, "** onDestroy **");
stopRecognizer(true);
}
@Override
public void onBackPressed() {
super.onBackPressed();
stopRecognizer(true);
}
private void initViews() {
final ImageView img_close = (ImageView)findViewById(R.id.ttsClose);
final ImageView img_voice_view = (ImageView)findViewById(R.id.tts_voice_view);
final ImageView img_info = (ImageView)findViewById(R.id.ttsInfo);
img_close.setOnClickListener(mOnClickListener);
img_info.setOnClickListener(mOnClickListener);
img_voice_view.setOnClickListener(mOnClickListener);
}
// Set press indicator
private View.OnClickListener mOnClickListener = new View.OnClickListener() {
@Override
public void onClick(View v) {
switch (v.getId()){
case R.id.ttsInfo:
break;
case R.id.tts_voice_view:
if (!isRecognizerInProgress) {
isRecognizerInProgress = true;
setupRecognizerController();
} else {
Log.d(TAG, "Sphinx recognizer is already running");
}
break;
case R.id.ttsClose:
default:
// Call back event
onBackPressed();
break;
}
}
};
@Override
public void onBeginningOfSpeech() {
Log.d(TAG, "** onBeginningOfSpeech **" + mErrorFlag);
}
@Override
public void onEndOfSpeech() {
Log.d(TAG, "** onEndOfSpeech **");
mRecognizer.stop();
}
@Override
public void onPartialResult(Hypothesis hypothesis) {
Log.d(TAG, "** onPartialResult **");
if (hypothesis == null)
return;
// Stop on the first partial hypothesis; onResult is delivered after stop()
mRecognizer.stop();
}
private void switchSearch(String languageModelSearch) {
mRecognizer.stop();
mRecognizer.startListening(languageModelSearch, 2000);
}
@Override
public void onResult(Hypothesis hypothesis) {
hideListeningBackground();
stopRecognizer(true);
if(hypothesis != null){
final String recognizedCommand = hypothesis.getHypstr();
Log.d(TAG,"Recognized Text: = " + recognizedCommand + " Score: " + hypothesis.getBestScore());
runOnUiThread(new Runnable() {
@Override
public void run() {
if(!recognizedCommand.equals("")) {
if (recognizedCommand.equalsIgnoreCase(<given_command>)) {
Intent speech_converted_intent = new Intent(SphinxSpeechRecognizerActivity.this, Subclass.class);
startActivity(speech_converted_intent);
finish();
}
} else {
showErrorMsg(Constants.MODE_SUCCESS);
}
}
});
} else {
showErrorMsg(Constants.MODE_DEFAULT);
}
}
@Override
public void onError(Exception e) {
Log.e(TAG, "** onError **");
showErrorMsg(Constants.MODE_FAILED);
}
@Override
public void onTimeout() {
Log.i(TAG, "** onTimeout **");
mRecognizer.stop();
}
private void setupRecognizerController() {
new AsyncTask<Void, Void, Exception>() {
@Override
protected Exception doInBackground(Void... params) {
try {
Assets assets = new Assets(SphinxSpeechRecognizerActivity.this);
File assetDir = assets.syncAssets();
setupRecognizer(assetDir);
} catch (IOException e) {
return e;
}
return null;
}
@Override
protected void onPostExecute(Exception result) {
if(result == null){
Log.d(TAG, "Sphinx Recognizer: Start");
mRecognizer.startListening(COMMANDS, 3000);
}
displayListeningBackground();
}
}.execute();
}
private void setupRecognizer(File assetsDir) throws IOException {
mRecognizer = defaultSetup()
.setAcousticModel(new File(assetsDir, "en-us-ptm"))
.setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
.setKeywordThreshold(1e-10f)
.setFloat("-beam", 1e-30f)
.setBoolean("-allphone_ci", true)
.getRecognizer();
mRecognizer.addListener(this);
// The COMMANDS search uses a JSGF grammar file, not an N-gram language model
File grammarFile = new File(assetsDir, "command.gram");
mRecognizer.addGrammarSearch(COMMANDS, grammarFile);
// reset();
}
private void reset(){
mRecognizer.stop();
// mRecognizer.startListening(COMMANDS);
}
private void stopRecognizer(boolean flag){
if(flag && mRecognizer != null){
mRecognizer.cancel();
mRecognizer.shutdown();
isRecognizerInProgress = false;
}
hideListeningBackground();
}
String mShowText = "ERROR";
private void showErrorMsg(final int error_type) {
runOnUiThread(new Runnable() {
@Override
public void run() {
switch (error_type) {
case Constants.MODE_FAILED:
// ...
break;
case Constants.MODE_SUCCESS:
//...
break;
case Constants.MODE_DEFAULT:
default:
//../
break;
}
}
});
}
}
The grammar file (command.gram):
#JSGF V1.0;
grammar commands;
public <commands> = (<label> | <mainMenu> | <subMenu> | <track> )+;
<mainMenu> = ( music
| phone
| navigation
| vehicle
| homepage
| shortcut
);
<label> = ( back
| usb ( one | two )
| contact
| sms
| message
| dial
| ( homepage ( one | two | three ))
| ( shortcut ( one | two | three ))
);
<subMenu> = ( back
| ( next | previous ) station
| ( fm ( one | two ))
| ( dr ( one | two ))
| am
| listen
| play
| ( next | previous )
| search [ artists | playlists | songs | albums ]
| call
| received
| missed
| dial
| address
);
<track> = ( one
| two
| three
| four
| five
| six
| seven
| eight
| nine
| ten
| eleven
| twelve
| thirteen
| fourteen
| fifteen
| sixteen
| seventeen
| eighteen
| nineteen
| twenty
| (twenty ( one
| two
| three
| four
| five
| six
| seven
| eight
| nine
)
)
| thirty
| (thirty ( one
| two
| three
| four
| five
| six
| seven
| eight
| nine
)
)
| forty
| (forty ( one
| two
| three
| four
| five
| six
| seven
| eight
| nine
)
)
| fifty
| (fifty ( one
| two
| three
| four
| five
| six
| seven
| eight
| nine
)
)
| sixty
| (sixty ( one
| two
| three
| four
| five
| six
| seven
| eight
| nine
)
)
| seventy
| (seventy ( one
| two
| three
| four
| five
| six
| seven
| eight
| nine
)
)
| eighty
| (eighty ( one
| two
| three
| four
| five
| six
| seven
| eight
| nine
)
)
| ninety
| (ninety ( one
| two
| three
| four
| five
| six
| seven
| eight
| nine
)
)
);
My log shows:
I/cmusphinx: INFO: pocketsphinx.c(993): Writing raw audio log file: /storage/emulated/0/Android/data/com.techmahindra.rngo/files/sync/000000000.raw
Answer (score: 0):
Accuracy debugging is a complex process; there can be many possible issues: noise in the data, recording latency due to a slow CPU, wrong channel estimation.
To debug performance you first need to collect data. Uncomment the call to setRawLogDir in the demo and check in logcat that the raw audio files are being stored on the SD card. Inspect those files to make sure the audio is recorded properly: no noise, correct format, no heavy accent. Share the data together with the logs and the models to get help with accuracy.
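For illustration, a minimal sketch of what enabling raw-audio logging could look like in the setupRecognizer() method from the question; using assetsDir as the log directory is just an assumption, any writable directory should do:
mRecognizer = defaultSetup()
        .setAcousticModel(new File(assetsDir, "en-us-ptm"))
        .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
        // Sketch: write one .raw file per utterance for later inspection
        .setRawLogDir(assetsDir)
        .getRecognizer();
For the default en-us setup the resulting .raw files are typically headerless 16 kHz, 16-bit mono PCM, so an audio editor can open them if you specify that format.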
If you want to listen continuously and ignore words that are not of interest, you need to use keyword spotting mode, not a language model or a grammar; a sketch follows.
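A minimal keyword-spotting sketch, reusing the KWS_SEARCH and KEYPHRASE constants that are commented out in the question; the keyphrase and the threshold value are assumptions to be tuned, not definitive settings:
mRecognizer = defaultSetup()
        .setAcousticModel(new File(assetsDir, "en-us-ptm"))
        .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
        // Lower threshold: more detections but also more false alarms
        .setKeywordThreshold(1e-20f)
        .getRecognizer();
mRecognizer.addListener(this);
// Register a keyphrase search instead of the grammar search
mRecognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);
// Listen continuously; onPartialResult fires whenever the keyphrase is heard
mRecognizer.startListening(KWS_SEARCH);
In this mode the recognizer keeps listening and only reports the registered phrase, so other speech is ignored instead of being force-matched against a grammar.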