How can I improve speech-to-text recognition accuracy using pocketsphinx?

Asked: 2016-01-04 09:12:36

Tags: pocketsphinx-android

I am completely new to pocketsphinx. I followed the demo-application integration described in

Android offline voice recognition using PocketSphinx

After integrating pocketsphinx into my application as a library, it works, but the output is not as accurate as expected. It even drops spoken words that are present in the provided dictionary.

I want to understand how to improve word-detection accuracy. I initially used a .lm file; then I dropped it and instead simply created a .jsgf text file and used that, but accuracy still did not improve. Also, after creating the .jsgf file, do I need to compile it, or is it enough to just copy the .jsgf text file into the assets folder?

http://cmusphinx.sourceforge.net/wiki/tutorialandroid describes how to build pocketsphinx-android. I did not do that; I only integrated it as a library project.

The code:

public class SphinxSpeechRecognizerActivity extends Activity implements RecognitionListener {

    private static String TAG = SphinxSpeechRecognizerActivity.class.getSimpleName();

    private SpeechRecognizer mRecognizer;
    private HashMap<String, Integer> mCaptions;

//    private static final String KWS_SEARCH = "wakeup";
//    private static final String KEYPHRASE = "phone";
    private static final String COMMANDS = "command";
    private boolean mErrorFlag = false;
    private static boolean isRecognizerInProgress = false;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.fragment);
        initViews();


    }

    @Override
    public void onResume() {
        super.onResume();
    }

    @Override
    public void onPause() {
        super.onPause();
    }


    @Override
    public void onDestroy() {
        super.onDestroy();
        Log.d(TAG, "** onDestroy **");
        stopRecgonizer(true);

    }

    @Override
    public void onBackPressed() {
        super.onBackPressed();
        stopRecgonizer(true);
    }

    private void initViews() {
        final ImageView img_close = (ImageView)findViewById(R.id.ttsClose);
        final ImageView img_voice_view = (ImageView)findViewById(R.id.tts_voice_view);
        final ImageView img_info = (ImageView)findViewById(R.id.ttsInfo);

        img_close.setOnClickListener(mOnClickListener);
        img_info.setOnClickListener(mOnClickListener);
        img_voice_view.setOnClickListener(mOnClickListener);
    }

    // Set press indicator
    private View.OnClickListener mOnClickListener = new View.OnClickListener() {
        @Override
        public void onClick(View v) {

            switch (v.getId()){
                case R.id.ttsInfo:
                    break;

                case R.id.tts_voice_view:
                    if (!isRecognizerInProgress) {
                        isRecognizerInProgress = true;
                        setupRecognizerController();
                    } else {
                        Log.d(TAG, "Sphinx recognizer is already running");
                    }
                    break;

                case R.id.ttsClose:
                default:
                    // Call back event
                    onBackPressed();
                    break;
            }

        }
    };

    @Override
    public void onBeginningOfSpeech() {
        Log.d(TAG, "** onBeginningOfSpeech **" + mErrorFlag);
    }

    @Override
    public void onEndOfSpeech() {
        Log.d(TAG, "** onEndOfSpeech **");
        mRecognizer.stop();
    }

    @Override
    public void onPartialResult(Hypothesis hypothesis) {
        Log.d(TAG, "** onPartialResult **");

        if (hypothesis == null)
            return;
        mRecognizer.stop();
    }

    private void switchSearch(String languageModelSearch) {
        mRecognizer.stop();
        mRecognizer.startListening(languageModelSearch, 2000);
    }


    @Override
    public void onResult(Hypothesis hypothesis) {
        hideListeningBackground();
        stopRecgonizer(true);

        if(hypothesis != null){
            final String recognizedCommand = hypothesis.getHypstr();
            Log.d(TAG,"Recognized Text: = " + recognizedCommand + " Score: " + hypothesis.getBestScore());

            runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    if(!recognizedCommand.equals("")) {
                        if (recognizedCommand.equalsIgnoreCase(<given_command>)) {
                            Intent speech_converted_intent = new Intent(SphinxSpeechRecognizerActivity.this, Subclass.class);
                            startActivity(speech_converted_intent);
                            finish();
                        }
                    } else {
                        showErrorMsg(Constants.MODE_SUCCESS);
                    }
                }
            });

        } else {
            showErrorMsg(Constants.MODE_DEFAULT);
        }
    }

    @Override
    public void onError(Exception e) {
        Log.e(TAG, "** onError **");
        showErrorMsg(Constants.MODE_FAILED);
    }

    @Override
    public void onTimeout() {
        Log.i(TAG, "** onTimeout **");
        mRecognizer.stop();
    }


    private void setupRecognizerController() {

        new AsyncTask<Void, Void, Exception>() {
            @Override
            protected Exception doInBackground(Void... params) {
                try {
                    Assets assets = new Assets(SphinxSpeechRecognizerActivity.this);
                    File assetDir = assets.syncAssets();
                    setupRecognizer(assetDir);
                } catch (IOException e) {
                    return e;
                }
                return null;
            }

            @Override
            protected void onPostExecute(Exception result) {
                if(result == null){
                    Log.d(TAG, "Sphinx Recognizer: Start");
                    mRecognizer.startListening(COMMANDS, 3000);
                }
                displayListeningBackground();

            }
        }.execute();
    }

    private void setupRecognizer(File assetsDir) throws IOException {
        mRecognizer = defaultSetup()
                .setAcousticModel(new File(assetsDir, "en-us-ptm"))
                .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
                .setKeywordThreshold(1e-10f)
                .setFloat("-beam", 1e-30f)
                .setBoolean("-allphone_ci", true)

                .getRecognizer();
        mRecognizer.addListener(this);

        File languageModel = new File(assetsDir, "command.gram");
        mRecognizer.addGrammarSearch(COMMANDS, languageModel);
 //       reset();
    }


    private void reset(){
        mRecognizer.stop();
   //     mRecognizer.startListening(COMMANDS);
    }

    private void stopRecgonizer(boolean flag){
        if(flag && mRecognizer != null){
            mRecognizer.cancel();
            mRecognizer.shutdown();
            isRecognizerInProgress = false;
        }
        hideListeningBackground();
    }

    String mShowText = "ERROR";
    private void showErrorMsg(final int error_type) {

        runOnUiThread(new Runnable() {
            @Override
            public void run() {
                switch (error_type) {
                    case Constants.MODE_FAILED:
                        // ...
                        break;
                    case Constants.MODE_SUCCESS:
                        //...
                        break;
                    case Constants.MODE_DEFAULT:
                    default:
                        // ...
                        break;
                }
            }
        });
    }
}

My grammar file:

#JSGF V1.0;

grammar commands;

public <commands> = (<label> | <mainMenu> | <subMenu> | <track> )+;

<mainMenu> = ( music
         | phone
         | navigation 
         | vehicle 
         | homepage
         | shortcut
         );

<label> =  ( back
                  | usb ( one | two )
                  | contact
                  | sms
                  | message
                  | dial
                  | ( homepage ( one | two | three ))
                  | ( shortcut ( one | two | three ))
                  );

<subMenu> = ( back
            | ( next | previous ) station
            | ( fm ( one | two ))
            | ( dr ( one | two ))
            | am
            | listen
            | play
            | ( next | previous )
            | search [ artists | playlists | songs | albums ]
            | call
            | received
            | missed
            | dial
            | address
            );

<track> = ( one
             | two
             | three
             | four
             | five
             | six
             | seven
             | eight
             | nine
             | ten
             | eleven
             | twelve
             | thirteen
             | fourteen
             | fifteen
             | sixteen
             | seventeen
             | eighteen
             | nineteen
             | twenty
             | (twenty ( one
                       | two
                       | three
                       | four
                       | five
                       | six
                       | seven
                       | eight
                       | nine
                       )
                )
             | thirty
             | (thirty ( one
                       | two
                       | three
                       | four
                       | five
                       | six
                       | seven
                       | eight
                       | nine
                       )
                )
             | forty
             | (forty ( one
                      | two
                      | three
                      | four
                      | five
                      | six
                      | seven
                      | eight
                      | nine
                      )
                )
             | fifty
             | (fifty ( one
                      | two
                      | three
                      | four
                      | five
                      | six
                      | seven
                      | eight
                      | nine
                      )
                )
             | sixty
             | (sixty ( one
                      | two
                      | three
                      | four
                      | five
                      | six
                      | seven
                      | eight
                      | nine
                      )
                )
             | seventy
             | (seventy ( one
                        | two
                        | three
                        | four
                        | five
                        | six
                        | seven
                        | eight
                        | nine
                        )
                )
             | eighty
             | (eighty   ( one
                         | two
                         | three
                         | four
                         | five
                         | six
                         | seven
                         | eight
                         | nine
                         )
                )
             | ninety
             | (ninety ( one
                       | two
                       | three
                       | four
                       | five
                       | six
                       | seven
                       | eight
                       | nine
                       )
               )
            );
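As a side note (not part of the original question), the long `<track>` rule above can be written much more compactly with helper rules. A sketch with equivalent coverage; the rule names `<unit>`, `<teen>`, and `<tens>` are hypothetical:

```
#JSGF V1.0;

grammar tracks;

<unit> = one | two | three | four | five | six | seven | eight | nine;
<teen> = ten | eleven | twelve | thirteen | fourteen | fifteen
       | sixteen | seventeen | eighteen | nineteen;
<tens> = twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety;

// "twenty one" ... "ninety nine" come from <tens> followed by an optional <unit>
public <track> = <unit> | <teen> | <tens> [<unit>];
```

A smaller grammar is also easier to audit for the missing-word problem described in the question.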

My log shows:

I/cmusphinx: INFO: pocketsphinx.c(993): Writing raw audio log file: /storage/emulated/0/Android/data/com.techmahindra.rngo/files/sync/000000000.raw

1 Answer:

Answer 0: (score: 0)

Accuracy debugging is a complex process, and there could be many issues: noisy data, recording latency caused by a slow CPU, incorrect channel estimation.

To debug performance, you first need to collect data. Uncomment the call to setRawLogDir in the demo and check logcat to see where the raw audio files are stored on the SD card. Inspect those files to make sure the audio is recorded correctly. Share the data together with the logs and the model to get help with accuracy. Make sure the data is recorded properly: no noise, correct format, no strong accent.
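As a sketch of what this suggests, raw audio logging is enabled during recognizer setup via `setRawLogDir` on the `SpeechRecognizerSetup` builder (the directory used here is an assumption; any writable directory works):

```java
// Sketch: same setup as in the question, plus raw audio logging.
// setRawLogDir(File) writes one .raw file per utterance into the given
// directory so the recordings can be inspected afterwards.
mRecognizer = defaultSetup()
        .setAcousticModel(new File(assetsDir, "en-us-ptm"))
        .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
        .setRawLogDir(assetsDir) // assumption: reuse the synced assets dir
        .getRecognizer();
mRecognizer.addListener(this);
```

The resulting `.raw` files are headerless PCM; they can be checked by importing them as raw audio (16 kHz, 16-bit mono) in an audio editor.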

If you want to listen continuously and ignore words you are not interested in, you need to use keyword spotting mode, not a language model or a grammar.
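A minimal sketch of keyword spotting with pocketsphinx-android, using `addKeyphraseSearch` instead of `addGrammarSearch` (the search name and keyphrase mirror the constants already commented out in the question's code; the threshold value is an assumption to tune for your data):

```java
// Sketch: keyword spotting setup. The recognizer listens continuously and
// only reports the keyphrase; everything else is ignored.
private static final String KWS_SEARCH = "wakeup";
private static final String KEYPHRASE = "phone";

private void setupKeywordSpotting(File assetsDir) throws IOException {
    mRecognizer = defaultSetup()
            .setAcousticModel(new File(assetsDir, "en-us-ptm"))
            .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
            // Trades false alarms against misses: a larger value
            // (e.g. 1e-5f) fires more easily but misfires more often.
            .setKeywordThreshold(1e-20f)
            .getRecognizer();
    mRecognizer.addListener(this);
    mRecognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);
    // No timeout argument: keep listening until explicitly stopped.
    mRecognizer.startListening(KWS_SEARCH);
}
```

With this mode there is no need to call `stop()` in `onPartialResult`; the keyphrase arrives there whenever it is spotted in the continuous audio stream.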