Question

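I am using Sphinx4 to align audio with text. I want to get the timing (start, end) of each word in the sentence, as well as the timing of each phoneme within each word. To do that, I modified the code of SpeechAligner. The method I edited is: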

public List<WordResult> align(URL audioUrl, List<String> sentenceTranscript) throws IOException {...}

I only added a list in which I collect the results as Result objects (rather than WordResult).

    List<WordResult> hypothesis = new ArrayList<WordResult>();
    Result result;
    while (null != (result = recognizer.recognize())) {

        alignResult.add(result); // I am filling the results here

        logger.info("Utterance result " + result.getTimedBestResult(true));
        hypothesis.addAll(result.getTimedBestResult(false));
    }

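Then I followed this example exactly: Phonemes Timestamp

For the sentence "des adversaires" I expected: [expected result]

But the result is shifted by one word toward the beginning: the pronunciation (phoneme sequence) of "des" is moved one position earlier, "des" takes the pronunciation of "adversaires", and so on (as if the second silence were ignored). This is what I get: [what i get]

To display the tokens and units I use: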

System.out.println("token : " + token.getWordPath() + " - unit : " + unit.toString());

Thanks in advance.

Answer 1

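There are two types of linguist in sphinx4: FlatLinguist adds the unit tokens before the actual phoneme detector tokens, while the lextree linguist adds them after. The Result class in sphinx4 has a branch that handles this: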

    if (wordTokenFirst) {
        return getTimedWordPath(token, withFillers);
    } else {
        return getTimedWordTokenLastPath(token, withFillers);
    }

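The code on the wiki page was written for the lextree linguist, which has the unit tokens after the detector tokens. The aligner uses FlatLinguist, which has the unit tokens before them. So you have to rework the example code from the wiki accordingly. That is not entirely trivial.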
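To make the ordering issue concrete, here is a minimal sketch that does not use the sphinx4 API at all. Tok, Aligned, UnitGrouping and group are hypothetical names invented for illustration, the unit labels (d1, a1, ...) are placeholders rather than real dictionary phonemes, and the path is modelled as a simple list in time order. sphinx4 itself stores the path as a backward-linked token chain, so real code would have to adapt the idea. It needs Java 16 or later because of the records.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types exist in sphinx4.
final class UnitGrouping {

    /** Hypothetical token: either a word marker or a unit (phoneme) marker. */
    record Tok(boolean isWord, String label) {}

    /** A word together with the unit labels that were assigned to it. */
    record Aligned(String word, List<String> units) {}

    static Tok word(String label) { return new Tok(true, label); }
    static Tok unit(String label) { return new Tok(false, label); }

    /**
     * Groups units under words for a path given in time order.
     * wordTokenFirst = true  assumes each word token precedes its units.
     * wordTokenFirst = false assumes each word token follows its units.
     */
    static List<Aligned> group(List<Tok> path, boolean wordTokenFirst) {
        List<Aligned> out = new ArrayList<>();
        List<String> pending = new ArrayList<>(); // units waiting for their word
        for (Tok t : path) {
            if (t.isWord()) {
                // Word-first: open an empty group. Word-last: close the pending units.
                out.add(new Aligned(t.label(), wordTokenFirst ? new ArrayList<>() : pending));
                if (!wordTokenFirst) {
                    pending = new ArrayList<>();
                }
            } else if (wordTokenFirst) {
                // The unit belongs to the most recently seen word, if there is one.
                if (!out.isEmpty()) {
                    out.get(out.size() - 1).units().add(t.label());
                }
            } else {
                pending.add(t.label());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A path laid out word-first, with placeholder unit labels.
        List<Tok> path = List.of(
                word("<sil>"), unit("SIL"),
                word("des"), unit("d1"), unit("d2"),
                word("adversaires"), unit("a1"), unit("a2"), unit("a3"));
        System.out.println(group(path, true));  // convention matches the path
        System.out.println(group(path, false)); // convention does not match
    }
}
```

When the flag matches the layout of the path, every word keeps its own units. With the wrong flag, "des" receives the silence unit and "adversaires" receives the units of "des", which is the same kind of one-word shift described in the question.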
