Using a unigram model in the KenLM Python wrapper

Date: 2018-08-09 19:22:28

Tags: python nlp language-model kenlm

I am trying to build a kenlm Model in the Python wrapper from a unigram ARPA file. However, I get the following error:

Loading the LM will be faster if you build a binary file.
Reading /home/ubuntu/lm_1b/lm_1b/preprocessed_data/lm1b-1gram.tsv
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Traceback (most recent call last):
  File "kenlm.pyx", line 119, in kenlm.Model.__init__ (python/kenlm.cpp:2603)
RuntimeError: lm/model.cc:100 in void lm::ngram::detail::GenericModel<Search, VocabularyT>::InitializeFromARPA(int, const char*, const lm::ngram::Config&) [with Search = lm::ngram::detail::HashedSearch<lm::ngram::BackoffValue>; VocabularyT = lm::ngram::ProbingVocabulary] threw FormatLoadException.
This ngram implementation assumes at least a bigram model. Byte: 25

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "process_experiment.py", line 45, in <module>
    create_logprob_corpus_vectors.create(tokenized_line_file, logprob_file)
  File "/home/ubuntu/lm_1b/lm_1b/create_probabilities_from_raw_data/create_logprob_corpus_vectors.py", line 37, in create
    klm_ngram_model = kenlm.Model(op.join(filenames.preproc_dir, 'lm1b-1gram.tsv'))
  File "kenlm.pyx", line 122, in kenlm.Model.__init__ (python/kenlm.cpp:2740)
OSError: Cannot read model '/home/ubuntu/lm_1b/lm_1b/preprocessed_data/lm1b-1gram.tsv' (lm/model.cc:100 in void lm::ngram::detail::GenericModel<Search, VocabularyT>::InitializeFromARPA(int, const char*, const lm::ngram::Config&) [with Search = lm::ngram::detail::HashedSearch<lm::ngram::BackoffValue>; VocabularyT = lm::ngram::ProbingVocabulary] threw FormatLoadException. This ngram implementation assumes at least a bigram model. Byte: 25)

How can I use a unigram model?
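The error message itself states the constraint: "This ngram implementation assumes at least a bigram model", i.e. KenLM refuses to load an ARPA file of order 1. A commonly used workaround is to pad the unigram ARPA file with a single dummy bigram entry so that the file declares order 2. The sketch below is a hypothetical helper (not part of kenlm) that performs this padding; the dummy entry `-9 <s> </s>` and the assumption that KenLM accepts a single near-zero-probability bigram are mine, so verify the padded file loads in your setup.

```python
def pad_unigram_arpa(src_lines):
    """Rewrite a unigram ARPA file (given as a list of lines) so it
    declares a single dummy bigram, making it order 2 for KenLM.

    Hypothetical helper; assumes a well-formed ARPA file with an
    `ngram 1=N` header line and a closing `\\end\\` marker.
    """
    out = []
    for line in src_lines:
        if line.startswith("ngram 1="):
            out.append(line)
            out.append("ngram 2=1")      # declare one (dummy) bigram
        elif line.strip() == "\\end\\":
            out.append("\\2-grams:")
            out.append("-9 <s> </s>")    # near-impossible dummy bigram
            out.append("")
            out.append(line)             # keep the original \end\ marker
        else:
            out.append(line)
    return out


# Example: pad a minimal unigram ARPA file in memory.
unigram_arpa = [
    "\\data\\",
    "ngram 1=3",
    "",
    "\\1-grams:",
    "-1.0 <s>",
    "-1.0 </s>",
    "-1.0 hello",
    "",
    "\\end\\",
]
padded = pad_unigram_arpa(unigram_arpa)
```

After writing `padded` back out (e.g. to `lm1b-1gram-padded.arpa`, a made-up name), loading it with `kenlm.Model(...)` should no longer raise the `FormatLoadException`, since the file is now formally a bigram model while the dummy entry contributes essentially nothing to scoring.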

0 Answers:

No answers yet