How to save a trained NLTK POS tagger

Time: 2018-03-16 05:51:18

Tags: python nlp nltk

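I would like to know how to save a trained NLTK (Unigram) tagger. I train a Portuguese UnigramTagger with the code below. Depending on the corpus, it can take a while to run, so I would like to avoid re-running it.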

import nltk
from nltk.corpus import mac_morpho

def get_unigram_tagger():
    p_train = 0.9  # fraction of the corpus used for training
    tagged_sents = mac_morpho.tagged_sents()
    size = int(len(tagged_sents) * p_train)
    train_sents = tagged_sents[:size]
    test_sents = tagged_sents[size:]
    uni_tagger = nltk.UnigramTagger(train_sents)
    # Recent NLTK releases offer accuracy() as the replacement for evaluate().
    print("Test accuracy = %s" % uni_tagger.evaluate(test_sents))
    return uni_tagger

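So I get uni_tagger from this function, and if I run the program again I have to recompute it. Perhaps I could save uni_tagger somehow, so that next time I only need to read it (weights, etc.) from a file.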

1 answer:

Answer 0: (score: 1)

You can use something like pickle to save your model to disk.

import nltk
import pickle
from nltk.corpus import mac_morpho

def get_unigram_tagger():
    p_train = 0.9  # fraction of the corpus used for training
    tagged_sents = mac_morpho.tagged_sents()
    size = int(len(tagged_sents) * p_train)
    train_sents = tagged_sents[:size]
    test_sents = tagged_sents[size:]
    uni_tagger = nltk.UnigramTagger(train_sents)
    print("Test accuracy = %s" % uni_tagger.evaluate(test_sents))
    return uni_tagger

tagger = get_unigram_tagger()
s = pickle.dumps(tagger)    # serialize the trained tagger to a bytes object
model2 = pickle.loads(s)    # restore an equivalent tagger from those bytes

You can also use joblib, the pickle replacement that sklearn recommends (joblib.dump and joblib.load).

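The sklearn documentation states that joblib is more efficient than pickle for objects that carry large numpy arrays internally, as many fitted models do.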
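To keep the tagger between program runs, the pickled object has to be written to a file rather than held in memory. The following is only a minimal sketch using the standard pickle module. The helper names save_model, load_model and load_or_train, and the file name in the usage note, are hypothetical and not part of NLTK or of the original answer.

```python
import os
import pickle


def save_model(model, path):
    """Write any picklable object (for example a trained tagger) to a file."""
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Read back an object previously written by save_model."""
    with open(path, "rb") as f:
        return pickle.load(f)


def load_or_train(path, train_fn):
    """Return the model stored at path; train and save it if the file is missing.

    train_fn is any zero-argument function that returns the trained model.
    """
    if os.path.exists(path):
        return load_model(path)
    model = train_fn()
    save_model(model, path)
    return model
```

With the function from the question, this could be called as tagger = load_or_train("uni_tagger.pickle", get_unigram_tagger), so the training only runs when the file does not exist yet. Two caveats apply: only unpickle files you trust, and a pickle written under one NLTK version is not guaranteed to load under another.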

You can read more here:

http://scikit-learn.org/stable/modules/model_persistence.html

https://docs.python.org/3/library/pickle.html