Why does MITIE get stuck on the segment classifier?

Time: 2017-08-22 16:52:51

Tags: machine-learning nlp

I am building a model with MITIE, using a training set of 1,400 sentences between 3 and 10 words long, paired with 120 intents. Training gets stuck at Part II: train segment classifier. I let it run for 14 hours before killing it.
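For context, here is roughly how training is kicked off (a minimal sketch against the rasa_nlu 0.x Python API; the config and data file names are placeholders, not my real files):

from rasa_nlu.converters import load_data
from rasa_nlu.config import RasaNLUConfig
from rasa_nlu.model import Trainer

# config_mitie.json points the pipeline at the MITIE backend, roughly:
#   {"pipeline": "mitie", "mitie_file": "data/total_word_feature_extractor.dat"}
trainer = Trainer(RasaNLUConfig("config_mitie.json"))
training_data = load_data("data/training_examples.json")  # 1,400 sentences, 120 intents
trainer.train(training_data)   # hangs inside "Part II: train segment classifier"
trainer.persist("./models/")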

My machine has a 2.4 GHz Intel Core i7 with 8 GB of 1600 MHz DDR3 RAM. The segment classifier uses all available memory (roughly 7 GB) and then falls back on compressed memory; by the end of the last session, Activity Monitor showed 32 GB used, 27 GB of it compressed. The segment classifier never finished.
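A small helper like the following (hypothetical; uses psutil, which is not part of rasa_nlu or MITIE) can confirm that the footprint grows inside Part II rather than during data loading:

import os
import time
import psutil

proc = psutil.Process(os.getpid())

def log_memory(tag):
    # Resident set size of this process, in GiB
    rss_gib = proc.memory_info().rss / 1024.0 ** 3
    print("[%s] %s RSS: %.2f GiB" % (time.strftime("%H:%M:%S"), tag, rss_gib))

log_memory("before train")  # call again periodically while Part II runs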

My current output looks like this:

INFO:rasa_nlu.model:Starting to train component nlp_mitie
INFO:rasa_nlu.model:Finished training component.
INFO:rasa_nlu.model:Starting to train component tokenizer_mitie
INFO:rasa_nlu.model:Finished training component.
INFO:rasa_nlu.model:Starting to train component ner_mitie
Training to recognize 20 labels: 'pet', 'room_number', 'broken_things', '@sys.ignore', 'climate', 'facility', 'gym', 'medicine', 'item', 'exercise_equipment', 'service', 'number', 'electronic_device', 'charger', 'toiletries', 'time', 'date', 'facility_hours', 'cost_inquiry', 'tv channel'
Part I: train segmenter
words in dictionary: 200000
num features: 271

now do training
C:           20
epsilon:     0.01
num threads: 1
cache size:  5
max iterations: 2000
loss per missed segment:  3
C: 20   loss: 3         0.669591
C: 35   loss: 3         0.690058
C: 20   loss: 4.5       0.701754
C: 5   loss: 3  0.616959
C: 20   loss: 1.5       0.634503
C: 28.3003   loss: 5.74942      0.71345
C: 25.9529   loss: 5.72171      0.707602
C: 27.7407   loss: 5.97907      0.707602
C: 30.2561   loss: 5.61669      0.701754
C: 27.747   loss: 5.66612       0.710526
C: 28.9754   loss: 5.82319      0.707602
best C: 28.3003
best loss: 5.74942
num feats in chunker model: 4095
train: precision, recall, f1-score: 0.805851 0.885965 0.844011
Part I: elapsed time: 180 seconds.

Part II: train segment classifier
now do training
num training samples: 415

I understand this can be a problem caused by redundant labels (as described here); however, all of my labels are unique. My understanding is that training should not take this long or use this much memory. I have seen others post similar questions, but no solution has been offered. What causes this high memory usage and extreme training time, and how can it be fixed?
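For anyone trying to reproduce this, the same Part I/Part II training can also be triggered by driving MITIE's NER trainer directly, which exposes the thread count that the log above shows as 1 (a sketch only; the sentence, entity span, and file names are made up):

from mitie import ner_trainer, ner_training_instance

trainer = ner_trainer("data/total_word_feature_extractor.dat")

# One made-up training sentence with a single entity span
sample = ner_training_instance(["I", "need", "a", "phone", "charger"])
sample.add_entity(range(3, 5), "charger")  # tokens 3-4 ("phone charger")
trainer.add(sample)

trainer.num_threads = 4       # the log above shows "num threads: 1"
ner = trainer.train()         # Part I and Part II both run inside this call
ner.save_to_disk("ner_model.dat")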

0 Answers:

No answers yet.