I am using MITIE to build a model with a training set of 1,400 sentences, each 3 to 10 words long, paired with 120 intents. Training gets stuck at Part II: train segment classifier. I let it run for 14 hours before killing it.
My machine has a 2.4 GHz Intel Core i7 and 8 GB of 1600 MHz DDR3 RAM. The segment classifier consumes all available memory (roughly 7 GB) and then falls back on compressed memory; by the end of the last session, Activity Monitor reported 32 GB used and 27 GB compressed. The segment classifier never finished.
My current output is as follows:
INFO:rasa_nlu.model:Starting to train component nlp_mitie
INFO:rasa_nlu.model:Finished training component.
INFO:rasa_nlu.model:Starting to train component tokenizer_mitie
INFO:rasa_nlu.model:Finished training component.
INFO:rasa_nlu.model:Starting to train component ner_mitie
Training to recognize 20 labels: 'pet', 'room_number', 'broken_things', '@sys.ignore', 'climate', 'facility', 'gym', 'medicine', 'item', 'exercise_equipment', 'service', 'number', 'electronic_device', 'charger', 'toiletries', 'time', 'date', 'facility_hours', 'cost_inquiry', 'tv channel'
Part I: train segmenter
words in dictionary: 200000
num features: 271
now do training
C: 20
epsilon: 0.01
num threads: 1
cache size: 5
max iterations: 2000
loss per missed segment: 3
C: 20 loss: 3 0.669591
C: 35 loss: 3 0.690058
C: 20 loss: 4.5 0.701754
C: 5 loss: 3 0.616959
C: 20 loss: 1.5 0.634503
C: 28.3003 loss: 5.74942 0.71345
C: 25.9529 loss: 5.72171 0.707602
C: 27.7407 loss: 5.97907 0.707602
C: 30.2561 loss: 5.61669 0.701754
C: 27.747 loss: 5.66612 0.710526
C: 28.9754 loss: 5.82319 0.707602
best C: 28.3003
best loss: 5.74942
num feats in chunker model: 4095
train: precision, recall, f1-score: 0.805851 0.885965 0.844011
Part I: elapsed time: 180 seconds.
Part II: train segment classifier
now do training
num training samples: 415
I understand this problem can be caused by redundant labels (as described here); however, all of my labels are unique. My understanding is that training should not take this long or use this much memory. I have seen others post similar issues, but no solution has been offered. What is causing this high memory usage and extreme training time, and how can it be fixed?
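For context, my pipeline is the MITIE backend for rasa_nlu. A minimal sketch of the kind of config in use, reconstructed from the component names in the log above (the `mitie_file` path is a placeholder, and any intent-classification component that would run after `ner_mitie` is omitted, since training never gets that far):

```json
{
  "pipeline": ["nlp_mitie", "tokenizer_mitie", "ner_mitie"],
  "mitie_file": "data/total_word_feature_extractor.dat"
}
```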