OpenNLP error while parsing a training file for document categorization

Time: 2016-12-08 08:13:04

Tags: opennlp

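I am trying to build a document categorizer with OpenNLP, but I am having trouble with the training file. When OpenNLP reads the file, I get the following error: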

Indexing events using cutoff of 5

    Computing event counts...  done. 17 events
    Indexing...  Dropped event greetings:[bow=hello]
Dropped event greetings:[bow=hi]
Dropped event greetings:[bow=salam]
Dropped event internet_problem:[bow=internet]
Dropped event internet_problem:[bow=no, bow=data]
Dropped event internet_problem:[bow=data, bow=not, bow=working]
Dropped event internet_problem:[bow=not, bow=able, bow=to, bow=open, bow=website]
Dropped event internet_problem:[bow=browsing, bow=issue]
Dropped event balance_problem:[bow=balance]
Dropped event balance_problem:[bow=usage]
Dropped event balance_problem:[bow=bill, bow=amount]
Dropped event balance_problem:[bow=billed]
Dropped event voice_problem:[bow=signals]
Dropped event voice_problem:[bow=call]
Dropped event voice_problem:[bow=voice]
Dropped event voice_problem:[bow=call, bow=drop]
Dropped event voice_problem:[bow=not, bow=connecting]
done.
Sorting and merging events... Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
    at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
    at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
    at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
    at opennlp.tools.ml.model.TrainUtil.train(TrainUtil.java:53)
    at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:204)
    at com.nlp.CategoryTrainUtil.trainModel(CategoryTrainUtil.java:39)
    at com.nlp.Boot.main(Boot.java:12)

My training file looks like this:

greetings hello
greetings hi
greetings salam
internet_problem internet
internet_problem no data
internet_problem data not working
internet_problem not able to open website
internet_problem browsing issue
balance_problem balance
balance_problem usage
balance_problem bill amount
balance_problem billed
voice_problem signals
voice_problem call
voice_problem voice
voice_problem call drop
voice_problem not connecting

I can't figure out what I might be missing.

1 answer:

Answer 0: (score: 1)

The first line of the output shows "Indexing events using cutoff of 5".

This probably means that you have to give at least 5 examples per category. So add another two examples for greetings, and more for any other category that has fewer than 5 in the training data.

Alternatively, if you don't have enough training data, you can reduce the cutoff to 3, but that will not give you good results.
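
To see where the training data falls short of a given cutoff, the counts can be tallied with a small stand-alone program before training. This is only a sketch and does not call OpenNLP; the class name CutoffCheck and its methods are made up for illustration, and it assumes the format shown in the question (category first, then whitespace-separated words). It reports both examples per category and occurrences per word, because the OpenNLP command-line help describes the cutoff as the minimal number of times a feature must be seen, which suggests the bow=word features in the log are what get counted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of OpenNLP: tallies a doccat-style training file.
public class CutoffCheck {

    // Counts how many training lines each category (the first token) has.
    public static Map<String, Integer> countCategories(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2) continue; // skip blank or label-only lines
            counts.merge(tokens[0], 1, Integer::sum);
        }
        return counts;
    }

    // Counts how often each word (every token after the category) occurs.
    public static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            for (int i = 1; i < tokens.length; i++) {
                counts.merge(tokens[i], 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the keys whose count is strictly below the cutoff.
    public static List<String> belowCutoff(Map<String, Integer> counts, int cutoff) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < cutoff) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Training lines copied from the question.
        List<String> lines = Arrays.asList(
            "greetings hello", "greetings hi", "greetings salam",
            "internet_problem internet", "internet_problem no data",
            "internet_problem data not working",
            "internet_problem not able to open website",
            "internet_problem browsing issue",
            "balance_problem balance", "balance_problem usage",
            "balance_problem bill amount", "balance_problem billed",
            "voice_problem signals", "voice_problem call", "voice_problem voice",
            "voice_problem call drop", "voice_problem not connecting");

        int cutoff = 5; // value reported in the log's first line
        System.out.println("Examples per category: " + countCategories(lines));
        System.out.println("Categories below cutoff: "
            + belowCutoff(countCategories(lines), cutoff));
        System.out.println("Words below cutoff: "
            + belowCutoff(countWords(lines), cutoff));
    }
}
```

On the training file from the question, no word reaches a count of 5 (the most frequent one, not, appears 3 times), which would be consistent with every event being dropped.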

Hope this helps!