Question

I would like to know how to train Stanford NER on an Arabic corpus. I want to use a freely available corpus, such as ANERCorp:

http://www1.ccls.columbia.edu/~ybenajiba/downloads.html

I used the following properties file:

trainFile = ANERCorp
serializeTo = aner-model.ser.gz
map = word=0,answer=1
maxLeft=1
useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useDisjunctive=true
useSequences=true
usePrevSequences=true
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
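
The `map = word=0,answer=1` line above means each training line is expected to carry the token in column 0 and its label in column 1. A minimal sketch for sanity-checking a file against that layout, assuming whitespace-separated columns (an assumption about the corpus file, not something confirmed here); `check_columns` is a hypothetical helper name:

```python
from collections import Counter


def check_columns(lines, expected_columns=2):
    """Count labels and collect malformed lines in column-formatted NER data.

    Assumes columns are separated by whitespace and that the label sits in
    column 1, matching map = word=0,answer=1.
    """
    label_counts = Counter()
    bad_lines = []
    for line_number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        fields = stripped.split()
        if len(fields) != expected_columns:
            # Remember lines that do not have exactly token + label
            bad_lines.append((line_number, stripped))
            continue
        label_counts[fields[1]] += 1
    return label_counts, bad_lines
```

It can be run on the training file by passing an open file handle, for example `check_columns(open("ANERCorp", encoding="utf-8"))`, where the UTF-8 encoding is also an assumption. If nearly every label turns out to be the same value, or many lines land in `bad_lines`, the file layout probably does not match the `map` setting.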

I then trained the model with:

java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop t.prop

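Training runs successfully and serializes my model. But when I test the model, I always get an empty result, i.e. no entities are recognized. I know I am testing it the right way, because the same procedure works when I run the English model on English text.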

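Do I need to set any properties in the prop file specifically for Arabic? Has anyone trained Stanford NLP on Arabic before? I know this has been done with LingPipe, but I would prefer to stick with SNLP.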

Arabic corpus named entity recognition using Stanford NER

0 answers: