Stanford NER custom model not working properly

Time: 2016-03-02 18:10:08

Tags: java nlp stanford-nlp named-entity-recognition

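I am working on an aspect-level sentiment analysis project. I am currently implementing the aspect term extraction module, and I am using Stanford NER to train my own custom model on an annotated dataset of 1000 TripAdvisor travel reviews.

I managed to train a custom NER model. The code is as follows: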

import java.util.Properties;

import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.sequences.SeqClassifierFlags;
import edu.stanford.nlp.util.StringUtils;

public class NERTrainer {

    public static void main(String[] args) {
        // Load the training flags from the properties file.
        String prop = "c:\\Users\\User\\Downloads\\properties.prop";
        Properties props = StringUtils.propFileToProperties(prop);
        // serializeTo is commented out in the properties file, so set it here.
        String modelPath = "c:\\Users\\User\\Desktop\\ner-travel-planner-model.ser.gz";
        props.setProperty("serializeTo", modelPath);
        SeqClassifierFlags flags = new SeqClassifierFlags(props);
        CRFClassifier<CoreLabel> crf = new CRFClassifier<>(flags);
        // Train on the trainFile named in the properties, then write the model to disk.
        crf.train();
        crf.serializeClassifier(modelPath);
    }
}

My properties file (I used the default one provided on the Stanford website):

trainFile = IOB.tsv
#serializeTo = ner-model.ser.gz
map = word=0,answer=1

useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
useDisjunctive=true

The log shows that training completed successfully.

usePrevSequences=true
useClassFeature=true
useTypeSeqs2=true
useSequences=true
wordShape=chris2useLC
useTypeySequences=true
useDisjunctive=true
noMidNGrams=true
serializeTo=c:\Users\User\Desktop\ner-travel-planner-model.ser.gz
maxNGramLeng=6
useNGrams=true
usePrev=true
useNext=true
maxLeft=1
trainFile=IOB.tsv
map=word=0,answer=1
useWord=true
useTypeSeqs=true
numFeatures = 114317
Time to convert docs to feature indices: 2.0 seconds
numClasses: 3 [0=O,1=I-TERM,2=B-TERM]
numDocuments: 2
numDatums: 56513
numFeatures: 114317
Time to convert docs to data/labels: 1.1 seconds
numWeights: 596487
QNMinimizer called on double function of 596487 variables, using M = 25.
               An explanation of the output:
Iter           The number of iterations
evals          The number of function evaluations
SCALING        <D> Diagonal scaling was used; <I> Scaled Identity
LINESEARCH     [## M steplength]  Minpack linesearch
                   1-Function value was too high
                   2-Value ok, gradient positive, positive curvature
                   3-Value ok, gradient negative, positive curvature
                   4-Value ok, gradient negative, negative curvature
               [.. B]  Backtracking
VALUE          The current function value
TIME           Total elapsed time
|GNORM|        The current norm of the gradient
{RELNORM}      The ratio of the current to initial gradient norms
AVEIMPROVE     The average improvement / current value
EVALSCORE      The last available eval score

Iter ## evals ## <SCALING> [LINESEARCH] VALUE TIME |GNORM| {RELNORM} AVEIMPROVE EVALSCORE

Iter 1 evals 1 <D> [11M 8.212E-5] 1.714E5 1.06s |1.080E4| {1.082E-1} 0.000E0 - 
Iter 2 evals 4 <D> [33131M 6.201E0] 1.204E5 2.78s |8.770E3| {8.784E-2} 2.120E-1 - 
Iter 3 evals 10 <D> [1M 2.210E-2] 1.158E5 3.36s |4.819E3| {4.826E-2} 1.603E-1 - 
.
.
.
Iter 175 evals 207 <D> [M 1.000E0] 2.132E3 74.42s
QNMinimizer terminated due to average improvement: | newest_val - previous_val | / |newestVal| < TOL 
Total time spent in optimization: 74.43s

The classifier file can be found here.

The training data uses IOB notation:

B-TERM - beginning of aspect term label
I-TERM - continuation of aspect term label
O - Default 'not a keyword' label
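To make the labeling scheme concrete, here is a minimal sketch of how a sequence of IOB labels maps back to aspect terms. `IobSpans` and `extractTerms` are hypothetical names used only for illustration; they are not part of Stanford NER, and the handling of a stray I-TERM (it is dropped) is an assumption rather than anything the library prescribes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Stanford NER): groups IOB labels into aspect terms. */
final class IobSpans {

    /** B-TERM opens a term, I-TERM extends the open term, any other label closes it. */
    static List<String> extractTerms(List<String> tokens, List<String> labels) {
        if (tokens.size() != labels.size()) {
            throw new IllegalArgumentException("tokens and labels must have the same length");
        }
        List<String> terms = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < tokens.size(); i++) {
            String label = labels.get(i);
            if (label.equals("B-TERM")) {
                // A new term starts; flush the one that was open, if any.
                if (current != null) terms.add(current.toString());
                current = new StringBuilder(tokens.get(i));
            } else if (label.equals("I-TERM") && current != null) {
                current.append(' ').append(tokens.get(i));
            } else {
                // O (or an I-TERM with no open term, which this sketch drops) ends the term.
                if (current != null) terms.add(current.toString());
                current = null;
            }
        }
        if (current != null) terms.add(current.toString());
        return terms;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("so", "peaceful", "interesting", "and", "informative", "it");
        List<String> labels = Arrays.asList("O", "B-TERM", "B-TERM", "I-TERM", "I-TERM", "O");
        System.out.println(extractTerms(tokens, labels)); // [peaceful, interesting and informative]
    }
}
```

Two consecutive B-TERM labels therefore yield two separate terms, which is the reason the scheme distinguishes B- from I-.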

Sample training data:

so  O
peaceful    B-TERM
interesting B-TERM
and I-TERM
informative I-TERM
it  O
had O
been    O
raining B-TERM
so  O
we  O
had O
it's    O
still   O
a   O
place   B-TERM
of  I-TERM
worship I-TERM
after   O
that    O
just    O
walk    B-TERM
down    O
to  O
jungle  O
beach   O
and O
grab    O
yourself    O
a   O
cold    B-TERM
beer    I-TERM
or  O
two O
and O
a   O
cool    O
off O
in  O
the O
surf    B-TERM
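The training log above reports `numDocuments: 2` for 56513 datums, so it may be worth checking how the training file is laid out before looking elsewhere. The sketch below is a plain-JDK sanity check, not a Stanford NER API: `TrainFileCheck` is a hypothetical name, and it assumes that blank lines separate documents and that each data line holds exactly two tab-separated columns, matching `map = word=0,answer=1`. Whether the layout has anything to do with the all-O output is not established here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sanity check for a two-column, tab-separated IOB training file. */
final class TrainFileCheck {

    static final class Stats {
        int documents;                                        // blank-line-separated blocks
        int tokens;                                           // well-formed token lines
        final List<Integer> malformedLines = new ArrayList<>(); // 1-based line numbers
        final Map<String, Integer> labelCounts = new TreeMap<>();
    }

    static Stats check(List<String> lines) {
        Stats s = new Stats();
        boolean inDocument = false;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {       // assumed document boundary
                inDocument = false;
                continue;
            }
            if (!inDocument) {
                s.documents++;
                inDocument = true;
            }
            String[] cols = line.split("\t", -1);
            if (cols.length != 2 || cols[0].isEmpty() || cols[1].isEmpty()) {
                s.malformedLines.add(i + 1);   // not exactly word<TAB>label
                continue;
            }
            s.tokens++;
            s.labelCounts.merge(cols[1], 1, Integer::sum);
        }
        return s;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to the training file, or run without arguments for a tiny demo.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("a\tO", "cold\tB-TERM", "beer\tI-TERM", "", "the\tO", "surf\tB-TERM");
        Stats s = check(lines);
        System.out.println("documents=" + s.documents + " tokens=" + s.tokens
                + " labels=" + s.labelCounts + " malformed=" + s.malformedLines);
    }
}
```

Running it as `java TrainFileCheck.java IOB.tsv` would print the document count, the per-label token counts, and the line numbers that do not fit the expected two-column shape.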

But when I try to test it, it does not seem to work. Every token is tagged only with O.

import edu.stanford.nlp.ie.NERClassifierCombiner;
import edu.stanford.nlp.ie.AbstractSequenceClassifier;
import edu.stanford.nlp.ie.crf.*;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreLabel;
import java.io.IOException;
import java.util.List;

public class NERDemo {

  public static void main(String[] args) throws IOException {
    String serializedClassifier = "c:\\Users\\User\\Desktop\\ner-travel-planner-model.ser.gz";
   // String serializedClassifier2 = "/local/stanford-ner-2015-01-30/classifiers/english.muc.7class.distsim.crf.ser.gz";

    if (args.length > 0) {
      serializedClassifier = args[0];
    }

    NERClassifierCombiner classifier = new NERClassifierCombiner(false, false, 
            serializedClassifier);

    String fileContents = IOUtils.slurpFile("c:\\Users\\User\\Desktop\\test-ner.txt");
    List<List<CoreLabel>> out = classifier.classify(fileContents);

    int i = 0;
    for (List<CoreLabel> lcl : out) {
      i++;
      int j = 0;
      for (CoreLabel cl : lcl) {
        j++;
        System.out.printf("%d:%d: %s%n", i, j,
                cl.toShorterString("Text", "CharacterOffsetBegin", "CharacterOffsetEnd", "NamedEntityTag"));
      }
    }
  }
}

Output:

Loading classifier from c:\Users\User\Desktop\ner-travel-planner-model.ser.gz ... done [0.4 sec].
1:1: [Text=If CharacterOffsetBegin=0 CharacterOffsetEnd=2 NamedEntityTag=O]
1:2: [Text=you CharacterOffsetBegin=3 CharacterOffsetEnd=6 NamedEntityTag=O]
1:3: [Text=happen CharacterOffsetBegin=7 CharacterOffsetEnd=13 NamedEntityTag=O]
1:4: [Text=to CharacterOffsetBegin=14 CharacterOffsetEnd=16 NamedEntityTag=O]
1:5: [Text=visit CharacterOffsetBegin=17 CharacterOffsetEnd=22 NamedEntityTag=O]
1:6: [Text=Kandy CharacterOffsetBegin=23 CharacterOffsetEnd=28 NamedEntityTag=O]
1:7: [Text=the CharacterOffsetBegin=30 CharacterOffsetEnd=33 NamedEntityTag=O]
1:8: [Text=Tea CharacterOffsetBegin=34 CharacterOffsetEnd=37 NamedEntityTag=O]
1:9: [Text=Museum CharacterOffsetBegin=38 CharacterOffsetEnd=44 NamedEntityTag=O]
1:10: [Text=is CharacterOffsetBegin=45 CharacterOffsetEnd=47 NamedEntityTag=O]
1:11: [Text=a CharacterOffsetBegin=48 CharacterOffsetEnd=49 NamedEntityTag=O]
1:12: [Text=must CharacterOffsetBegin=50 CharacterOffsetEnd=54 NamedEntityTag=O]
1:13: [Text=visit CharacterOffsetBegin=55 CharacterOffsetEnd=60 NamedEntityTag=O]
1:14: [Text=place CharacterOffsetBegin=61 CharacterOffsetEnd=66 NamedEntityTag=O]
1:15: [Text=it CharacterOffsetBegin=68 CharacterOffsetEnd=70 NamedEntityTag=O]
1:16: [Text=is CharacterOffsetBegin=71 CharacterOffsetEnd=73 NamedEntityTag=O]
1:17: [Text=located CharacterOffsetBegin=74 CharacterOffsetEnd=81 NamedEntityTag=O]
1:18: [Text=in CharacterOffsetBegin=82 CharacterOffsetEnd=84 NamedEntityTag=O]
1:19: [Text=a CharacterOffsetBegin=85 CharacterOffsetEnd=86 NamedEntityTag=O]
1:20: [Text=lovely CharacterOffsetBegin=87 CharacterOffsetEnd=93 NamedEntityTag=O]
1:21: [Text=place CharacterOffsetBegin=94 CharacterOffsetEnd=99 NamedEntityTag=O]
1:22: [Text=with CharacterOffsetBegin=100 CharacterOffsetEnd=104 NamedEntityTag=O]
1:23: [Text=a CharacterOffsetBegin=105 CharacterOffsetEnd=106 NamedEntityTag=O]
1:24: [Text=breathtaking CharacterOffsetBegin=107 CharacterOffsetEnd=119 NamedEntityTag=O]
1:25: [Text=view CharacterOffsetBegin=120 CharacterOffsetEnd=124 NamedEntityTag=O]
1:26: [Text=. CharacterOffsetBegin=124 CharacterOffsetEnd=125 NamedEntityTag=O]
2:1: [Text=This CharacterOffsetBegin=126 CharacterOffsetEnd=130 NamedEntityTag=O]
2:2: [Text=place CharacterOffsetBegin=131 CharacterOffsetEnd=136 NamedEntityTag=O]
2:3: [Text=will CharacterOffsetBegin=137 CharacterOffsetEnd=141 NamedEntityTag=O]
2:4: [Text=tell CharacterOffsetBegin=142 CharacterOffsetEnd=146 NamedEntityTag=O]
2:5: [Text=you CharacterOffsetBegin=147 CharacterOffsetEnd=150 NamedEntityTag=O]
2:6: [Text=everything CharacterOffsetBegin=151 CharacterOffsetEnd=161 NamedEntityTag=O]
2:7: [Text=you CharacterOffsetBegin=162 CharacterOffsetEnd=165 NamedEntityTag=O]
2:8: [Text=should CharacterOffsetBegin=166 CharacterOffsetEnd=172 NamedEntityTag=O]
2:9: [Text=know CharacterOffsetBegin=173 CharacterOffsetEnd=177 NamedEntityTag=O]
2:10: [Text=about CharacterOffsetBegin=178 CharacterOffsetEnd=183 NamedEntityTag=O]
2:11: [Text=the CharacterOffsetBegin=184 CharacterOffsetEnd=187 NamedEntityTag=O]
2:12: [Text=history CharacterOffsetBegin=188 CharacterOffsetEnd=195 NamedEntityTag=O]
2:13: [Text=of CharacterOffsetBegin=196 CharacterOffsetEnd=198 NamedEntityTag=O]
2:14: [Text=Tea CharacterOffsetBegin=199 CharacterOffsetEnd=202 NamedEntityTag=O]
2:15: [Text=in CharacterOffsetBegin=203 CharacterOffsetEnd=205 NamedEntityTag=O]
2:16: [Text=Sri CharacterOffsetBegin=206 CharacterOffsetEnd=209 NamedEntityTag=O]
2:17: [Text=Lanka CharacterOffsetBegin=210 CharacterOffsetEnd=215 NamedEntityTag=O]
2:18: [Text=. CharacterOffsetBegin=215 CharacterOffsetEnd=216 NamedEntityTag=O]
3:1: [Text=There CharacterOffsetBegin=217 CharacterOffsetEnd=222 NamedEntityTag=O]
3:2: [Text=are CharacterOffsetBegin=223 CharacterOffsetEnd=226 NamedEntityTag=O]
3:3: [Text=guides CharacterOffsetBegin=227 CharacterOffsetEnd=233 NamedEntityTag=O]
3:4: [Text=in CharacterOffsetBegin=234 CharacterOffsetEnd=236 NamedEntityTag=O]
3:5: [Text=the CharacterOffsetBegin=237 CharacterOffsetEnd=240 NamedEntityTag=O]
3:6: [Text=building CharacterOffsetBegin=241 CharacterOffsetEnd=249 NamedEntityTag=O]
3:7: [Text=who CharacterOffsetBegin=250 CharacterOffsetEnd=253 NamedEntityTag=O]
3:8: [Text=will CharacterOffsetBegin=254 CharacterOffsetEnd=258 NamedEntityTag=O]
3:9: [Text=take CharacterOffsetBegin=259 CharacterOffsetEnd=263 NamedEntityTag=O]
3:10: [Text=you CharacterOffsetBegin=264 CharacterOffsetEnd=267 NamedEntityTag=O]
3:11: [Text=around CharacterOffsetBegin=268 CharacterOffsetEnd=274 NamedEntityTag=O]
3:12: [Text=explaining CharacterOffsetBegin=275 CharacterOffsetEnd=285 NamedEntityTag=O]
3:13: [Text=what CharacterOffsetBegin=286 CharacterOffsetEnd=290 NamedEntityTag=O]
3:14: [Text=they CharacterOffsetBegin=291 CharacterOffsetEnd=295 NamedEntityTag=O]
3:15: [Text=have CharacterOffsetBegin=296 CharacterOffsetEnd=300 NamedEntityTag=O]
3:16: [Text=in CharacterOffsetBegin=301 CharacterOffsetEnd=303 NamedEntityTag=O]
3:17: [Text=each CharacterOffsetBegin=304 CharacterOffsetEnd=308 NamedEntityTag=O]
3:18: [Text=floor CharacterOffsetBegin=309 CharacterOffsetEnd=314 NamedEntityTag=O]
3:19: [Text=. CharacterOffsetBegin=314 CharacterOffsetEnd=315 NamedEntityTag=O]
4:1: [Text=You CharacterOffsetBegin=316 CharacterOffsetEnd=319 NamedEntityTag=O]
4:2: [Text=could CharacterOffsetBegin=320 CharacterOffsetEnd=325 NamedEntityTag=O]
4:3: [Text=enjoy CharacterOffsetBegin=326 CharacterOffsetEnd=331 NamedEntityTag=O]
4:4: [Text=a CharacterOffsetBegin=332 CharacterOffsetEnd=333 NamedEntityTag=O]
4:5: [Text=cup CharacterOffsetBegin=334 CharacterOffsetEnd=337 NamedEntityTag=O]
4:6: [Text=of CharacterOffsetBegin=338 CharacterOffsetEnd=340 NamedEntityTag=O]
4:7: [Text=good CharacterOffsetBegin=341 CharacterOffsetEnd=345 NamedEntityTag=O]
4:8: [Text=tea CharacterOffsetBegin=346 CharacterOffsetEnd=349 NamedEntityTag=O]
4:9: [Text=in CharacterOffsetBegin=350 CharacterOffsetEnd=352 NamedEntityTag=O]
4:10: [Text=the CharacterOffsetBegin=353 CharacterOffsetEnd=356 NamedEntityTag=O]
4:11: [Text=restaurant CharacterOffsetBegin=357 CharacterOffsetEnd=367 NamedEntityTag=O]
4:12: [Text=upstairs CharacterOffsetBegin=368 CharacterOffsetEnd=376 NamedEntityTag=O]
4:13: [Text=but CharacterOffsetBegin=378 CharacterOffsetEnd=381 NamedEntityTag=O]
4:14: [Text=they CharacterOffsetBegin=382 CharacterOffsetEnd=386 NamedEntityTag=O]
4:15: [Text=cant CharacterOffsetBegin=387 CharacterOffsetEnd=391 NamedEntityTag=O]
4:16: [Text=make CharacterOffsetBegin=392 CharacterOffsetEnd=396 NamedEntityTag=O]
4:17: [Text=a CharacterOffsetBegin=397 CharacterOffsetEnd=398 NamedEntityTag=O]
4:18: [Text=proper CharacterOffsetBegin=399 CharacterOffsetEnd=405 NamedEntityTag=O]
4:19: [Text=tea CharacterOffsetBegin=406 CharacterOffsetEnd=409 NamedEntityTag=O]
4:20: [Text=even CharacterOffsetBegin=411 CharacterOffsetEnd=415 NamedEntityTag=O]
4:21: [Text=if CharacterOffsetBegin=416 CharacterOffsetEnd=418 NamedEntityTag=O]
4:22: [Text=it CharacterOffsetBegin=419 CharacterOffsetEnd=421 NamedEntityTag=O]
4:23: [Text=saves CharacterOffsetBegin=422 CharacterOffsetEnd=427 NamedEntityTag=O]
4:24: [Text=their CharacterOffsetBegin=428 CharacterOffsetEnd=433 NamedEntityTag=O]
4:25: [Text=life CharacterOffsetBegin=434 CharacterOffsetEnd=438 NamedEntityTag=O]
4:26: [Text=. CharacterOffsetBegin=438 CharacterOffsetEnd=439 NamedEntityTag=O]

I can't figure out what I am doing wrong. Please help.

0 answers:

No answers yet