GATE machine learning not working

Asked: 2016-04-26 10:45:39

Tags: java machine-learning text-classification gate

I want to do text classification in GATE using the Batch Learning PR. I first wrote the configuration XML below, and it works.



<?xml version="1.0"?>
<ML-CONFIG>
  <VERBOSITY level="1"/>
  <SURROUND value="false"/>
  <PARAMETER name="thresholdProbabilityClassification" 
	     value="0.5"/>
  <multiClassification2Binary method="one-vs-others"/>
  <EVALUATION method="kfold" 
	      runs="5"
	      ratio="0.66" />
  <ENGINE nickname="PAUM" 
	  implementationName="PAUM"
	  options=" -p 50 -n 5 -optB 0.0  "/>
  <DATASET>
    <INSTANCE-TYPE>emotion</INSTANCE-TYPE>
    
    <NGRAM>
      <NAME>ngram</NAME>
      <NUMBER>1</NUMBER>
      <CONSNUM>4</CONSNUM>

      <CONS-1>
        <TYPE>Token</TYPE>
        <FEATURE>string</FEATURE>
      </CONS-1>

      <CONS-2>
        <TYPE>word_bag</TYPE>
        <FEATURE>feature</FEATURE>
      </CONS-2>

      <CONS-3>
        <TYPE>hashtag</TYPE>
        <FEATURE>feature</FEATURE>
      </CONS-3>

      <CONS-4>
        <TYPE>Token</TYPE>
        <FEATURE>category</FEATURE>
      </CONS-4>

      <WEIGHT>2</WEIGHT>
    </NGRAM>
    
    <ATTRIBUTE>
      <NAME>Class</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>emotion</TYPE>
      <FEATURE>feature</FEATURE>
      <POSITION>0</POSITION>
      <CLASS/>
    </ATTRIBUTE>
    
  </DATASET>
</ML-CONFIG>

However, when I change the order of the CONS elements as shown below, it no longer works.

<?xml version="1.0"?>
<ML-CONFIG>
  <VERBOSITY level="1"/>
  <SURROUND value="false"/>
  <PARAMETER name="thresholdProbabilityClassification" 
	     value="0.5"/>
  <multiClassification2Binary method="one-vs-others"/>
  <EVALUATION method="kfold" 
	      runs="5"
	      ratio="0.66" />
  <ENGINE nickname="PAUM" 
	  implementationName="PAUM"
	  options=" -p 50 -n 5 -optB 0.0  "/>
  <DATASET>
    <INSTANCE-TYPE>emotion</INSTANCE-TYPE>
    
    <NGRAM>
      <NAME>ngram</NAME>
      <NUMBER>1</NUMBER>
      <CONSNUM>4</CONSNUM>

      <CONS-1>
        <TYPE>word_bag</TYPE>
        <FEATURE>feature</FEATURE>
      </CONS-1>

      <CONS-2>
        <TYPE>hashtag</TYPE>
        <FEATURE>feature</FEATURE>
      </CONS-2>

      <CONS-3>
        <TYPE>Token</TYPE>
        <FEATURE>category</FEATURE>
      </CONS-3>

      <CONS-4>
        <TYPE>Token</TYPE>
        <FEATURE>string</FEATURE>
      </CONS-4>

      <WEIGHT>2</WEIGHT>
    </NGRAM>
    
    <ATTRIBUTE>
      <NAME>Class</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>emotion</TYPE>
      <FEATURE>feature</FEATURE>
      <POSITION>0</POSITION>
      <CLASS/>
    </ATTRIBUTE>
    
  </DATASET>
</ML-CONFIG>

The second configuration loads into GATE without complaint, but every time I run the Batch Learning PR I get the following error:

  

java.lang.NullPointerException
    at gate.learning.NLPFeaturesOfDoc.writeNLPFeaturesToFile(NLPFeaturesOfDoc.java:818)
    at gate.learning.LightWeightLearningApi.annotations2NLPFeatures(LightWeightLearningApi.java:198)
    at gate.learning.EvaluationBasedOnDocs.oneRun(EvaluationBasedOnDocs.java:388)
    at gate.learning.EvaluationBasedOnDocs.kfoldEval(EvaluationBasedOnDocs.java:197)
    at gate.learning.EvaluationBasedOnDocs.evaluation(EvaluationBasedOnDocs.java:118)
    at gate.learning.LearningAPIMain.execute(LearningAPIMain.java:776)
    at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
    at gate.creole.ConditionalSerialController.runComponent(ConditionalSerialController.java:163)
    at gate.creole.SerialController.executeImpl(SerialController.java:157)
    at gate.creole.ConditionalSerialAnalyserController.executeImpl(ConditionalSerialAnalyserController.java:225)
    at gate.creole.ConditionalSerialAnalyserController.execute(ConditionalSerialAnalyserController.java:132)
    at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
    at gate.gui.SerialControllerEditor$RunAction$1.run(SerialControllerEditor.java:1728)
    at java.lang.Thread.run(Unknown Source)

Does anyone have any idea what causes this?

Thanks a lot!

1 answer:

Answer 0 (score: 0)

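I suggest you make sure that the document triggering this error actually contains the annotations and features defined in the configuration XML file. Since you are using Token, my guess is that the document is empty.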
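As a starting point for that check, here is a minimal sketch that lists every annotation type and feature pair the configuration asks for, so you can compare the list against what each document really contains in the GATE annotation viewer. It uses only the JDK's XML parser and does not call GATE at all. The class name `ConfigRequirements` and its methods are made up for illustration, and the sketch assumes that every element with both a `TYPE` and a `FEATURE` child describes a required pair.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of GATE: reads an ML-CONFIG string and
// reports the "Type.feature" pairs it refers to.
public class ConfigRequirements {

    // Collects "Type.feature" for each element (CONS-n, ATTRIBUTE, ...)
    // that has both a TYPE and a FEATURE direct child, in document order.
    static List<String> requiredPairs(String configXml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(configXml)));

        List<String> pairs = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            String type = directChildText(element, "TYPE");
            String feature = directChildText(element, "FEATURE");
            if (type != null && feature != null) {
                pairs.add(type + "." + feature);
            }
        }
        return pairs;
    }

    // Returns the trimmed text of the first direct child element with the
    // given name, or null if there is no such child.
    static String directChildText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the failing configuration from the question.
        String xml =
              "<ML-CONFIG><DATASET>"
            + "<INSTANCE-TYPE>emotion</INSTANCE-TYPE>"
            + "<NGRAM>"
            + "<CONS-1><TYPE>word_bag</TYPE><FEATURE>feature</FEATURE></CONS-1>"
            + "<CONS-2><TYPE>hashtag</TYPE><FEATURE>feature</FEATURE></CONS-2>"
            + "<CONS-3><TYPE>Token</TYPE><FEATURE>category</FEATURE></CONS-3>"
            + "<CONS-4><TYPE>Token</TYPE><FEATURE>string</FEATURE></CONS-4>"
            + "</NGRAM>"
            + "<ATTRIBUTE><TYPE>emotion</TYPE><FEATURE>feature</FEATURE><CLASS/></ATTRIBUTE>"
            + "</DATASET></ML-CONFIG>";

        for (String pair : requiredPairs(xml)) {
            System.out.println(pair);
        }
    }
}
```

If a document in the corpus lacks one of the printed pairs (for example, it has no `word_bag` or `hashtag` annotations at all), that document is a good first suspect for the NullPointerException.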