I am using Apache OpenNLP 1.8.0 and trying to train a model with POSTaggerTrainer

Time: 2017-06-20 20:16:05

Tags: apache opennlp

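After consulting the documentation for version 1.8.0, I tried the CLI command given in the docs, but it does not seem to work, and neither does the Java code given in the API section. I have a text file containing the following text: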

  

train-me.txt

     

Last_JJ September_NNP ,_, I_PRP tries_VBD to_TO find_VB out_RP the_DT address_NN of_IN an_DT old_JJ school_NN friend_NN whom_WP I_PRP had_VBD not_RB seen_VBN for_IN 15_CD years_NNS ._. I_PRP just_RB know_VBD his_PRP$ name_NN ,_, Alan_NNP McKennedy_NNP ,_, and_CC I_PRP 'd_MD heard_VBD the_DT rumour_NN that_IN he_PRP 'd_MD moved_VBD to_TO Scotland_NNP ,_, the_DT country_NN of_IN his_PRP$ ancestors_NNS ._.

dictionary.xml

<?xml version="1.0" encoding="UTF-8"?><dictionary>
  <entry tags="NNP">
    <token>Calysta</token>
  </entry>

</dictionary>

I want to use either of these two files (if possible) to train the tagger so that it tags Calysta as Calysta_NNP.

1 Answer:

Answer 0: (score: 0)

I know this is a late answer, but in case it helps, here is the training command and its output:

arjun@arjun-VPCEH26EN:~/apache-opennlp-1.8.0/bin$ ./opennlp POSTaggerTrainer -data train-me.txt -dict dictionary.xml -lang en -model en-pos-maxent-cust.bin
Indexing events using cutoff of 5

    Computing event counts...  done. 52 events
    Indexing...  done.
Sorting and merging events... done. Reduced 52 events to 37.
Done indexing.
Incorporating indexed data for training...  
done.
    Number of Event Tokens: 37
        Number of Outcomes: 20
      Number of Predicates: 13
...done.
Computing model parameters ...
Performing 100 iterations.
  1:  ... loglikelihood=-155.77807822480764 0.038461538461538464
  2:  ... loglikelihood=-130.9791219262959  0.5
  3:  ... loglikelihood=-115.82234962334346 0.5576923076923077
  4:  ... loglikelihood=-105.13170003394434 0.6730769230769231
  5:  ... loglikelihood=-96.9869322585347   0.6730769230769231
  6:  ... loglikelihood=-90.51694300405765  0.6923076923076923
  7:  ... loglikelihood=-85.23546058034727  0.6923076923076923
  8:  ... loglikelihood=-80.83562367302892  0.7307692307692307
  9:  ... loglikelihood=-77.1097811259408   0.7307692307692307
 10:  ... loglikelihood=-73.91120812658458  0.7307692307692307
 11:  ... loglikelihood=-71.13309894938885  0.75
 12:  ... loglikelihood=-68.69589846103266  0.75
 13:  ... loglikelihood=-66.53917914878002  0.75
 14:  ... loglikelihood=-64.61622830997396  0.75
 15:  ... loglikelihood=-62.890348665987055 0.75
 16:  ... loglikelihood=-61.332281582677155 0.75
 17:  ... loglikelihood=-59.91838269276684  0.75
 18:  ... loglikelihood=-58.629310291693805 0.75
 19:  ... loglikelihood=-57.44906823464401  0.75
 20:  ... loglikelihood=-56.36429724151985  0.75
 21:  ... loglikelihood=-55.36374258766163  0.75
 22:  ... loglikelihood=-54.43784870333842  0.75
 23:  ... loglikelihood=-53.57844629573773  0.75
 24:  ... loglikelihood=-52.77850781690259  0.75
 25:  ... loglikelihood=-52.03195408008879  0.75
 26:  ... loglikelihood=-51.333499646171695 0.75
 27:  ... loglikelihood=-50.67852796323892  0.75
 28:  ... loglikelihood=-50.062989611378285 0.75
 29:  ... loglikelihood=-49.48331869161687  0.75
 30:  ... loglikelihood=-48.93636361232364  0.75
 31:  ... loglikelihood=-48.419329410290345 0.75
 32:  ... loglikelihood=-47.92972939439551  0.75
 33:  ... loglikelihood=-47.465344384258486 0.75
 34:  ... loglikelihood=-47.02418818116749  0.75
 35:  ... loglikelihood=-46.604478186421446 0.75
 36:  ... loglikelihood=-46.20461029609541  0.75
 37:  ... loglikelihood=-45.82313736754338  0.75
 38:  ... loglikelihood=-45.458750683509976 0.75
 39:  ... loglikelihood=-45.11026394313063  0.75
 40:  ... loglikelihood=-44.77659939167084  0.75
 41:  ... loglikelihood=-44.45677576728319  0.75
 42:  ... loglikelihood=-44.14989779685863  0.75
 43:  ... loglikelihood=-43.855147016888836 0.75
 44:  ... loglikelihood=-43.571773731178716 0.75
 45:  ... loglikelihood=-43.299089946831224 0.75
 46:  ... loglikelihood=-43.03646315440174  0.75
 47:  ... loglikelihood=-42.78331083845189  0.75
 48:  ... loglikelihood=-42.53909562169248  0.75
 49:  ... loglikelihood=-42.30332096009808  0.7692307692307693
 50:  ... loglikelihood=-42.07552731829657  0.7692307692307693
 51:  ... loglikelihood=-41.85528876457919  0.7692307692307693
 52:  ... loglikelihood=-41.642209933359936 0.7692307692307693
 53:  ... loglikelihood=-41.43592331010347  0.7692307692307693
 54:  ... loglikelihood=-41.236086799846426 0.7692307692307693
 55:  ... loglikelihood=-41.04238154563922  0.7692307692307693
 56:  ... loglikelihood=-40.854509967677004 0.7692307692307693
 57:  ... loglikelihood=-40.67219399768791  0.7692307692307693
 58:  ... loglikelihood=-40.49517348640929  0.7692307692307693
 59:  ... loglikelihood=-40.32320476478338  0.7692307692307693
 60:  ... loglikelihood=-40.1560593419208   0.7692307692307693
 61:  ... loglikelihood=-39.99352272496435  0.7692307692307693
 62:  ... loglikelihood=-39.835393347789605 0.7692307692307693
 63:  ... loglikelihood=-39.68148159704321  0.7692307692307693
 64:  ... loglikelihood=-39.53160892537774  0.7692307692307693
 65:  ... loglikelihood=-39.38560704292392  0.7692307692307693
 66:  ... loglikelihood=-39.243317179072264 0.7692307692307693
 67:  ... loglikelihood=-39.10458940753585  0.7692307692307693
 68:  ... loglikelihood=-38.969282028454    0.7692307692307693
 69:  ... loglikelihood=-38.8372610019872   0.7692307692307693
 70:  ... loglikelihood=-38.70839942845979  0.7692307692307693
 71:  ... loglikelihood=-38.58257707064014  0.7692307692307693
 72:  ... loglikelihood=-38.45967991421811  0.7692307692307693
 73:  ... loglikelihood=-38.33959976295419  0.7692307692307693
 74:  ... loglikelihood=-38.222233865340385 0.7692307692307693
 75:  ... loglikelihood=-38.107484569938585 0.7692307692307693
 76:  ... loglikelihood=-37.995259006848066 0.7692307692307693
 77:  ... loglikelihood=-37.88546879301048  0.7692307692307693
 78:  ... loglikelihood=-37.77802975928638  0.7692307692307693
 79:  ... loglikelihood=-37.6728616974405   0.7692307692307693
 80:  ... loglikelihood=-37.56988812535212  0.7692307692307693
 81:  ... loglikelihood=-37.469036068928645 0.7692307692307693
 82:  ... loglikelihood=-37.370235859343474 0.7692307692307693
 83:  ... loglikelihood=-37.27342094434868  0.7692307692307693
 84:  ... loglikelihood=-37.178527712527796 0.7692307692307693
 85:  ... loglikelihood=-37.08549532945806  0.7692307692307693
 86:  ... loglikelihood=-36.99426558484419  0.7692307692307693
 87:  ... loglikelihood=-36.904782749769446 0.7692307692307693
 88:  ... loglikelihood=-36.81699344328549  0.7692307692307693
 89:  ... loglikelihood=-36.730846507630154 0.7692307692307693
 90:  ... loglikelihood=-36.64629289142378  0.7692307692307693
 91:  ... loglikelihood=-36.563285540250355 0.7692307692307693
 92:  ... loglikelihood=-36.48177929407976  0.7692307692307693
 93:  ... loglikelihood=-36.40173079103272  0.7692307692307693
 94:  ... loglikelihood=-36.32309837703207  0.7692307692307693
 95:  ... loglikelihood=-36.24584202091997  0.7692307692307693
 96:  ... loglikelihood=-36.16992323465651  0.7692307692307693
 97:  ... loglikelihood=-36.095304998244124 0.7692307692307693
 98:  ... loglikelihood=-36.021951689052344 0.7692307692307693
 99:  ... loglikelihood=-35.94982901524132  0.7692307692307693
100:  ... loglikelihood=-35.87890395300729  0.7692307692307693
Writing pos tagger model ... done (0.086s)

Wrote pos tagger model to
path: /home/arjun/apache-opennlp-1.8.0/bin/en-pos-maxent-cust.bin

Execution time: 0.522 seconds

I am using Apache OpenNLP 1.8.0. If you need more help with the Apache OpenNLP POS Tagger, please reply.
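The log above reports 52 events, which is consistent with one event per `word_TAG` token in the training file, so a malformed token (a tag separated from its word by a space, or a full-width character where an ASCII tag is expected) changes what the trainer sees. Here is a minimal sketch for checking a training line before running the trainer. `WordTagCheck` and `findProblems` are hypothetical names and not part of OpenNLP, and splitting each token at its last underscore is my assumption about how the `word_TAG` format is read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: sanity-checks word_TAG training lines.
class WordTagCheck {

    /** Returns one description per malformed token in a whitespace-separated line. */
    static List<String> findProblems(String line) {
        List<String> problems = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank line
            }
            // Assumption: the tag is everything after the LAST underscore,
            // so a token such as ",_," is the word "," with the tag ",".
            int split = token.lastIndexOf('_');
            if (split <= 0 || split == token.length() - 1) {
                problems.add("missing word or tag: " + token);
            } else if (!token.substring(split + 1).matches("[\\x21-\\x7E]+")) {
                // Penn Treebank style tags are printable ASCII.
                problems.add("non-ASCII tag: " + token);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String clean = "his_PRP$ name_NN ,_, Alan_NNP ._.";
        String broken = "his_PRP $ name_NN ._\u3002"; // stray "$" and a full-width period
        System.out.println("clean:  " + findProblems(clean));
        System.out.println("broken: " + findProblems(broken));
    }
}
```

Running it over each line of train-me.txt and fixing whatever it reports is a cheap check to do before invoking `POSTaggerTrainer`. It only validates the shape of the tokens, not whether the tags themselves are correct.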