After going through the 1.8.0 documentation, I tried the CLI command given in the docs, but it does not seem to work, and there is no Java code given under the API section either. I have a text file containing the following text:
train-me.txt
Last_JJ September_NNP ,_, I_PRP tried_VBD to_TO find_VB out_RP the_DT address_NN of_IN an_DT old_JJ school_NN friend_NN whom_WP I_PRP had_VBD not_RB seen_VBN for_IN 15_CD years_NNS ._. I_PRP just_RB knew_VBD his_PRP$ name_NN ,_, Alan_NNP McKennedy_NNP ,_, and_CC I_PRP 'd_MD heard_VBD the_DT rumour_NN that_IN he_PRP 'd_MD moved_VBD to_TO Scotland_NNP ,_, the_DT country_NN of_IN his_PRP$ ancestors_NNS ._.
dictionary.xml
<?xml version="1.0" encoding="UTF-8"?><dictionary>
<entry tags="NNP">
<token>Calysta</token>
</entry>
</dictionary>
I would like to use either of these (if possible) to train the tagger so that it tags Calysta as Calysta_NNP.
Answer (score: 0)
I know this is a late answer, but in case it helps:
arjun@arjun-VPCEH26EN:~/apache-opennlp-1.8.0/bin$ ./opennlp POSTaggerTrainer -data train-me.txt -dict dictionary.xml -lang en -model en-pos-maxent-cust.bin
Indexing events using cutoff of 5
Computing event counts... done. 52 events
Indexing... done.
Sorting and merging events... done. Reduced 52 events to 37.
Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 37
Number of Outcomes: 20
Number of Predicates: 13
...done.
Computing model parameters ...
Performing 100 iterations.
1: ... loglikelihood=-155.77807822480764 0.038461538461538464
2: ... loglikelihood=-130.9791219262959 0.5
3: ... loglikelihood=-115.82234962334346 0.5576923076923077
4: ... loglikelihood=-105.13170003394434 0.6730769230769231
5: ... loglikelihood=-96.9869322585347 0.6730769230769231
6: ... loglikelihood=-90.51694300405765 0.6923076923076923
7: ... loglikelihood=-85.23546058034727 0.6923076923076923
8: ... loglikelihood=-80.83562367302892 0.7307692307692307
9: ... loglikelihood=-77.1097811259408 0.7307692307692307
10: ... loglikelihood=-73.91120812658458 0.7307692307692307
11: ... loglikelihood=-71.13309894938885 0.75
12: ... loglikelihood=-68.69589846103266 0.75
13: ... loglikelihood=-66.53917914878002 0.75
14: ... loglikelihood=-64.61622830997396 0.75
15: ... loglikelihood=-62.890348665987055 0.75
16: ... loglikelihood=-61.332281582677155 0.75
17: ... loglikelihood=-59.91838269276684 0.75
18: ... loglikelihood=-58.629310291693805 0.75
19: ... loglikelihood=-57.44906823464401 0.75
20: ... loglikelihood=-56.36429724151985 0.75
21: ... loglikelihood=-55.36374258766163 0.75
22: ... loglikelihood=-54.43784870333842 0.75
23: ... loglikelihood=-53.57844629573773 0.75
24: ... loglikelihood=-52.77850781690259 0.75
25: ... loglikelihood=-52.03195408008879 0.75
26: ... loglikelihood=-51.333499646171695 0.75
27: ... loglikelihood=-50.67852796323892 0.75
28: ... loglikelihood=-50.062989611378285 0.75
29: ... loglikelihood=-49.48331869161687 0.75
30: ... loglikelihood=-48.93636361232364 0.75
31: ... loglikelihood=-48.419329410290345 0.75
32: ... loglikelihood=-47.92972939439551 0.75
33: ... loglikelihood=-47.465344384258486 0.75
34: ... loglikelihood=-47.02418818116749 0.75
35: ... loglikelihood=-46.604478186421446 0.75
36: ... loglikelihood=-46.20461029609541 0.75
37: ... loglikelihood=-45.82313736754338 0.75
38: ... loglikelihood=-45.458750683509976 0.75
39: ... loglikelihood=-45.11026394313063 0.75
40: ... loglikelihood=-44.77659939167084 0.75
41: ... loglikelihood=-44.45677576728319 0.75
42: ... loglikelihood=-44.14989779685863 0.75
43: ... loglikelihood=-43.855147016888836 0.75
44: ... loglikelihood=-43.571773731178716 0.75
45: ... loglikelihood=-43.299089946831224 0.75
46: ... loglikelihood=-43.03646315440174 0.75
47: ... loglikelihood=-42.78331083845189 0.75
48: ... loglikelihood=-42.53909562169248 0.75
49: ... loglikelihood=-42.30332096009808 0.7692307692307693
50: ... loglikelihood=-42.07552731829657 0.7692307692307693
51: ... loglikelihood=-41.85528876457919 0.7692307692307693
52: ... loglikelihood=-41.642209933359936 0.7692307692307693
53: ... loglikelihood=-41.43592331010347 0.7692307692307693
54: ... loglikelihood=-41.236086799846426 0.7692307692307693
55: ... loglikelihood=-41.04238154563922 0.7692307692307693
56: ... loglikelihood=-40.854509967677004 0.7692307692307693
57: ... loglikelihood=-40.67219399768791 0.7692307692307693
58: ... loglikelihood=-40.49517348640929 0.7692307692307693
59: ... loglikelihood=-40.32320476478338 0.7692307692307693
60: ... loglikelihood=-40.1560593419208 0.7692307692307693
61: ... loglikelihood=-39.99352272496435 0.7692307692307693
62: ... loglikelihood=-39.835393347789605 0.7692307692307693
63: ... loglikelihood=-39.68148159704321 0.7692307692307693
64: ... loglikelihood=-39.53160892537774 0.7692307692307693
65: ... loglikelihood=-39.38560704292392 0.7692307692307693
66: ... loglikelihood=-39.243317179072264 0.7692307692307693
67: ... loglikelihood=-39.10458940753585 0.7692307692307693
68: ... loglikelihood=-38.969282028454 0.7692307692307693
69: ... loglikelihood=-38.8372610019872 0.7692307692307693
70: ... loglikelihood=-38.70839942845979 0.7692307692307693
71: ... loglikelihood=-38.58257707064014 0.7692307692307693
72: ... loglikelihood=-38.45967991421811 0.7692307692307693
73: ... loglikelihood=-38.33959976295419 0.7692307692307693
74: ... loglikelihood=-38.222233865340385 0.7692307692307693
75: ... loglikelihood=-38.107484569938585 0.7692307692307693
76: ... loglikelihood=-37.995259006848066 0.7692307692307693
77: ... loglikelihood=-37.88546879301048 0.7692307692307693
78: ... loglikelihood=-37.77802975928638 0.7692307692307693
79: ... loglikelihood=-37.6728616974405 0.7692307692307693
80: ... loglikelihood=-37.56988812535212 0.7692307692307693
81: ... loglikelihood=-37.469036068928645 0.7692307692307693
82: ... loglikelihood=-37.370235859343474 0.7692307692307693
83: ... loglikelihood=-37.27342094434868 0.7692307692307693
84: ... loglikelihood=-37.178527712527796 0.7692307692307693
85: ... loglikelihood=-37.08549532945806 0.7692307692307693
86: ... loglikelihood=-36.99426558484419 0.7692307692307693
87: ... loglikelihood=-36.904782749769446 0.7692307692307693
88: ... loglikelihood=-36.81699344328549 0.7692307692307693
89: ... loglikelihood=-36.730846507630154 0.7692307692307693
90: ... loglikelihood=-36.64629289142378 0.7692307692307693
91: ... loglikelihood=-36.563285540250355 0.7692307692307693
92: ... loglikelihood=-36.48177929407976 0.7692307692307693
93: ... loglikelihood=-36.40173079103272 0.7692307692307693
94: ... loglikelihood=-36.32309837703207 0.7692307692307693
95: ... loglikelihood=-36.24584202091997 0.7692307692307693
96: ... loglikelihood=-36.16992323465651 0.7692307692307693
97: ... loglikelihood=-36.095304998244124 0.7692307692307693
98: ... loglikelihood=-36.021951689052344 0.7692307692307693
99: ... loglikelihood=-35.94982901524132 0.7692307692307693
100: ... loglikelihood=-35.87890395300729 0.7692307692307693
Writing pos tagger model ... done (0.086s)
Wrote pos tagger model to
path: /home/arjun/apache-opennlp-1.8.0/bin/en-pos-maxent-cust.bin
Execution time: 0.522 seconds
I am using Apache OpenNLP 1.8.0. If you need more help with the Apache OpenNLP POS tagger, feel free to reply.
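Since the question also asked for the Java API equivalent of the CLI trainer, here is a sketch based on the opennlp-tools 1.8.0 API (the class names are from the `opennlp.tools.postag` and `opennlp.tools.util` packages; the file names are taken from the question — adjust paths as needed):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import opennlp.tools.postag.POSDictionary;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerFactory;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.postag.WordTagSampleStream;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.model.ModelUtil;

public class TrainPosTagger {

    public static void main(String[] args) throws Exception {
        // Stream the word_TAG training sentences, one sentence per line.
        ObjectStream<String> lines = new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(new File("train-me.txt")),
                StandardCharsets.UTF_8);
        ObjectStream<POSSample> samples = new WordTagSampleStream(lines);

        // Load the tag dictionary (the dictionary.xml from the question).
        POSDictionary tagDict;
        try (FileInputStream dictIn = new FileInputStream("dictionary.xml")) {
            tagDict = POSDictionary.create(dictIn);
        }
        POSTaggerFactory factory = new POSTaggerFactory(null, tagDict);

        // Train with the default parameters (maxent, 100 iterations,
        // cutoff 5), which is what the CLI trainer uses as well.
        POSModel model = POSTaggerME.train("en", samples,
                ModelUtil.createDefaultTrainingParameters(), factory);

        try (OutputStream out = new FileOutputStream("en-pos-maxent-cust.bin")) {
            model.serialize(out);
        }
    }
}
```

To try the resulting model from the command line, you can pipe whitespace-tokenized sentences into `./opennlp POSTagger en-pos-maxent-cust.bin`.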