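I trained a custom NER model to detect court abbreviations in legal citations. I trained it on 2,000 samples like the ones in the dataset shown below, but when I run the model on sample text, the actual accuracy is very low.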
The output I get is:
1. III. (wrong)
2. D.Me. (right)
3. 702 (wrong)
4. [ed]
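Is there something wrong with my training? Is the model unable to annotate text that doesn't follow the pattern in the dataset? The dataset looks like this: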
('Dolby v. Dole Food Co. 896 F. Supp. 2d 556, 569 (D. Me. 2012)', {'entities': [(51,57, 'Court Abbr')]}),
('Commonwelth v. Zook, 803 F.3d 694, 695 (D. Me. 2015)', {'entities': [(41,47, 'Court Abbr')]}),
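Before blaming the model, it may be worth checking whether each (start, end) pair actually slices out the intended text. The sketch below assumes the two samples are exactly as pasted above (a single extra or missing space would shift every offset) and that the intended entity in every sample is "D. Me."; `check_offsets` and `EXPECTED` are hypothetical names, not part of spaCy.

```python
# Training samples copied verbatim from the question (spaCy-style tuples).
TRAIN_DATA = [
    ('Dolby v. Dole Food Co. 896 F. Supp. 2d 556, 569 (D. Me. 2012)',
     {'entities': [(51, 57, 'Court Abbr')]}),
    ('Commonwelth v. Zook, 803 F.3d 694, 695 (D. Me. 2015)',
     {'entities': [(41, 47, 'Court Abbr')]}),
]

# Assumption: every sample is meant to label exactly this string.
EXPECTED = 'D. Me.'


def check_offsets(data, expected):
    """Return (sample_index, annotated_slice, corrected_span) for each
    entity whose character offsets do not cover `expected`."""
    problems = []
    for i, (text, ann) in enumerate(data):
        for start, end, label in ann['entities']:
            span = text[start:end]
            if span != expected:
                # Locate the first occurrence of the intended text.
                pos = text.find(expected)
                fixed = (pos, pos + len(expected), label) if pos != -1 else None
                problems.append((i, span, fixed))
    return problems


for i, span, fixed in check_offsets(TRAIN_DATA, EXPECTED):
    print(f'sample {i}: offsets cover {span!r}; {EXPECTED!r} is at {fixed}')
```

Run against the two samples as pasted, both spans are misaligned: (51, 57) covers ' Me. 2' and (41, 47) covers '. Me. ', while "D. Me." actually sits at (49, 55) and (40, 46). In spaCy v3, `Doc.char_span` returns `None` by default when offsets do not fall on token boundaries, so entities like these may be dropped or flagged with an alignment warning during training, depending on how the examples are built.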
Sample text:
"Harley-Davidson and Goodyear filed motions to exclude Woehrle’s and Lee’s opinions, arguing they lacked the relia- bility required by Federal Rule of Evidence (III. 702) and Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (D. Me. 1993). Follow- ing a hearing, the district court agreed. The court concluded
that Woehrle’s opinion that manufacturing defects caused the tire to unseat from the rim upon being punctured “appear[ed] to be based on nothing more than his subjective belief and un- supported speculation"
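To put a number on "accuracy is very low", the four outputs listed above can be scored as exact-match span precision and recall. This is a simplified sketch under stated assumptions: it compares strings with whitespace removed rather than character offsets, treats the unlabelled `[ed]` output as a false positive, and assumes "D. Me." is the only court abbreviation in the sample text; `span_prf` is a hypothetical helper.

```python
def _norm(s):
    """Drop all whitespace so 'D. Me.' and 'D.Me.' compare equal."""
    return ''.join(s.split())


def span_prf(predicted, gold):
    """Exact-match precision, recall and F1 over entity strings."""
    pred = [_norm(p) for p in predicted]
    gold_set = {_norm(g) for g in gold}
    tp = sum(1 for p in pred if p in gold_set)
    precision = tp / len(pred) if pred else 0.0
    recall = len(gold_set & set(pred)) / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Predictions reported in the question; the gold list is an assumption.
predicted = ['III.', 'D.Me.', '702', '[ed]']
gold = ['D. Me.']
print(span_prf(predicted, gold))  # (0.25, 1.0, 0.4)
```

Here the model finds the one real abbreviation (recall 1.0) but three of its four predictions are spurious (precision 0.25). A more reliable check would compare (start, end, label) triples on held-out citations whose format differs from the training pattern.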