Question

Suppose we have a string like this:

4 pallets of books with a weight of 437 kg. The pallets measure 80 x 120 x 120 cm each and are protected with red shrinkwrap.

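What is the best way to extract this kind of information (especially color, weight and size) with OpenNLP? I am thinking about a custom corpus and training my own model, but I don't know which approach is the best one to start with.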

<pallet amount>4</pallet amount> pallets of <product>books</product> with a weight of <weight>437</weight> <weightUnit>kg</weightUnit>. The pallets measure <height>80</height> x <width> 120 </width> x <length>120 </length> <measurementUnit>cm</measurementUnit> each and are protected with <color>red</color> shrinkwrap.

Answer 1

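You have listed only one approach (custom training with OpenNLP), so I don't know what you consider the other options to be. That approach is almost certainly your best one, unless the phrases you are searching for are (a) regular and (b) distinct from other phrases, in which case you could use regular expressions.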
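To make the regular-expression option concrete, here is a minimal Python sketch run against the example sentence from the question. The unit lists, the `COLORS` vocabulary and the `extract` helper are all assumptions made for illustration; they are not part of OpenNLP, and a real system would need patterns derived from its own data.

```python
import re

# Hypothetical color vocabulary; a real list would come from your own data.
COLORS = ("red", "green", "blue", "black", "white", "yellow")

# A number followed by a weight unit, e.g. "437 kg" (unit list is an assumption).
WEIGHT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(kg|g|lbs?|t)\b", re.IGNORECASE)

# Three numbers separated by "x" and followed by a length unit, e.g. "80 x 120 x 120 cm".
DIMS_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*x\s*(\d+(?:\.\d+)?)\s*x\s*(\d+(?:\.\d+)?)\s*(mm|cm|m|in)\b",
    re.IGNORECASE,
)

# Any whole word from the color vocabulary.
COLOR_RE = re.compile(r"\b(" + "|".join(COLORS) + r")\b", re.IGNORECASE)


def extract(text):
    """Return whichever of weight, dimensions and color the patterns find."""
    result = {}
    weight = WEIGHT_RE.search(text)
    if weight:
        result["weight"] = (float(weight.group(1)), weight.group(2).lower())
    dims = DIMS_RE.search(text)
    if dims:
        result["dimensions"] = tuple(float(dims.group(i)) for i in (1, 2, 3))
        result["unit"] = dims.group(4).lower()
    color = COLOR_RE.search(text)
    if color:
        result["color"] = color.group(1).lower()
    return result


if __name__ == "__main__":
    sample = (
        "4 pallets of books with a weight of 437 kg. The pallets measure "
        "80 x 120 x 120 cm each and are protected with red shrinkwrap."
    )
    print(extract(sample))
```

This only works while the phrasing stays as regular as in the example. Once weights, sizes or colors appear in freer wording, the patterns start missing or mislabeling text, which is where a trained model becomes the better choice.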

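There are various packages that let you train a model and tag text: OpenNLP is one, Stanford NER is another. They use different training methods, which will affect your results. However, once you have the training data, you can try different engines and see how each one performs.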
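As a starting point for building that training data, the sketch below converts the inline annotation style used in the question into the `<START:type> ... <END>` markup that the OpenNLP name finder trainer reads, with whitespace-separated tokens and one sentence per line. The helper name `to_opennlp_lines`, the underscore replacement in labels such as `pallet_amount`, and the crude punctuation and sentence splitting are assumptions made for illustration; a real corpus would need a proper tokenizer and many more sentences.

```python
import re

# Matches <label>text</label>, allowing spaces in the label and around the text.
TAG_RE = re.compile(r"<([^<>/]+)>\s*(.*?)\s*</\1>")

# Punctuation glued to the end of a word, e.g. the "." in "shrinkwrap.".
PUNCT_RE = re.compile(r"([.,;!?])(?=\s|$)")


def _mark(match):
    # Labels are written without spaces here (an assumption of this sketch).
    label = match.group(1).strip().replace(" ", "_")
    return f" <START:{label}> {match.group(2)} <END> "


def to_opennlp_lines(annotated):
    """Convert inline-tagged text into one tokenized training line per sentence."""
    marked = TAG_RE.sub(_mark, annotated)
    marked = PUNCT_RE.sub(r" \1", marked)  # crude tokenization of punctuation
    lines, current = [], []
    for token in marked.split():
        current.append(token)
        if token in {".", "!", "?"}:  # naive sentence boundary
            lines.append(" ".join(current))
            current = []
    if current:
        lines.append(" ".join(current))
    return lines


if __name__ == "__main__":
    annotated = (
        "<pallet amount>4</pallet amount> pallets of <product>books</product> "
        "with a weight of <weight>437</weight> <weightUnit>kg</weightUnit>."
    )
    for line in to_opennlp_lines(annotated):
        print(line)
```

Keeping the annotations in a simple source form and generating each engine's format from it makes it easier to try more than one engine on the same data.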
