Question

I want to run one of the built-in classifiers on a file, then run my own classifier, and merge the results.

How do I do this with Stanford NER, specifically from the command line?

I am aware of "How do I include more than one classifiers when using Stanford named entity recogniser?", but this is a bit different, because that question asks about using multiple classifiers with NERServer.

It looks like I need to use CoreNLP to run multiple NER models in sequence... can I do this without CoreNLP?

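Say I have a file containing "the quick brown fox jumped over the lazy dog in America". I run one of the built-in classifiers and it finds "America" as a location; then I run my own and it finds "fox" and "dog". The result should be: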

the quick brown <animal>fox</animal> jumped over the lazy <animal>dog</animal> in <location>America</location>

Answer 1

So, if you want to do this in a single command on the command line, here is a place to start:

cat corpus.txt | tee >(stanfordNER -options here > out1.xml) | myNERTagger -options here > out2.xml && diff out1.xml out2.xml | awk to do whatever merging you want here...

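But you will probably find that this is not really a solution. You will want to go sentence by sentence in a small script, calling pyner or something similar to hook into the Stanford tagger and then whatever custom tagger you have built, and merge the differences as you go. The output format of the taggers will change how this looks quite significantly.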
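To make the merge step concrete, here is a minimal Python sketch. It does not call Stanford NER or pyner at all. It assumes each tagger's output for a sentence has already been parsed into a list of (token, label) pairs, that both taggers tokenize identically, and that "O" marks a token outside any entity. The function names merge_tags and to_inline_xml are hypothetical, and letting the first tagger win on conflicts is just one possible policy.

```python
from itertools import groupby


def merge_tags(primary, secondary, outside="O"):
    """Combine two taggings of the same tokens; primary wins on conflict."""
    if [tok for tok, _ in primary] != [tok for tok, _ in secondary]:
        raise ValueError("both taggers must produce the same tokens")
    return [
        (tok, p if p != outside else s)
        for (tok, p), (_, s) in zip(primary, secondary)
    ]


def to_inline_xml(tagged, outside="O"):
    """Wrap each run of identically labelled tokens in a lowercase tag."""
    parts = []
    for label, run in groupby(tagged, key=lambda pair: pair[1]):
        text = " ".join(tok for tok, _ in run)
        if label == outside:
            parts.append(text)
        else:
            name = label.lower()
            parts.append(f"<{name}>{text}</{name}>")
    return " ".join(parts)


# Hand-built stand-ins for the two taggers' parsed output (assumed format).
tokens = "the quick brown fox jumped over the lazy dog in America".split()
builtin = [(t, "LOCATION" if t == "America" else "O") for t in tokens]
custom = [(t, "ANIMAL" if t in {"fox", "dog"} else "O") for t in tokens]

print(to_inline_xml(merge_tags(builtin, custom)))
```

Note that grouping by label joins adjacent tokens with the same label into one entity, so two neighbouring entities of the same type would be wrapped in a single tag. If your taggers emit BIO-style labels you would need to split on the B- prefix instead.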

How do I run multiple classifiers with Stanford NER?
