Question

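The comment on the stemStatic method of the Morphology class says that it will:

Return a new WordTag which has the lemma as the value of word().
The default is to lowercase non-proper-nouns, unless options have been set.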

（https://github.com/evandrix/stanford-corenlp/blob/master/src/edu/stanford/nlp/process/Morphology.java）

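How and where do I set these options so that the lowercase conversion is disabled?

I have looked through the source code but cannot see how to set options that would affect this static method. Frustratingly, the related static lemmatise method, lemmaStatic, takes a boolean parameter that does exactly this...

I am using v3.3.1 via Maven...

Thanks!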

Answer 1

OK, after taking a look, it seems the right approach may be to skip the static method and instead build a Morphology instance with this constructor:

public Morphology(Reader in, int flags) {

The int flags argument sets lexer.options.

Here are the lexer options (from Morpha.java):

/** If this option is set, print the word affix after a + character */
private final static int print_affixes = 0;
/** If this option is set, lowercase all tokens */
private final static int change_case = 1;
/** Return the tags on the input words if present?? */
private final static int tag_output = 2;

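The int flags value is a bit string over these 3 options, so 7 = 111 sets all of them to true, 0 = 000 sets all of them to false, 5 = 101 sets print_affixes and tag_output, and so on. To leave lowercasing off, then, you should pick a value whose change_case bit is clear, such as 0 or 5.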
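The bit arithmetic above can be sketched as a small standalone helper. This is only an illustration: the class MorphaFlags and its methods are hypothetical names, not part of Stanford CoreNLP, and it assumes the lexer treats each option constant as a bit index, which is what the 5 = 101 example implies.

```java
public class MorphaFlags {
    // Bit indices copied from the option constants quoted from Morpha.java.
    public static final int PRINT_AFFIXES = 0;
    public static final int CHANGE_CASE = 1;
    public static final int TAG_OUTPUT = 2;

    /** Assumption: an option is on when bit (1 << option) of flags is set. */
    public static boolean isSet(int flags, int option) {
        return (flags & (1 << option)) != 0;
    }

    /** Builds a flags value from the indices of the options to enable. */
    public static int flagsFor(int... options) {
        int flags = 0;
        for (int option : options) {
            flags |= 1 << option;
        }
        return flags;
    }

    public static void main(String[] args) {
        // Enable everything except change_case (lowercasing).
        int flags = flagsFor(PRINT_AFFIXES, TAG_OUTPUT);
        System.out.println(flags);                      // prints 5
        System.out.println(isSet(flags, CHANGE_CASE));  // prints false
    }
}
```

Under that assumption, a value built this way could be passed as the flags argument of the Morphology constructor quoted earlier, leaving change_case unset.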

Then you can use the apply method in Morphology.java:

public Object apply(Object in) {

The object you pass in should be a WordTag built from the original word and its tag.

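Let me know if you need any further help!

We could also change Morphology.java to add the kind of method you want. The approach above is for when you would rather not customize Stanford CoreNLP itself.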

Stanford CoreNLP Morphology.stemStatic: disable lowercase conversion?
