Question

I'm using NLTK's RegexpParser to chunk noun phrases, and I defined the grammar as

 from nltk import RegexpParser
 grammar = "NP: {<DT>?<JJ>*<NN|NNS>+}"
 cp = RegexpParser(grammar)

This works well; it matches noun phrases consisting of:

DT (if present)
JJ, any number of them
NN or NNS, at least one

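Now, what if I want to match the same thing, but with exactly one JJ instead of any number? So I want to match DT if it exists, one JJ, and 1+ NN/NNS. If there is more than one JJ, I want to match only one of them, the one nearest to the noun (plus the DT if there is one, and the NN/NNS).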

The grammar

grammar = "NP: {<DT>?<JJ><NN|NNS>+}"

only matches when there is exactly one JJ, and the grammar

grammar = "NP: {<DT>?<JJ>{1}<NN|NNS>+}"

which I expected to work given typical regexp patterns, raises a ValueError.

For example, in "This beautiful green skirt", I want the chunk "This green skirt".
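To make the behaviour of the two grammars concrete without NLTK, here is a rough pure-Python analogue, assuming the sentence is already POS-tagged. The helper chunk_spans and the patterns ANY_JJ and ONE_JJ are hypothetical, not NLTK API; the idea of joining the tags into a string of <TAG> units and running an ordinary regex over it is only similar in spirit to how RegexpParser handles tag patterns. It shows that the JJ* pattern captures the whole phrase, while the single-JJ pattern only finds "green skirt".

```python
import re

# Hypothetical helper: a simplified stand-in for tag-pattern matching.
# Tags are joined into one string of "<TAG>" units and a plain Python
# regex is run over that string.
def chunk_spans(tagged, pattern):
    """Return (start, end) token index spans whose tags match `pattern`."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    spans = []
    for m in re.finditer(pattern, tag_string):
        # Each token contributes exactly one "<", so counting them converts
        # character offsets back into token indexes.
        start = tag_string.count("<", 0, m.start())
        end = start + m.group(0).count("<")
        spans.append((start, end))
    return spans

tagged = [("This", "DT"), ("beautiful", "JJ"), ("green", "JJ"), ("skirt", "NN"),
          ("is", "VBZ"), ("for", "IN"), ("you", "PRP"), (".", ".")]

# Assumed Python-regex equivalents of the two grammars in the question.
ANY_JJ = r"(?:<DT>)?(?:<JJ>)*(?:<NN>|<NNS>)+"   # <DT>?<JJ>*<NN|NNS>+
ONE_JJ = r"(?:<DT>)?<JJ>(?:<NN>|<NNS>)+"        # <DT>?<JJ><NN|NNS>+

for name, pattern in [("ANY_JJ", ANY_JJ), ("ONE_JJ", ONE_JJ)]:
    for start, end in chunk_spans(tagged, pattern):
        print(name, [word for word, _ in tagged[start:end]])
# ANY_JJ ['This', 'beautiful', 'green', 'skirt']
# ONE_JJ ['green', 'skirt']
```

With exactly one JJ allowed, "This" cannot be part of the match because it is followed by two adjectives, which is the situation the question runs into.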

So, how can I do this?

Answer 1

The grammar "NP: {<DT>?<JJ><NN|NNS>+}" is correct for the requirement you first stated.

In the example you gave in the comments, you don't get the DT in the output:

"This beautiful green skirt is for you."

Tree('S', [('This', 'DT'), ('beautiful', 'JJ'), Tree('NP', [('green','JJ'), 
('skirt', 'NN')]), ('is', 'VBZ'), ('for', 'IN'), ('you', 'PRP'), ('.', '.')])

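In your example there are 2 consecutive JJs, and the output is consistent with your stated requirement, "I want to match DT if it exists, one JJ and 1+ NN/NNS.": the rule allows exactly one JJ directly before the noun(s), so only "green skirt" is chunked, and "This" is left out because it is followed by "beautiful" rather than by a single JJ and a noun.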

For your updated requirement, "I want to match DT if it exists, one JJ and 1+ NN/NNS. If there is more than one JJ, I want to match only one of them, the one nearest to the noun (and DT if there is, and NN/NNS)", you need to use

grammar = "NP: {<DT>?<JJ>*<NN|NNS>+}"

and then post-process the NP chunks to remove the extra JJs.

Code:

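from nltk import Tree

# Chunker output for "This beautiful green skirt is for you." with
# grammar = "NP: {<DT>?<JJ>*<NN|NNS>+}"
chunk_output = Tree('S', [Tree('NP', [('This', 'DT'), ('beautiful', 'JJ'), ('green', 'JJ'), ('skirt', 'NN')]), ('is', 'VBZ'), ('for', 'IN'), ('you', 'PRP'), ('.', '.')])

for child in chunk_output:
    if isinstance(child, Tree) and child.label() == 'NP':
        for num in range(len(child)):
            # Skip a JJ that is directly followed by another JJ, so only the
            # adjective nearest the noun (the last one in the run) is kept.
            next_is_jj = num + 1 < len(child) and child[num + 1][1] == 'JJ'
            if not (child[num][1] == 'JJ' and next_is_jj):
                print(child[num][0])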

Output:

This
green
skirt
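If the filtered tokens are needed downstream rather than printed, the same rule can be wrapped in a function that returns the kept (word, tag) pairs. This is a minimal sketch; the function name keep_nearest_jj is hypothetical. Since nltk.Tree subclasses list, an NP subtree from the chunker should be usable directly as the chunk argument, but the example below uses a plain list so it runs without NLTK.

```python
def keep_nearest_jj(chunk):
    """Drop every JJ that is immediately followed by another JJ.

    `chunk` is a sequence of (word, tag) pairs, e.g. the children of an NP
    subtree. Only the last JJ of each adjective run survives (the one nearest
    the noun); DT and NN/NNS tokens are kept unchanged.
    """
    kept = []
    for i, (word, tag) in enumerate(chunk):
        next_is_jj = i + 1 < len(chunk) and chunk[i + 1][1] == "JJ"
        if tag == "JJ" and next_is_jj:
            continue  # a closer adjective follows, so skip this one
        kept.append((word, tag))
    return kept

np_chunk = [("This", "DT"), ("beautiful", "JJ"), ("green", "JJ"), ("skirt", "NN")]
print(" ".join(word for word, _ in keep_nearest_jj(np_chunk)))
# This green skirt
```

Returning a new list instead of printing leaves the original chunk untouched, so the result can still be wrapped back into a tree (for example Tree('NP', kept)) if a Tree is needed later.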
