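Simple grammar gives ValueError in Python

Time: 2014-10-22 10:42:10

Tags: python-3.x nlp nltk

I'm new to Python, NLTK and NLP. I wrote a simple grammar, but when I run the program it gives the following error. Please help me fix it.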

Grammar:

S -> NP
NP -> PN|PRO|D[NUM=?n] N[NUM=?n]|D[NUM=?n] A N[NUM=?n]|D[NUM=?n] N[NUM=?n] PP|QP N[NUM=?n]|A N[NUM=?n]|D[NUM=?n] NOM PP|D[NUM=?n] NOM
PP -> P NP
D[NUM=sg] -> 'a'
D -> 'the'
N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
N[NUM=pl] -> 'dogs'|'cats'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
NOM -> A NOM|N[NUM=?n]

Code:

import nltk

grammar = nltk.data.load('file:english_grammer.cfg')
rdparser = nltk.RecursiveDescentParser(grammar)
sent = "a dogs".split()
trees = rdparser.parse(sent)

for tree in trees: print (tree)

Error:

ValueError: Expected a nonterminal, found: [NUM=?n] N[NUM=?n]|D[NUM=?n] A N[NUM=?n]|D[NUM=?n] N[NUM=?n] PP|QP N[NUM=?n]|A N[NUM=?n]|D[NUM=?n] NOM PP|D[NUM=?n] NOM

3 answers:

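Answer 0 (score: 5)

I don't think NLTK's plain CFG reader (CFG.fromstring, which nltk.data.load also uses for .cfg files) can read the square brackets in your grammar.

First, let's try a CFG grammar without square brackets: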

from nltk.grammar import CFG

grammar_string = '''
S -> NP
PP -> P NP
D -> 'the'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
'''

grammar = CFG.fromstring(grammar_string)
print(grammar)

[OUT]:

Grammar with 18 productions (start state = S)
    S -> NP
    PP -> P NP
    D -> 'the'
    PN -> 'saumya'
    PN -> 'dinesh'
    PRO -> 'she'
    PRO -> 'he'
    PRO -> 'we'
    A -> 'tall'
    A -> 'naughty'
    A -> 'long'
    A -> 'three'
    A -> 'black'
    P -> 'with'
    P -> 'in'
    P -> 'from'
    P -> 'at'
    QP -> 'some'

Now let's put the square brackets in:

from nltk.grammar import CFG

grammar_string = '''
S -> NP
PP -> P NP
D -> 'the'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
N[NUM=pl] -> 'dogs'|'cats'
'''

grammar = CFG.fromstring(grammar_string)
print(grammar)

[OUT]:

Traceback (most recent call last):
  File "test.py", line 33, in <module>
    grammar = CFG.fromstring(grammar_string)
  File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 519, in fromstring
    encoding=encoding)
  File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 1273, in read_grammar
    (linenum+1, line, e))
ValueError: Unable to parse line 10: N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
Expected an arrow
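To see the difference between the two readers directly, here is a small sketch (assuming NLTK 3.x) that feeds the same bracketed production to CFG.fromstring and to FeatureGrammar.fromstring. The helper load_error is a hypothetical name made up for this illustration.

```python
from nltk.grammar import CFG, FeatureGrammar

# One production from the question, written in feature-structure bracket notation.
RULE = "N[NUM=sg] -> 'boy' | 'girl'"


def load_error(loader, text):
    """Hypothetical helper: None if loader(text) succeeds, else the ValueError message."""
    try:
        loader(text)
    except ValueError as err:
        return str(err)
    return None


# The plain CFG reader stops at '[' and raises ValueError ("Expected an arrow").
cfg_error = load_error(CFG.fromstring, RULE)
# The feature-grammar reader understands [NUM=sg], so no error is returned.
fcfg_error = load_error(FeatureGrammar.fromstring, RULE)

print("CFG.fromstring:", cfg_error)
print("FeatureGrammar.fromstring:", fcfg_error)
```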

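Going back to your grammar, it seems you're using square brackets to express constraints (number agreement), so the solution is to:

  • use underscores for the constrained nonterminals, and
  • make rules for the unconstrained nonterminals.

So your CFG rules would look like this: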

from nltk.parse import RecursiveDescentParser
from nltk.grammar import CFG

grammar_string = '''
S -> NP
NP -> PN | PRO | D N | D A N | D N PP | QP N | A N | D NOM PP | D NOM

PP -> P NP
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'

D -> D_def | D_sg
D_def -> 'the'
D_sg -> 'a'

N -> N_sg | N_pl
N_sg -> 'boy'|'girl'|'room'|'garden'|'hair'
N_pl -> 'dogs'|'cats'
'''

grammar = CFG.fromstring(grammar_string)

rdparser = RecursiveDescentParser(grammar)
sent = "a dogs".split()
trees = rdparser.parse(sent)

for tree in trees:
    print (tree)

[OUT]:

(S (NP (D (D_sg a)) (N (N_pl dogs))))
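One caveat: because D and N are unconstrained in this grammar, it also accepts "a dogs" (as the output above shows), so number agreement is not actually enforced. A sketch of one way to enforce it with the same underscore idea, assuming NLTK 3.x, is to build the NP rules from the underscored categories themselves. The grammar and the parses helper are illustrative, not part of the original answer.

```python
from nltk.grammar import CFG
from nltk.parse import RecursiveDescentParser

# Illustrative split-category grammar: agreement is encoded in the category
# names (NP_sg, NP_pl), so no feature structures are needed.
grammar = CFG.fromstring("""
S -> NP
NP -> NP_sg | NP_pl
NP_sg -> D_sg N_sg | D_def N_sg
NP_pl -> D_def N_pl
D_def -> 'the'
D_sg -> 'a'
N_sg -> 'boy' | 'girl' | 'room' | 'garden' | 'hair'
N_pl -> 'dogs' | 'cats'
""")

parser = RecursiveDescentParser(grammar)


def parses(sentence):
    """All trees the parser finds for a whitespace-tokenized sentence."""
    return list(parser.parse(sentence.split()))


for sent in ["a boy", "the dogs", "a dogs"]:
    trees = parses(sent)
    print(sent, "->", trees[0] if trees else "no parse")
```

Here "a" only appears in NP_sg with N_sg, while "the" is allowed with either number, which is what D[NUM=sg] -> 'a' and the unconstrained D -> 'the' expressed in the original grammar.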

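Answer 1 (score: 1)

It looks like you're trying to use NLTK's feature grammars, which use the square-bracket syntax to express features and feature agreement. One NLTK parser that handles feature grammars is FeatureEarleyChartParser (rather than RecursiveDescentParser).

From the NLTK documentation: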

>>> from __future__ import print_function
>>> import nltk
>>> from nltk import grammar, parse
>>> g = """
... % start DP
... DP[AGR=?a] -> D[AGR=?a] N[AGR=?a]
... D[AGR=[NUM='sg', PERS=3]] -> 'this' | 'that'
... D[AGR=[NUM='pl', PERS=3]] -> 'these' | 'those'
... D[AGR=[NUM='pl', PERS=1]] -> 'we'
... D[AGR=[PERS=2]] -> 'you'
... N[AGR=[NUM='sg', GND='m']] -> 'boy'
... N[AGR=[NUM='pl', GND='m']] -> 'boys'
... N[AGR=[NUM='sg', GND='f']] -> 'girl'
... N[AGR=[NUM='pl', GND='f']] -> 'girls'
... N[AGR=[NUM='sg']] -> 'student'
... N[AGR=[NUM='pl']] -> 'students'
... """
>>> grammar = grammar.FeatureGrammar.fromstring(g)
>>> tokens = 'these girls'.split()
>>> parser = parse.FeatureEarleyChartParser(grammar)
>>> trees = parser.parse(tokens)
>>> for tree in trees: print(tree)
(DP[AGR=[GND='f', NUM='pl', PERS=3]]
  (D[AGR=[NUM='pl', PERS=3]] these)
  (N[AGR=[GND='f', NUM='pl']] girls))
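Applied to the question, a minimal sketch (assuming NLTK 3.x) keeps a subset of the original rules in bracket notation, loads them with FeatureGrammar.fromstring and parses with FeatureEarleyChartParser. The NOM and PP rules are left out; note that NOM -> N[NUM=?n] never passes NUM up to NOM, so D[NUM=?n] NOM would let "a dogs" through even with a feature parser. The parses helper is a hypothetical name for illustration.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# A subset of the question's grammar, kept in its bracket notation.
# NOM and PP rules are omitted to keep the sketch short.
FCFG = """
S -> NP
NP -> PN | PRO | D[NUM=?n] N[NUM=?n] | D[NUM=?n] A N[NUM=?n] | A N[NUM=?n]
D[NUM=sg] -> 'a'
D -> 'the'
N[NUM=sg] -> 'boy' | 'girl' | 'room' | 'garden' | 'hair'
N[NUM=pl] -> 'dogs' | 'cats'
PN -> 'saumya' | 'dinesh'
PRO -> 'she' | 'he' | 'we'
A -> 'tall' | 'naughty' | 'long' | 'three' | 'black'
"""

grammar = FeatureGrammar.fromstring(FCFG)
parser = FeatureEarleyChartParser(grammar)


def parses(sentence):
    """All trees for a whitespace-tokenized sentence ([] if it is rejected)."""
    return list(parser.parse(sentence.split()))


# ?n must unify across D and N, so 'a' (sg) cannot combine with 'dogs' (pl).
for sent in ["a dogs", "a tall boy", "the black cats"]:
    trees = parses(sent)
    print(sent, "->", trees[0] if trees else "no parse")
```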

Answer 2 (score: 1)

Save the grammar with a .fcfg extension and load it with load_parser from the nltk package.

For example: english_grammer.fcfg

I load it with the following code:

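from nltk import load_parser

# The .fcfg extension makes nltk.data.load read the file as a FeatureGrammar,
# so load_parser returns a feature chart parser.
chart = load_parser('file:english_grammer.fcfg')
sent = 'the girl gave the dog a bone'.split()
# nbest_parse() was removed in NLTK 3; parse() returns an iterator of trees.
trees = chart.parse(sent)
for tree in trees:
    print(tree)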

That solved the problem for me.
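To try this approach end to end without an existing grammar file, here is a sketch (assuming NLTK 3.x on a POSIX-style file system) that writes a small, made-up grammar to a temporary .fcfg file and loads it with load_parser. The grammar contents, the file name and the count_parses helper are illustrative assumptions.

```python
import os
import tempfile

from nltk import load_parser

# Made-up grammar contents for illustration, in the question's bracket notation.
FCFG_TEXT = """
S -> NP
NP -> D[NUM=?n] N[NUM=?n]
D[NUM=sg] -> 'a'
D -> 'the'
N[NUM=sg] -> 'boy' | 'girl'
N[NUM=pl] -> 'dogs' | 'cats'
"""

# Write the grammar to a temporary file; the .fcfg extension is what makes
# nltk.data.load (used by load_parser) read it as a FeatureGrammar.
grammar_path = os.path.join(tempfile.mkdtemp(), "english_grammar.fcfg")
with open(grammar_path, "w", encoding="utf-8") as f:
    f.write(FCFG_TEXT)

# 'file:' plus an absolute path points NLTK at a local file.
parser = load_parser("file:" + os.path.abspath(grammar_path))


def count_parses(sentence):
    """Number of trees found for a whitespace-tokenized sentence."""
    return len(list(parser.parse(sentence.split())))


for sent in ["the dogs", "a boy", "a dogs"]:
    print(sent, "->", count_parses(sent), "parse(s)")
```

For a feature grammar, load_parser builds a FeatureChartParser by default rather than the Earley variant used in Answer 1; both read the same bracket notation.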