I get a tree structure from nltk, and when I iterate over the tree's values I get the following:
(NE Stallone/NNP)
('jason', 'NN')
("'s", 'POS')
('film', 'NN')
(NE Rocky/NNP)
('was', 'VBD')
('inducted', 'VBN')
('into', 'IN')
('the', 'DT')
(NE National/NNP Film/NNP Registry/NNP)
('as', 'IN')
('well', 'RB')
('as', 'IN')
('having', 'VBG')
('its', 'PRP$')
('film', 'NN')
('props', 'NNS')
('placed', 'VBN')
('in', 'IN')
('the', 'DT')
(NE Smithsonian/NNP Museum/NNP)
('.', '.')
How can I retrieve only the values tagged NN and VBN?
This is what I tried:
import nltk

text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."
tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
namedEnt = nltk.ne_chunk(tagged, binary=True)
print(namedEnt)
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt if x == "NN"]
for x in namedEnt:
    if x[0] == 'NN':
        print(x[1])
This correctly gives me the NE chunks, but I can't get NN, NNP, or NNS on their own. If there is another way to do this, please let me know.
Answer 0 (score: 1)
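It looks like you need to swap the indices in your lookup: in each (word, tag) tuple the tag is at index 1 and the word is at index 0. You also have to handle, with try/except, the case where an element holds only a single value (for example an NE subtree with one leaf, which has no index 1). Here is a small function that retrieves the values you want from the tree: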
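def values_for(tree, tag):
    ret = []
    for x in tree:
        try:
            # x is a (word, tag) tuple; NE subtrees never match here
            if x[1] == tag:
                ret.append(x[0])
        except IndexError:
            # single-element subtree, e.g. (NE Stallone/NNP)
            pass
    return ret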
You should then be able to filter for the nodes you want:
>>> text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."
>>> tokenized = nltk.word_tokenize(text)
>>> tagged = nltk.pos_tag(tokenized)
>>> namedEnt = nltk.ne_chunk(tagged, binary = True)
>>> values_for(namedEnt, 'NN')
['jason', 'film', 'film']
>>> values_for(namedEnt, 'VBN')
['inducted', 'placed']
>>> values_for(namedEnt, 'NNP')
[]
>>> values_for(namedEnt, 'NNS')
['props']
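Note that 'NNP' returns an empty list above because every NNP word in this sentence sits inside an NE subtree, and values_for only looks at top-level elements. If you also want those words, one option is to recurse into subtrees. The following is only a rough sketch, not part of the function above: the name values_for_nested is made up for illustration, and it assumes leaves are (word, tag) tuples and chunks are list-like (nltk.Tree subclasses list). Plain nested lists stand in for the tree so that it runs without NLTK installed.

```python
def values_for_nested(tree, *tags):
    """Collect words whose POS tag is one of `tags`, descending into subtrees."""
    found = []
    for node in tree:
        if isinstance(node, tuple):
            # Leaf: a (word, tag) pair produced by the POS tagger.
            word, tag = node
            if tag in tags:
                found.append(word)
        else:
            # Subtree (e.g. an NE chunk): recurse into its children.
            found.extend(values_for_nested(node, *tags))
    return found


# Hypothetical stand-in for part of the chunk tree: inner lists play the
# role of NE subtrees, tuples are tagged words.
sample = [
    [("Stallone", "NNP")],
    ("jason", "NN"),
    ("'s", "POS"),
    ("film", "NN"),
    [("Rocky", "NNP")],
    ("was", "VBD"),
    ("inducted", "VBN"),
]

print(values_for_nested(sample, "NN", "VBN"))  # ['jason', 'film', 'inducted']
print(values_for_nested(sample, "NNP"))        # ['Stallone', 'Rocky']
```

Passing several tags at once also covers the original question of getting NN and VBN in a single call.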
Hope this helps. Cheers!