I am trying to use StanfordDependencyParser on a pandas DataFrame.
from nltk.parse import stanford
import pandas as pd
dep_parser=stanford.StanfordDependencyParser()
df = pd.DataFrame({'ID' : [0,1,2], 'sentence' : ['This is the first s.', 'This is the 2nd s.', 'This isn''t the third s.']})
df['parsed'] = df.sentence.apply(dep_parser.raw_parse)
print(df)
   ID                sentence                                       parsed
0   0    This is the first s.  <list_iterator object at 0x000000000E849C18>
1   1      This is the 2nd s.  <list_iterator object at 0x000000000E8691D0>
2   2  This isnt the third s.  <list_iterator object at 0x000000000E8696A0>
However, I would prefer the DataFrame column to hold a text representation of the dependency graph rather than an iterator, like this:
ID sentence parsed
0 0 This is the first s. [[(('s.', 'NN'), 'nsubj', ('This', 'DT')),(('s.', 'NN'), 'cop', ('is', 'VBZ')), (('s.', 'NN'), 'det', ('the', 'DT')),(('s.', 'NN'), 'amod', ('first', 'JJ'))]]
...
I tried following the steps in the nltk documentation, but that raises an attribute error:
df['dep'] = [list(parse.triples()) for parse in df.parsed]
AttributeError: 'list_iterator' object has no attribute 'triples'
Is there a way to unpack the iterators that show up as values in the DataFrame? Any help is welcome.
Answer 0 (score: 1)
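A list_iterator is a mechanism that produces the items of a list "on demand". It indeed has no triples() method, but the list it yields in your case is indeed a list of triples: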
df['dep'] = [list(parse) for parse in df['parsed']]
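If the unpacked cells turn out to hold graph objects rather than ready-made triples (the nltk documentation describes raw_parse as yielding DependencyGraph objects), one more step is probably needed: call triples() on each item the iterator yields. The sketch below illustrates that pattern without requiring the Stanford jars. FakeGraph, unpack, parsed_column and dep_column are hypothetical names made up for the illustration, and a plain Python list stands in for the df['parsed'] column. Note also that a list_iterator can only be consumed once, so the result should be stored rather than recomputed.

```python
class FakeGraph:
    """Hypothetical stand-in for nltk's DependencyGraph; only triples() is mimicked."""

    def __init__(self, triples):
        self._triples = triples

    def triples(self):
        # Yield (head, relation, dependent) tuples lazily, one at a time.
        yield from self._triples


def unpack(parse_iter):
    """Consume one iterator cell and return one list of triples per graph."""
    return [list(graph.triples()) for graph in parse_iter]


# Stand-in for the 'parsed' column: each cell is a one-shot list_iterator.
parsed_column = [
    iter([FakeGraph([(('s.', 'NN'), 'nsubj', ('This', 'DT')),
                     (('s.', 'NN'), 'cop', ('is', 'VBZ'))])]),
    iter([FakeGraph([(('s.', 'NN'), 'det', ('the', 'DT'))])]),
]

# Materialize every cell once and keep the result.
dep_column = [unpack(cell) for cell in parsed_column]
print(dep_column[0])
```

Under the same assumption, the DataFrame version would be df['dep'] = df['parsed'].apply(unpack), with unpack calling the real triples() method on each graph.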