Question

Using spaCy, I am trying to build a DataFrame for each sentence that shows the attributes of every token. For example:

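Currently each sentence is a dictionary object whose values are lists, one list per row. For large documents this uses a lot of memory and slows things down, so I would prefer to use generator objects. However, when I pass the dictionary of generators to DataFrame, I get the following error: "TypeError: object of type 'generator' has no len()"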
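The error can be reproduced without spaCy or pandas. Below is a minimal sketch using only the standard library (the names `describe_len`, `words` and `gen` are illustrative, not from my code): a generator expression has no length until it is materialised, which is what any consumer that calls `len()` on a column will run into.

```python
def describe_len(values):
    """Return len(values), or the TypeError message if values has no length."""
    try:
        return len(values)
    except TypeError as exc:
        return str(exc)


words = ["Tim", "Berners-Lee", "invented", "the", "World", "Wide", "Web"]

gen = (w.lower() for w in words)   # lazy: nothing has been computed yet
print(describe_len(gen))           # prints the TypeError message
print(describe_len(list(gen)))     # materialising the generator gives a sized list
```

Calling `len()` does not consume the generator, so the second call still sees all seven words.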

Any suggestions on how to display the sentences using generators would be much appreciated. Thanks.

In the working version, the generator expressions below are replaced with list comprehensions, e.g. 'index': [word.i for word in sentence]

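import pandas as pd
import spacy
import tqdm

nlp = spacy.load('en_core_web_sm')  # assumption: the pipeline actually used is not stated

with open('document.txt') as text:
    doc = text.read()

# Example text: Tim Berners-Lee invented the World Wide Web

new_doc = nlp(doc)
sent_dic = {}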

for i, sentence in tqdm.tqdm(enumerate(new_doc.sents), total=len(list(new_doc.sents))):

    # Generator expressions: these are what trigger the TypeError in the display call
    sent_dic[i] = {'index': (word.i for word in sentence),
                   'sentence': (word.text for word in sentence),
                   'LEMMA': (word.lemma_ for word in sentence),
                   'POS': (word.pos_ for word in sentence),
                   'TAG': (word.tag_ for word in sentence),
                   'DEP': (word.dep_ for word in sentence),
                   'ENT_TYPE': (word.ent_type_ for word in sentence),
                  }

display(pd.DataFrame.from_dict(sent_dic[i]).T)
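One possible way around the error, offered as a sketch under assumptions rather than a confirmed answer: instead of a dict of column generators, yield one dict per token and hand that single generator to `pd.DataFrame`, which accepts an iterable of records. The helper `token_rows`, the `ATTRS` mapping and the `SimpleNamespace` stand-in tokens are hypothetical and exist only so the snippet runs without a spaCy model; the attribute values are hand-written for illustration, not real spaCy output. With spaCy, a sentence `Span` would be passed to `token_rows` directly.

```python
from types import SimpleNamespace

import pandas as pd

# Column name -> token attribute name (mirrors the dictionary in the question)
ATTRS = {"index": "i", "sentence": "text", "LEMMA": "lemma_", "POS": "pos_",
         "TAG": "tag_", "DEP": "dep_", "ENT_TYPE": "ent_type_"}


def token_rows(sentence):
    """Lazily yield one {column: value} record per token in the sentence."""
    for word in sentence:
        yield {col: getattr(word, attr) for col, attr in ATTRS.items()}


# Stand-in tokens with illustrative values (assumption: not produced by spaCy)
sentence = [
    SimpleNamespace(i=2, text="invented", lemma_="invent", pos_="VERB",
                    tag_="VBD", dep_="ROOT", ent_type_=""),
    SimpleNamespace(i=6, text="Web", lemma_="Web", pos_="PROPN",
                    tag_="NNP", dep_="dobj", ent_type_=""),
]

# A generator of records is accepted; a dict of generators is not
df = pd.DataFrame(token_rows(sentence))
print(df.T)  # transposed, as in the question: one column per token
```

Note that pandas still materialises every row when it builds the frame, so this only avoids holding seven separate lists per sentence; it does not make the DataFrame itself lazy.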

Displaying a DataFrame from generator objects

0 Answers: