I have a dataframe that looks like this:
Index Text
0 When can I go to Canada?
1 Who is king Arthur?
2 Can you give me the email of Norton?
I am using spaCy to try to extract the names from each row of the dataframe, so that the output looks like this:
Index Text Name
0 When can I go to Canada?
1 Who is king Arthur? Arthur
2 Can you give me the email of Norton? Norton
I had some success with the following code:
df['Name'] = [nlp(x).ents for x in df['Text']]
But it outputs all kinds of entities, not just names. For example, I also get 'Canada' in the output, which I don't want. So I modified the code:
df['Name'] = [token.label_ for token in nlp(x).ents for x in df['Text']]
But now I get this error:
NameError: name 'x' is not defined
Why doesn't the list comprehension work? I was following the code examples at https://spacy.io/
Answer 0 (score: 1)
Try this.
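import spacy

nlp = spacy.load("en_core_web_sm")

def get_persons(text):
    # Entity labels to keep; remove "ORG" if only person names are wanted.
    good_ents = {"PERSON", "ORG"}
    doc = nlp(text)
    # Keep the text of each entity whose label is in the allowed set.
    persons = [ent.text for ent in doc.ents if ent.label_ in good_ents]
    return persons

# Apply row by row; each cell of "name" holds a list of matched entity strings.
df["name"] = df.apply(lambda row: get_persons(row["Text"]), axis=1)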
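As for why the comprehension in the question raises the NameError: in a comprehension with two for clauses, the clauses run left to right, so the first one is the outer loop. In the question, nlp(x).ents sits in the first clause and is evaluated before the second clause ever binds x. Here is a minimal sketch of that. Note that fake_ents is a hypothetical stand-in I made up so the snippet runs without a spaCy model; it is not part of spaCy, and its (text, label) pairs only imitate what doc.ents would give you.

```python
# Hypothetical stand-in for nlp(text).ents, NOT spaCy itself.
# It returns (entity_text, label) pairs for a few hard-coded words.
def fake_ents(text):
    known = {"Canada": "GPE", "Arthur": "PERSON", "Norton": "PERSON"}
    words = text.replace("?", "").split()
    return [(w, known[w]) for w in words if w in known]

texts = [
    "When can I go to Canada?",
    "Who is king Arthur?",
    "Can you give me the email of Norton?",
]

# Same shape as the failing line in the question: the first "for" clause is
# the outer loop, so x is used there before the second clause defines it.
try:
    [label for _, label in fake_ents(x) for x in texts]
    error_message = None
except NameError as err:
    error_message = str(err)

# One way to fix it: nest a second comprehension so that x is bound by the
# outer loop first, and filter on the label to keep only person names.
names = [
    [ent for ent, label in fake_ents(x) if label == "PERSON"]
    for x in texts
]
print(names)
```

With real spaCy the same shape would be [[ent.text for ent in nlp(x).ents if ent.label_ == "PERSON"] for x in df['Text']]. Also note that label_ gives you the entity type (such as PERSON), not the name itself, so you want ent.text for the value and label_ only for the filter.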