Question

I am trying to find the named entities in the sentence below:

import spacy.lang.en
parser = spacy.lang.en.English()
ParsedSentence = parser(u"Alphabet is a new startup in China")
for Entity in ParsedSentence.ents:
    print(Entity.label, Entity.label_, ' '.join(t.orth_ for t in Entity))

I expected to get "Alphabet" and "China" as the result, but I get an empty set instead. What am I doing wrong here?

Answer 1

According to the spaCy documentation on named entity recognition, this is how to extract named entities:

import spacy
nlp = spacy.load('en') # install 'en' model (python3 -m spacy download en)
doc = nlp("Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))

Result:
Name Entity: (China,)
For "Alphabet" to be recognized as a company name, add "The" before it; it will then be identified as a noun.

doc = nlp("The Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))

Name Entity: (Alphabet, China)
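The empty result in the question most likely comes from the pipeline itself: `spacy.lang.en.English()` appears to build a blank pipeline that only tokenizes, so no component ever sets `doc.ents`. The sketch below compares a blank pipeline with a trained one. The helper name `describe_entities` and the model name `en_core_web_sm` are assumptions for illustration (recent spaCy versions no longer accept the `'en'` shortcut, so a full package name is used), and the entities a trained model returns can vary by model version.

```python
def describe_entities(doc):
    """Return (text, label) pairs for each entity span on a spaCy Doc."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def main():
    # Imported here so the helper above stays usable without spaCy installed.
    import spacy

    sentence = "Alphabet is a new startup in China"

    # A blank pipeline only tokenizes; it has no "ner" component.
    blank = spacy.blank("en")
    print(blank.pipe_names)
    print(describe_entities(blank(sentence)))  # no entities are expected here

    # Assumption: the en_core_web_sm package has already been downloaded.
    nlp = spacy.load("en_core_web_sm")
    print(nlp.pipe_names)  # a trained pipeline should list "ner"
    print(describe_entities(nlp(sentence)))
```

Calling `main()` prints the component names and the entity pairs for both pipelines, which makes it easy to see whether an NER component is present at all.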

Answer 2

In spaCy version 3, Transformers from Hugging Face are fine-tuned for the operations that spaCy provided in earlier versions, but with better results.

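Transformers are currently (2020) the state of the art in natural language processing. That is, we used to have (one-hot encoding -> word2vec -> GloVe | fastText), then (recurrent neural networks, recursive neural networks, gated recurrent units, long short-term memory, bidirectional long short-term memory, etc.), and now Transformers + Attention (BERT, RoBERTa, XLNet, XLM, CTRL, ALBERT, T5, BART, GPT, GPT-2, GPT-3). This is only to give some context on why you should consider Transformers; I know there is a lot I have not mentioned, such as Fuzz, knowledge graphs, etc.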

Install the dependencies:

sudo apt install libncurses5

pip install spacy-transformers --pre -f https://download.pytorch.org/whl/torch_stable.html

pip install spacy-nightly # I'm using 3.0.0rc2

Download the model:

python -m spacy download en_core_web_trf # English Transformer pipeline, Roberta base

Here is a list of the available models.
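Since spaCy pipelines are installed as ordinary Python packages, one way to avoid a load error is to check which pipeline package is present before loading it. The following is a minimal sketch using only the standard library; the helper names `is_installed` and `pick_pipeline`, and the fallback to `en_core_web_sm`, are assumptions for illustration and not part of spaCy.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


def pick_pipeline(candidates=("en_core_web_trf", "en_core_web_sm")):
    """Return the first installed pipeline package name, or None if none is found."""
    for name in candidates:
        if is_installed(name):
            return name
    return None
```

The returned name can then be passed to `spacy.load(...)`; if the result is `None`, no candidate has been downloaded yet.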

Then use it as usual:

import spacy


text = 'Type something here which can be related to something, e.g Stack Over Flow organization'

nlp = spacy.load('en_core_web_trf')

document = nlp(text)

print(document.ents)
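Printing `document.ents` shows only the entity spans. To see which type each entity was assigned, the spans can be grouped by their `label_` attribute. This is a small sketch; the helper name `group_entities_by_label` is hypothetical, and the labels you get depend on the model.

```python
from collections import defaultdict


def group_entities_by_label(doc):
    """Map each entity label (e.g. "ORG", "GPE") to the entity texts found."""
    groups = defaultdict(list)
    for ent in doc.ents:
        groups[ent.label_].append(ent.text)
    return dict(groups)
```

With the `document` object from the snippet above, `print(group_entities_by_label(document))` would list the entity texts under each label.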

References:

Learn about Transformers and Attention.

Read a summary of the different Transformers architectures.

Learn about the Transformers fine-tuning done by spaCy.

Named entity recognition in spaCy
