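spaCy EN model issue

Time: 2019-06-04 14:59:06

Tags: spacy

I need to understand the difference between spaCy's en and en_core_web_sm models.

I am trying to perform NER with spaCy (for organization names). Please find below the script I am using.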

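import spacy

nlp = spacy.load("en_core_web_sm")
# Every line of the string literal ends in a backslash (with nothing after it)
# so that the literal continues onto the next line.
text = "But Google is starting from behind. The company made a late push \
    into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
    Alexa software, which runs on its Echo and Dot devices, have clear \
    leads in consumer adoption."
doc = nlp(text)
# Print each recognized entity with its character offsets and label.
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)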

The above does not produce any output. But when I use the "en" model:

import spacy

nlp = spacy.load("en")
# Same text as above; every continued line ends in a backslash.
text = "But Google is starting from behind. The company made a late push \
    into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
    Alexa software, which runs on its Echo and Dot devices, have clear \
    leads in consumer adoption."
doc = nlp(text)
# Print each recognized entity with its character offsets and label.
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

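it gives me the desired output:

Google 4 10 ORG
Apple’s Siri 92 104 ORG
iPhones 119 126 ORG
Amazon 132 138 ORG
Echo and Dot 182 194 ORG

What is going on here? Please help.

Can I get the same output from the en_core_web_sm model as from the en model? If so, please advise how. I need a Python 3 script that takes a pandas df as input. Thanks.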

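3 answers:

Answer 0 (score: 2)

Each model is a machine learning model trained on a specific corpus (a text "dataset"). As a result, different models can tag the same text with different labels, especially since some models are trained on less data than others.

Currently, spaCy offers 4 English models, as listed at https://spacy.io/models/en/.

According to https://github.com/explosion/spacy-models, a model can be downloaded in a few different ways: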

# download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_sm

# out-of-the-box: download best-matching default model
python -m spacy download en

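It may be that when you downloaded the "en" model, the best-matching default model was not "en_core_web_sm".

Also, keep in mind that these models are updated from time to time, so you may end up with two different versions of the same model.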
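One way to check whether "en" and "en_core_web_sm" resolve to the same package and version is to look at the metadata of each loaded pipeline. The following is a minimal sketch, not a definitive diagnostic: `model_identity` is a hypothetical helper name, and it assumes only that the loaded pipeline exposes spaCy's `nlp.meta` dictionary with the `lang`, `name` and `version` keys.

```python
def model_identity(nlp):
    """Return '<lang>_<name> <version>' for a loaded spaCy pipeline.

    `nlp` is assumed to be the object returned by spacy.load(...);
    `nlp.meta` is the metadata dictionary shipped with the model package.
    """
    meta = nlp.meta
    return "{}_{} {}".format(meta["lang"], meta["name"], meta["version"])


# Hypothetical usage, assuming spaCy and both model names are installed:
#   import spacy
#   print(model_identity(spacy.load("en")))
#   print(model_identity(spacy.load("en_core_web_sm")))
```

If the two printed strings differ, the two names point at different model packages or versions, which would explain different entities. Note that the "en" shortcut exists only in spaCy 2.x; from spaCy 3 onwards the full package name has to be used.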

Answer 1 (score: 0)

On my system, the results are the same in both cases.

Code:

import spacy
nlp = spacy.load("en_core_web_sm")
text = """But Google is starting from behind. The company made a late push 
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s  
Alexa software, which runs on its Echo and Dot devices, have clear 
leads in consumer adoption."""
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

import spacy
nlp = spacy.load("en")
text = """But Google is starting from behind. The company made a late push \
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
Alexa software, which runs on its Echo and Dot devices, have clear 
leads in consumer adoption."""
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
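Rather than comparing the two printouts by eye, the entities from both runs can be compared programmatically. This is only a sketch under stated assumptions: `entity_tuples` and `entity_diff` are hypothetical helper names, and the arguments are assumed to be spaCy `Doc` objects produced from the same text by the two pipelines.

```python
def entity_tuples(doc):
    """Collect (text, start_char, end_char, label) for every entity in a Doc."""
    return {(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents}


def entity_diff(doc_a, doc_b):
    """Return the entities found only in doc_a and only in doc_b, each sorted."""
    ents_a = entity_tuples(doc_a)
    ents_b = entity_tuples(doc_b)
    return sorted(ents_a - ents_b), sorted(ents_b - ents_a)


# Hypothetical usage, assuming both model names load on your installation:
#   only_sm, only_en = entity_diff(spacy.load("en_core_web_sm")(text),
#                                  spacy.load("en")(text))
```

Two empty lists mean both pipelines found exactly the same entities at the same offsets.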

Answer 2 (score: 0)

Loading spacy.load('en_core_web_sm') instead of spacy.load('en') should help.
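Since the question asks for a script that takes a pandas df as input, here is a rough sketch of how organization names could be added as a new column. It rests on assumptions not stated in the question: the column names `text` and `orgs` are hypothetical defaults, `org_names` and `add_org_column` are illustrative helper names, and `nlp` is whichever pipeline `spacy.load(...)` returned.

```python
def org_names(doc):
    """Return the text of every entity labelled ORG in a spaCy Doc."""
    return [ent.text for ent in doc.ents if ent.label_ == "ORG"]


def add_org_column(df, nlp, text_column="text", out_column="orgs"):
    """Return a copy of `df` with the ORG entities of each row's text.

    `df` is assumed to be a pandas DataFrame and `nlp` a loaded spaCy
    pipeline. The entities of each row are joined into one "; "-separated
    string; missing texts are treated as empty strings.
    """
    texts = df[text_column].fillna("").astype(str).tolist()
    # nlp.pipe processes the texts as a stream, which is usually faster
    # than calling nlp() once per row.
    orgs = ["; ".join(org_names(doc)) for doc in nlp.pipe(texts)]
    return df.assign(**{out_column: orgs})


# Hypothetical usage, assuming pandas, spaCy and the model are installed:
#   import pandas as pd
#   import spacy
#   df = pd.DataFrame({"text": ["But Google is starting from behind."]})
#   print(add_org_column(df, spacy.load("en_core_web_sm")))
```

The function returns a new DataFrame and leaves the input untouched, so the same df can be run through two different models and the resulting columns compared.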