Need to understand the difference between spaCy's en and en_core_web_sm models
I am trying to run NER with spaCy (for organization names). Here is the script I am using:
import spacy

nlp = spacy.load("en_core_web_sm")
# Each continued line of the string needs a trailing backslash
text = "But Google is starting from behind. The company made a late push \
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
Alexa software, which runs on its Echo and Dot devices, have clear \
leads in consumer adoption."
doc = nlp(text)
# Print each entity with its character offsets and label
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
The script above produces no output. But when I use the "en" model:
import spacy

nlp = spacy.load("en")
# Each continued line of the string needs a trailing backslash
text = "But Google is starting from behind. The company made a late push \
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
Alexa software, which runs on its Echo and Dot devices, have clear \
leads in consumer adoption."
doc = nlp(text)
# Print each entity with its character offsets and label
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
it gives me the output I want:
Google 4 10 ORG
Apple’s Siri 92 104 ORG
iPhones 119 126 ORG
Amazon 132 138 ORG
Echo and Dot 182 194 ORG
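What is going on here? Please help.
Can I get the same output from the en_core_web_sm model as from the en model? If so, please suggest how. I need a Python 3 script that takes a pandas df as input. Thanks.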
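Answer 0: (score: 2)
Each model is a machine learning model trained on a specific corpus (a text "dataset"). As a result, different models can tag the same text with different labels, especially since some models are trained on less data than others.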
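One way to see how much two models disagree is to run both on the same text and diff their entities. The following is a minimal sketch, not part of the original answer: the helper names `entity_spans` and `compare_pipelines` are made up for illustration, and it only assumes that each pipeline is a callable returning a `Doc` with `.ents`.

```python
def entity_spans(doc):
    """Return the entities of a spaCy Doc as a set of (text, start, end, label) tuples."""
    return {(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents}


def compare_pipelines(nlp_a, nlp_b, text):
    """Run both pipelines on the same text and split the entities into shared and unique ones."""
    ents_a = entity_spans(nlp_a(text))
    ents_b = entity_spans(nlp_b(text))
    return {
        "both": sorted(ents_a & ents_b),    # found by both pipelines
        "only_a": sorted(ents_a - ents_b),  # found only by the first pipeline
        "only_b": sorted(ents_b - ents_a),  # found only by the second pipeline
    }
```

For example, `compare_pipelines(spacy.load("en_core_web_sm"), spacy.load("en_core_web_md"), text)` would show where the two packages differ, assuming both are installed.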
Currently, spaCy offers 4 English models, as shown at https://spacy.io/models/en/. According to https://github.com/explosion/spacy-models, a model can be downloaded in a few different ways:
# download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_sm
# out-of-the-box: download best-matching default model
python -m spacy download en
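It may be that, when you downloaded the "en" model, the best-matching default model was not "en_core_web_sm".
Also, keep in mind that these models are updated from time to time, so you may end up with two different versions of the same model.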
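To check which package and version a name actually resolved to, you can look at the loaded pipeline's `meta` dictionary, which carries `lang`, `name` and `version` keys. This is a rough sketch; the helper names `describe_pipeline` and `describe_loaded` are hypothetical and not part of spaCy.

```python
def describe_pipeline(meta):
    """Build a 'lang_name version' label from a pipeline's meta dictionary."""
    full_name = f"{meta['lang']}_{meta['name']}"
    return f"{full_name} {meta['version']}"


def describe_loaded(model_name):
    """Load a pipeline by name and report which package and version it resolved to."""
    import spacy  # imported lazily so describe_pipeline works without spaCy installed

    nlp = spacy.load(model_name)
    return describe_pipeline(nlp.meta)
```

Calling `describe_loaded("en_core_web_sm")` and, where the shortcut exists, `describe_loaded("en")` should make it clear whether both names point at the same package and version. Running `python -m spacy validate` on the command line gives a similar overview of the installed models.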
Answer 1: (score: 0)
Code:
import spacy

nlp = spacy.load("en_core_web_sm")
text = """But Google is starting from behind. The company made a late push
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s
Alexa software, which runs on its Echo and Dot devices, have clear
leads in consumer adoption."""
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

# Same script and same text, loaded through the "en" shortcut
nlp = spacy.load("en")
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
Answer 2: (score: 0)
Loading spacy.load('en_core_web_sm')
instead of spacy.load('en')
should help.
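Since the question asks for a script that takes a pandas df, here is a minimal sketch using the full package name (in spaCy v3 and later the `en` shortcut is no longer supported, so `en_core_web_sm` is the name to load). The function names, the `text` column name and the output column names are assumptions made for illustration, and the entities you get still depend on the model version installed.

```python
def extract_orgs(nlp, texts, label="ORG"):
    """Return one dict per entity with the given label; 'row' is the text's position in the input."""
    rows = []
    for position, text in enumerate(texts):
        doc = nlp(text)
        for ent in doc.ents:
            if ent.label_ == label:
                rows.append({
                    "row": position,
                    "text": ent.text,
                    "start": ent.start_char,
                    "end": ent.end_char,
                    "label": ent.label_,
                })
    return rows


def orgs_from_dataframe(df, text_column="text", model="en_core_web_sm"):
    """Run NER over one column of a DataFrame and return the ORG entities as a new DataFrame."""
    import pandas as pd  # imported lazily; both packages are needed only for this wrapper
    import spacy

    nlp = spacy.load(model)
    rows = extract_orgs(nlp, df[text_column].astype(str))
    return pd.DataFrame(rows, columns=["row", "text", "start", "end", "label"])
```

With a frame such as `pd.DataFrame({"text": [text]})`, calling `orgs_from_dataframe(df)` returns one row per ORG entity, with `row` giving the position of the source text in the column.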