spaCy: Why are vectors unavailable when using pipe?

Time: 2019-01-11 11:09:30

Tags: spacy

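This is a very specific question about a very specific case. I hope Stack Overflow is still the right place to ask it.

The results of nlp.pipe(docs) and nlp(doc) seem to differ. When I use the former instead of the latter, in some cases I get no vectors. Here is what I have noticed:

  • With the default model (en_core_web_sm) and the tagger and parser disabled, I get word vectors from nlp(doc) but not from nlp.pipe(docs)
  • A larger model (i.e. en_core_web_lg) produces vectors in both cases
  • Re-enabling the tagger also "fixes" the default model
  • Apart from the vectors, disabling the tagger and parser seems to work fine in both cases

Can someone explain this to me? In particular, why does the tagger seem to have an effect here?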

Code:

import spacy


documents = [
    "spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python.",
    "If you're working with a lot of text, you'll eventually want to know more about it. For example, what's it about? What do the words mean in context? Who is doing what to whom? What companies and products are mentioned? Which texts are similar to each other?",
    "spaCy is designed specifically for production use and helps you build applications that process and \"understand\" large volumes of text. It can be used to build information extraction or natural language understanding systems, or to pre-process text for deep learning."
]

nlp = spacy.load('en', disable=['tagger', 'parser'])
print('classic model, disabled tagger and parser, without pipe')
for i, doc in enumerate(documents):
    spacy_doc = nlp(doc)
    print(f'document {i} has_vector: {spacy_doc.has_vector}, is_parsed: {spacy_doc.is_parsed}, token_0_pos: {spacy_doc[0].pos_}')

print('classic model, disabled tagger and parser, with pipe')
for i, spacy_doc in enumerate(nlp.pipe(documents)):
    print(f'document {i} has_vector: {spacy_doc.has_vector}, is_parsed: {spacy_doc.is_parsed}, token_0_pos: {spacy_doc[0].pos_}')

nlp = spacy.load('en', disable=['parser'])
print('classic model, disabled only parser, without pipe')
for i, doc in enumerate(documents):
    spacy_doc = nlp(doc)
    print(f'document {i} has_vector: {spacy_doc.has_vector}, is_parsed: {spacy_doc.is_parsed}, token_0_pos: {spacy_doc[0].pos_}')

print('classic model, disabled only parser, with pipe')
for i, spacy_doc in enumerate(nlp.pipe(documents)):
    print(f'document {i} has_vector: {spacy_doc.has_vector}, is_parsed: {spacy_doc.is_parsed}, token_0_pos: {spacy_doc[0].pos_}')

nlp = spacy.load('en', disable=['tagger'])
print('classic model, disabled only tagger, without pipe')
for i, doc in enumerate(documents):
    spacy_doc = nlp(doc)
    print(f'document {i} has_vector: {spacy_doc.has_vector}, is_parsed: {spacy_doc.is_parsed}, token_0_pos: {spacy_doc[0].pos_}')

print('classic model, disabled only tagger, with pipe')
for i, spacy_doc in enumerate(nlp.pipe(documents)):
    print(f'document {i} has_vector: {spacy_doc.has_vector}, is_parsed: {spacy_doc.is_parsed}, token_0_pos: {spacy_doc[0].pos_}')


nlp = spacy.load('en_core_web_lg', disable=['tagger', 'parser'])
print('model en_core_web_lg, disabled tagger and parser, with pipe')
for i, spacy_doc in enumerate(nlp.pipe(documents)):
    print(f'document {i} has_vector: {spacy_doc.has_vector}, is_parsed: {spacy_doc.is_parsed}, token_0_pos: {spacy_doc[0].pos_}')

Output:

classic model, disabled tagger and parser, without pipe
document 0 has_vector: True, is_parsed: False, token_0_pos: 
document 1 has_vector: True, is_parsed: False, token_0_pos: 
document 2 has_vector: True, is_parsed: False, token_0_pos: 
classic model, disabled tagger and parser, with pipe
document 0 has_vector: False, is_parsed: False, token_0_pos: 
document 1 has_vector: False, is_parsed: False, token_0_pos: 
document 2 has_vector: False, is_parsed: False, token_0_pos: 
classic model, disabled only parser, without pipe
document 0 has_vector: True, is_parsed: False, token_0_pos: ADJ
document 1 has_vector: True, is_parsed: False, token_0_pos: ADP
document 2 has_vector: True, is_parsed: False, token_0_pos: ADJ
classic model, disabled only parser, with pipe
document 0 has_vector: True, is_parsed: False, token_0_pos: ADJ
document 1 has_vector: True, is_parsed: False, token_0_pos: ADP
document 2 has_vector: True, is_parsed: False, token_0_pos: ADJ
classic model, disabled only tagger, without pipe
document 0 has_vector: True, is_parsed: True, token_0_pos: 
document 1 has_vector: True, is_parsed: True, token_0_pos: 
document 2 has_vector: True, is_parsed: True, token_0_pos: 
classic model, disabled only tagger, with pipe
document 0 has_vector: False, is_parsed: True, token_0_pos: 
document 1 has_vector: False, is_parsed: True, token_0_pos: 
document 2 has_vector: False, is_parsed: True, token_0_pos: 
model en_core_web_lg, disabled tagger and parser, with pipe
document 0 has_vector: True, is_parsed: False, token_0_pos: 
document 1 has_vector: True, is_parsed: False, token_0_pos: 
document 2 has_vector: True, is_parsed: False, token_0_pos:
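
The comparison in the output above can also be condensed into a side-by-side check. This is only a minimal sketch: compare_has_vector and mismatches are hypothetical helper names, not part of spaCy, and the only assumption is that the object passed in can be called on a single text and has a pipe method, as the nlp object in the question does.

```python
from typing import Any, Iterable, List, Tuple


def compare_has_vector(nlp: Any, texts: Iterable[str]) -> List[Tuple[int, bool, bool]]:
    """Return (index, has_vector via nlp(text), has_vector via nlp.pipe) per text.

    Hypothetical helper: `nlp` is assumed to behave like a loaded spaCy
    pipeline, i.e. it is callable on one text and offers `.pipe(texts)`.
    """
    texts = list(texts)
    # One call per text, as in the "without pipe" loops of the question.
    single = [nlp(text).has_vector for text in texts]
    # One batched call, as in the "with pipe" loops of the question.
    piped = [doc.has_vector for doc in nlp.pipe(texts)]
    return [(i, s, p) for i, (s, p) in enumerate(zip(single, piped))]


def mismatches(rows: List[Tuple[int, bool, bool]]) -> List[int]:
    """Return the indices of texts where the two code paths disagree."""
    return [i for i, s, p in rows if s != p]
```

Called as mismatches(compare_has_vector(nlp, documents)) with the pipelines from the question, this should list every document for the configurations where the tagger is disabled on the default model, and nothing for the others, going by the output shown above.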

0 answers:

No answers