Question

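I'm really new to programming and Python, and I've been trying to use spaCy in Python 3.x. However, when I apply .pos_ to a text to find the parts of speech, I get no result for the part of speech. I made sure spaCy is installed correctly and looked through other Stack Overflow posts and this one GitHub post, but it didn't help.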

Here is the code I used:

from spacy.lang.en import English
parser = English()

tokens = parser('She ran')
dir(tokens[0])
print(dir(tokens[0]))


def show_POS(text):
    tokens = parser(text)
    for token in tokens:
        print(token.text, token.pos_)


show_POS("She hit the wall.")


def show_dep(text):
    tokens = parser(text)
    for token in tokens:
        print(" {} : {} : {} :{}".format(token.orth_,token.pos_,token.dep_,token.head))


print("token : POS : dep. : head")
print("-------------------------")
show_dep("She hit the wall.")

ex1 = parser("he drinks a water")
for word in ex1:
    print(word.text, word.pos_)

Here is the output:

/Users/dalals4/PycharmProjects/NLP-LEARNING/venv/bin/python 
/Users/dalals4/PycharmProjects/NLP_learning_practice_chp5.py
['_', '__bytes__', '__class__', '__delattr__', '__dir__', '__doc__', 
'__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', 
'__hash__', '__init__', '__init_subclass__', '__le__', '__len__', 
'__lt__', '__ne__', '__new__', '__pyx_vtable__', '__reduce__', 
'__reduce_ex__', '__repr__', '__setattr__', '__setstate__', 
'__sizeof__', '__str__', '__subclasshook__', '__unicode__', 
'ancestors', 'check_flag', 'children', 'cluster', 'conjuncts', 'dep', 
'dep_', 'doc', 'ent_id', 'ent_id_', 'ent_iob', 'ent_iob_', 'ent_type', 
'ent_type_', 'get_extension', 'has_extension', 'has_vector', 'head', 
'i', 'idx', 'is_alpha', 'is_ancestor', 'is_ascii', 'is_bracket', 
'is_currency', 'is_digit', 'is_left_punct', 'is_lower', 'is_oov', 
'is_punct', 'is_quote', 'is_right_punct', 'is_sent_start', 'is_space', 
'is_stop', 'is_title', 'is_upper', 'lang', 'lang_', 'left_edge', 
'lefts', 'lemma', 'lemma_', 'lex_id', 'like_email', 'like_num', 
'like_url', 'lower', 'lower_', 'n_lefts', 'n_rights', 'nbor', 'norm', 
'norm_', 'orth', 'orth_', 'pos', 'pos_', 'prefix', 'prefix_', 'prob', 
'rank', 'right_edge', 'rights', 'sent_start', 'sentiment', 
'set_extension', 'shape', 'shape_', 'similarity', 'string', 'subtree', 
'suffix', 'suffix_', 'tag', 'tag_', 'text', 'text_with_ws', 'vector', 
'vector_norm', 'vocab', 'whitespace_']
She 
hit 
the 
wall 
. 
token : POS : dep. : head
-------------------------
 She :  :  : She
 hit :  :  : hit
 the :  :  : the
 wall :  :  : wall
 . :  :  : .
he 
drinks 
a 
water 

Process finished with exit code 0

Any help would be greatly appreciated! Thanks very much in advance :)

Answer 1

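The problem here is that you're only importing the English language class, which includes language-specific data such as tokenization rules. But you're not actually loading a model, which is what lets spaCy predict part-of-speech tags and other linguistic annotations.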
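To see this for yourself, here is a minimal sketch that inspects a tokenizer-only pipeline. It assumes that spacy.blank("en") behaves like the English() instance in the question (the variable names are only illustrative), and it needs no model package:

```python
import spacy

# Assumption: spacy.blank("en") builds the same kind of bare pipeline
# as English() in the question: a tokenizer, but no tagger or parser.
blank_nlp = spacy.blank("en")

# No pipeline components are registered, so nothing can predict tags.
print(blank_nlp.pipe_names)  # []

doc = blank_nlp("She ran")
pos_tags = [token.pos_ for token in doc]

# The text is tokenized, but every pos_ is an empty string.
print([token.text for token in doc])
print(pos_tags)  # ['', '']
```

The empty strings match the blank columns in the output from the question.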

If you haven't done so already, you first need to install a model package, for example the small English model:

python -m spacy download en_core_web_sm

You can then tell spaCy to load it by calling spacy.load:

import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp(u"she ran")
for token in doc:
    print(token.text, token.pos_)

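This gives you an instance of the English class with the model weights loaded, so spaCy can predict part-of-speech tags, dependency labels and named entities.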
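As a rough sketch of how the show_dep function from the question could look once a model is loaded: the helper names load_pipeline and describe are hypothetical, and the code assumes en_core_web_sm was downloaded with the command above. If the package is missing, spacy.load raises an OSError, which the sketch catches and reports instead of crashing:

```python
import spacy

MODEL_NAME = "en_core_web_sm"  # assumes this package has been downloaded


def load_pipeline(name=MODEL_NAME):
    """Return a loaded pipeline, or None if the model package is missing."""
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when it cannot find the model package.
        print("Model {0!r} not found; try: python -m spacy download {0}".format(name))
        return None


def describe(nlp, text):
    """Return (text, POS, dependency label, head text) for each token."""
    return [(t.text, t.pos_, t.dep_, t.head.text) for t in nlp(text)]


nlp = load_pipeline()
if nlp is not None:
    print("token : POS : dep. : head")
    for row in describe(nlp, "She hit the wall."):
        print("{} : {} : {} : {}".format(*row))
```

With the model loaded, the POS and dependency columns should be filled in, and head should point at each token's syntactic parent rather than at the token itself.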

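If you're new to spaCy, I'd recommend checking out the spaCy 101 guide in the documentation. It explains the most important concepts and includes many examples you can run.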

.pos_ in spaCy returns no results in Python
