Question

I am trying to use the sent_start attribute of spaCy's Token class. Here is my code:

In [1]: string_doc = u"Je voudrais maitriser l'outil Spacy. C'est util pour le traitement automatique de textes."
   ...: 

In [2]: import spacy

In [3]: nlp = spacy.load('fr', disable='parser')

In [4]: doc = nlp(string_doc)

In [5]: len(doc)
Out[5]: 17

In [6]: dir(doc[0])
Out[6]: 
['_',
 '__bytes__',
 '__class__',
 '__delattr__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__pyx_vtable__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__unicode__',
 'ancestors',
 'check_flag',
 'children',
 'cluster',
 'conjuncts',
 'dep',
 'dep_',
 'doc',
 'ent_id',
 'ent_id_',
 'ent_iob',
 'ent_iob_',
 'ent_type',
 'ent_type_',
 'get_extension',
 'has_extension',
 'has_vector',
 'head',
 'i',
 'idx',
 'is_alpha',
 'is_ancestor',
 'is_ascii',
 'is_bracket',
 'is_digit',
 'is_left_punct',
 'is_lower',
 'is_oov',
 'is_punct',
 'is_quote',
 'is_right_punct',
 'is_sent_start',
 'is_space',
 'is_stop',
 'is_title',
 'is_upper',
 'lang',
 'lang_',
 'left_edge',
 'lefts',
 'lemma',
 'lemma_',
 'lex_id',
 'like_email',
 'like_num',
 'like_url',
 'lower',
 'lower_',
 'n_lefts',
 'n_rights',
 'nbor',
 'norm',
 'norm_',
 'orth',
 'orth_',
 'pos',
 'pos_',
 'prefix',
 'prefix_',
 'prob',
 'rank',
 'right_edge',
 'rights',
 'sent_start',
 'sentiment',
 'set_extension',
 'shape',
 'shape_',
 'similarity',
 'string',
 'subtree',
 'suffix',
 'suffix_',
 'tag',
 'tag_',
 'text',
 'text_with_ws',
 'vector',
 'vector_norm',
 'vocab',
 'whitespace_']


In [7]: [tok.sent_start for tok in doc]
Segmentation fault (core dumped)

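The last command, [7], runs for a very long time, and ipython eventually exits with a Segmentation fault message. Is this a spaCy bug, or a problem with my operating system? I am using Fedora 26 and Python 2.7 installed with Anaconda.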

Answer 1

spaCy recommends using is_sent_start instead of sent_start (cf. github.com/explosion/spaCy/blob/master/spacy/tokens/token.pyx, lines 351 to 356).

is_sent_start works as expected!
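
As a rough illustration of the suggestion above, here is a minimal sketch that reads is_sent_start rather than sent_start. The helper names sentence_starts and demo are hypothetical and not part of spaCy. The demo assumes spaCy v3 or later, where nlp.add_pipe accepts a component name, and it uses a blank French pipeline with the rule-based sentencizer so that sentence boundaries are set without the parser. Treat that setup as an assumption, not as what the question's environment looked like.

```python
def sentence_starts(doc):
    """Return the text of every token flagged as a sentence start.

    is_sent_start may be True, False, or None (boundary unknown),
    so a plain truthiness check skips both False and None.
    """
    return [tok.text for tok in doc if tok.is_sent_start]


def demo():
    # Assumption: spaCy v3+ is installed. spacy.blank needs no model download.
    import spacy

    nlp = spacy.blank("fr")
    # The sentencizer sets sentence boundaries from punctuation alone,
    # so no dependency parser is required.
    nlp.add_pipe("sentencizer")
    doc = nlp(u"Je voudrais maitriser l'outil Spacy. "
              u"C'est util pour le traitement automatique de textes.")
    return sentence_starts(doc)
```

Calling demo() should return the first token of each sentence in the example text. On older spaCy releases the way a component is added to the pipeline differs, so the add_pipe line may need adjusting.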

spacy.tokens.token.Token.sent_start is not responding
