AttributeError: 'NoneType' object has no attribute 'lower' when using spaCy

Time: 2018-09-21 15:35:53

Tags: python spacy

This is the full error message:

AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>()
     24
     25 # train
---> 26 pipe.fit(train1, labelsTrain1)
     27
     28 # test

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\pipeline.pyc in fit(self, X, y, **fit_params)
    246             This estimator
    247         """
--> 248         Xt, fit_params = self._fit(X, y, **fit_params)
    249         if self._final_estimator is not None:
    250             self._final_estimator.fit(Xt, y, **fit_params)

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\pipeline.pyc in _fit(self, X, y, **fit_params)
    211                 Xt, fitted_transformer = fit_transform_one_cached(
    212                     cloned_transformer, None, Xt, y,
--> 213                     **fit_params_steps[name])
    214                 # Replace the transformer of the step with the fitted
    215                 # transformer. This is necessary when loading the transformer

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\externals\joblib\memory.pyc in __call__(self, *args, **kwargs)
    360
    361     def __call__(self, *args, **kwargs):
--> 362         return self.func(*args, **kwargs)
    363
    364     def call_and_shelve(self, *args, **kwargs):

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\pipeline.pyc in _fit_transform_one(transformer, weight, X, y, **fit_params)
    579                        **fit_params):
    580     if hasattr(transformer, 'fit_transform'):
--> 581         res = transformer.fit_transform(X, y, **fit_params)
    582     else:
    583         res = transformer.fit(X, y, **fit_params).transform(X)

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\feature_extraction\text.pyc in fit_transform(self, raw_documents, y)
    867
    868         vocabulary, X = self._count_vocab(raw_documents,
--> 869                                           self.fixed_vocabulary_)
    870
    871         if self.binary:

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\feature_extraction\text.pyc in _count_vocab(self, raw_documents, fixed_vocab)
    790         for doc in raw_documents:
    791             feature_counter = {}
--> 792             for feature in analyze(doc):
    793                 try:
    794                     feature_idx = vocabulary[feature]

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\feature_extraction\text.pyc in <lambda>(doc)
    264
    265             return lambda doc: self._word_ngrams(
--> 266                 tokenize(preprocess(self.decode(doc))), stop_words)
    267
    268         else:

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\feature_extraction\text.pyc in <lambda>(x)
    230
    231         if self.lowercase:
--> 232             return lambda x: strip_accents(x.lower())
    233         else:
    234             return strip_accents

AttributeError: 'NoneType' object has no attribute 'lower'

The code is as follows:

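from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# CleanTextTransformer, tokenizeText, train and test are defined elsewhere and not shown here.

def printNMostInformative(vectorizer, clf, N):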
    feature_names = vectorizer.get_feature_names()
    coefs_with_fns = sorted(zip(clf.coef_[0], feature_names))
    topClass1 = coefs_with_fns[:N]
    topClass2 = coefs_with_fns[:-(N + 1):-1]
    print("Class 1 best: ")
    for feat in topClass1:
        print(feat)
    print("Class 2 best: ")
    for feat in topClass2:
        print(feat)

vectorizer = CountVectorizer(tokenizer=tokenizeText, ngram_range=(1,1))
clf = LinearSVC()

pipe = Pipeline([('cleanText', CleanTextTransformer()), ('vectorizer', vectorizer), ('clf', clf)])

# data
train1 = train['Title'].tolist()
labelsTrain1 = train['Conference'].tolist()

test1 = test['Title'].tolist()
labelsTest1 = test['Conference'].tolist()

# train
pipe.fit(train1, labelsTrain1)

# test
preds = pipe.predict(test1)
print("accuracy:", accuracy_score(labelsTest1, preds))
print("Top 10 features used to predict: ")

printNMostInformative(vectorizer, clf, 10)
pipe = Pipeline([('cleanText', CleanTextTransformer()), ('vectorizer', vectorizer)])
transform = pipe.fit_transform(train1, labelsTrain1)

vocab = vectorizer.get_feature_names()
for i in range(len(train1)):
    s = ""
    indexIntoVocab = transform.indices[transform.indptr[i]:transform.indptr[i+1]]
    numOccurences = transform.data[transform.indptr[i]:transform.indptr[i+1]]
    for idx, num in zip(indexIntoVocab, numOccurences):
        s += str((vocab[idx], num))

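It seems to be related to the train1 data. I'm not sure how to fix this.

This happens after cleaning the data, when I try to use the function above to print the most informative features, i.e. the ones with the highest coefficients.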

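1 answer:

Answer 0: (score: 1)

For those looking for more information: this is based on the tutorial at https://towardsdatascience.com/machine-learning-for-text-classification-using-spacy-in-python-b276b4051a49. I ran into the same error.

It comes from the cleanText() function, which does not return anything for the pipeline to use, hence the NoneType in the traceback: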

def cleanText(text):
    text = text.strip().replace("\n", " ").replace("\r", " ")
    text = text.lower()
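
To see why the missing return leads to this exact message, here is a minimal sketch that needs no scikit-learn. The name clean_text_broken and the sample title are made up for illustration; the function body mirrors the cleanText above. A Python function that finishes without a return statement returns None, and the vectorizer's default preprocessor then calls .lower() on that None (line 232 in the traceback).

```python
def clean_text_broken(text):
    # Same body as the buggy cleanText: the cleaned string is computed
    # but never returned, so the call evaluates to None.
    text = text.strip().replace("\n", " ").replace("\r", " ")
    text = text.lower()


# Hypothetical sample title, standing in for one entry of train1.
result = clean_text_broken("  Some Title\n")
print(result)

try:
    # This is effectively what the vectorizer's preprocessor does
    # with each document it receives.
    result.lower()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

So the data in train1 can be perfectly fine; the None values are created by the cleaning step before the vectorizer ever sees the text.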

If you add return text, it should fix your error:

def cleanText(text):
    text = text.strip().replace("\n", " ").replace("\r", " ")
    text = text.lower()
    return text
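
A quick way to catch this kind of bug before calling pipe.fit is to run the cleaning step on its own and check that every result is a string. The sketch below is only an illustration: find_non_strings and the sample titles are hypothetical, and it is written for Python 3 (on Python 2.7, as in the question, the check would have to be against basestring instead of str).

```python
def clean_text(text):
    # Fixed version from this answer: the cleaned string is returned.
    text = text.strip().replace("\n", " ").replace("\r", " ")
    return text.lower()


def find_non_strings(documents):
    """Return (index, value) pairs for every entry that is not a string."""
    return [(i, doc) for i, doc in enumerate(documents)
            if not isinstance(doc, str)]


# Hypothetical sample titles, standing in for train1.
titles = ["  Deep Learning for NLP\n", "Graph Mining\r"]
cleaned = [clean_text(t) for t in titles]

# An empty list means every cleaned document is a string and is
# safe to hand to the vectorizer.
bad = find_non_strings(cleaned)
print(cleaned)
print(bad)
```

If the list is not empty, the reported indices point at the entries that would make the vectorizer fail, whether they come from the cleaning function or from missing values in the original data.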