Here is the full error message:

AttributeError                            Traceback (most recent call last)
in <module>()
     24 
     25 # train
---> 26 pipe.fit(train1, labelsTrain1)
     27 
     28 # test

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\pipeline.pyc in fit(self, X, y, **fit_params)
    246             This estimator
    247         """
--> 248         Xt, fit_params = self._fit(X, y, **fit_params)
    249         if self._final_estimator is not None:
    250             self._final_estimator.fit(Xt, y, **fit_params)

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\pipeline.pyc in _fit(self, X, y, **fit_params)
    211                 Xt, fitted_transformer = fit_transform_one_cached(
    212                     cloned_transformer, None, Xt, y,
--> 213                     **fit_params_steps[name])
    214                 # Replace the transformer of the step with the fitted
    215                 # transformer. This is necessary when loading the transformer

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\externals\joblib\memory.pyc in __call__(self, *args, **kwargs)
    360 
    361     def __call__(self, *args, **kwargs):
--> 362         return self.func(*args, **kwargs)
    363 
    364     def call_and_shelve(self, *args, **kwargs):

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\pipeline.pyc in _fit_transform_one(transformer, weight, X, y, **fit_params)
    579                        **fit_params):
    580     if hasattr(transformer, 'fit_transform'):
--> 581         res = transformer.fit_transform(X, y, **fit_params)
    582     else:
    583         res = transformer.fit(X, y, **fit_params).transform(X)

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\feature_extraction\text.pyc in fit_transform(self, raw_documents, y)
    867 
    868         vocabulary, X = self._count_vocab(raw_documents,
--> 869                                           self.fixed_vocabulary_)
    870 
    871         if self.binary:

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\feature_extraction\text.pyc in _count_vocab(self, raw_documents, fixed_vocab)
    790         for doc in raw_documents:
    791             feature_counter = {}
--> 792             for feature in analyze(doc):
    793                 try:
    794                     feature_idx = vocabulary[feature]

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\feature_extraction\text.pyc in <lambda>(doc)
    264 
    265             return lambda doc: self._word_ngrams(
--> 266                 tokenize(preprocess(self.decode(doc))), stop_words)
    267 
    268         else:

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\feature_extraction\text.pyc in <lambda>(x)
    230 
    231         if self.lowercase:
--> 232             return lambda x: strip_accents(x.lower())
    233         else:
    234             return strip_accents

AttributeError: 'NoneType' object has no attribute 'lower'
The code is as follows:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# CleanTextTransformer, tokenizeText, train and test are defined elsewhere (not shown).

def printNMostInformative(vectorizer, clf, N):
    # Pair each coefficient with its feature name and sort by coefficient.
    feature_names = vectorizer.get_feature_names()
    coefs_with_fns = sorted(zip(clf.coef_[0], feature_names))
    topClass1 = coefs_with_fns[:N]
    topClass2 = coefs_with_fns[:-(N + 1):-1]
    print("Class 1 best: ")
    for feat in topClass1:
        print(feat)
    print("Class 2 best: ")
    for feat in topClass2:
        print(feat)

vectorizer = CountVectorizer(tokenizer=tokenizeText, ngram_range=(1,1))
clf = LinearSVC()
pipe = Pipeline([('cleanText', CleanTextTransformer()), ('vectorizer', vectorizer), ('clf', clf)])

# data
train1 = train['Title'].tolist()
labelsTrain1 = train['Conference'].tolist()
test1 = test['Title'].tolist()
labelsTest1 = test['Conference'].tolist()

# train
pipe.fit(train1, labelsTrain1)

# test
preds = pipe.predict(test1)
print("accuracy:", accuracy_score(labelsTest1, preds))
print("Top 10 features used to predict: ")
printNMostInformative(vectorizer, clf, 10)

pipe = Pipeline([('cleanText', CleanTextTransformer()), ('vectorizer', vectorizer)])
transform = pipe.fit_transform(train1, labelsTrain1)
vocab = vectorizer.get_feature_names()

# Walk the sparse (CSR) matrix row by row and collect (term, count) pairs.
for i in range(len(train1)):
    s = ""
    indexIntoVocab = transform.indices[transform.indptr[i]:transform.indptr[i+1]]
    numOccurences = transform.data[transform.indptr[i]:transform.indptr[i+1]]
    for idx, num in zip(indexIntoVocab, numOccurences):
        s += str((vocab[idx], num))
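The error seems to be related to the train1 data, but I am not sure how to fix it.
It occurs after cleaning the data, when I try to use the function above to print the most important features, i.e. those with the highest coefficients.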
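One hedged way to narrow this down is to scan the list that reaches the vectorizer for entries that are not strings, since `.lower()` only exists on strings. The sketch below is plain Python; `find_non_strings` and the sample titles are hypothetical names made up for illustration and are not part of scikit-learn or the tutorial.

```python
def find_non_strings(docs):
    """Return (index, value) pairs for entries that are not text strings.

    Assumes Python 3; on Python 2 the check would be against basestring.
    """
    return [(i, d) for i, d in enumerate(docs) if not isinstance(d, str)]


# Hypothetical sample standing in for train1, or for the output of a cleaning step.
sample_titles = ["Deep learning for text", None, "Graph mining", float("nan")]

bad_entries = find_non_strings(sample_titles)
for index, value in bad_entries:
    print(index, repr(value))
```

Running the same check on the output of each pipeline step, not only on the raw list, can show whether the data itself or one of the transformers introduces the `None` values.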
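Answer 0 (score: 1)
For those looking for more information: this is based on the tutorial at https://towardsdatascience.com/machine-learning-for-text-classification-using-spacy-in-python-b276b4051a49. I ran into the same error.
It is caused by the cleanText() function, which does not return anything for the pipeline to use, hence the NoneType object in the traceback: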
def cleanText(text):
    text = text.strip().replace("\n", " ").replace("\r", " ")
    text = text.lower()
If you add return text at the end of the function, it should resolve your error:
def cleanText(text):
    text = text.strip().replace("\n", " ").replace("\r", " ")
    text = text.lower()
    return text
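To see why the missing return produces exactly this message, here is a minimal sketch in plain Python, without scikit-learn. The names `clean_text_missing_return` and `clean_text` and the sample titles are hypothetical stand-ins for the tutorial's cleanText() and the training data; the assumption is that each cleaned title is later passed to something that calls `.lower()` on it, as the vectorizer does in the traceback.

```python
def clean_text_missing_return(text):
    # Same body as the broken version: the cleaned string is never returned,
    # so Python implicitly returns None.
    text = text.strip().replace("\n", " ").replace("\r", " ")
    text = text.lower()


def clean_text(text):
    # Fixed version: the cleaned string is handed back to the caller.
    text = text.strip().replace("\n", " ").replace("\r", " ")
    text = text.lower()
    return text


# Hypothetical titles used only for illustration.
titles = ["  Deep Learning\nfor Text  ", "Graph Mining\r"]

broken = [clean_text_missing_return(t) for t in titles]
fixed = [clean_text(t) for t in titles]

print(broken)  # [None, None]
print(fixed)   # ['deep learning for text', 'graph mining']

# Calling .lower() on one of the None values reproduces the error message.
try:
    broken[0].lower()
except AttributeError as err:
    error_message = str(err)
    print(error_message)
```

The same reasoning applies to any custom step in a pipeline: whatever a transformer's transform method returns is what the next step receives, so a helper that forgets its return statement silently turns every document into None.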