Question

Here is the full error message:

AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>()
     24
     25 # train
---> 26 pipe.fit(train1, labelsTrain1)
     27
     28 # test

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\pipeline.pyc in fit(self, X, y, **fit_params)
    246             This estimator
    247         """
--> 248         Xt, fit_params = self._fit(X, y, **fit_params)
    249         if self._final_estimator is not None:
    250             self._final_estimator.fit(Xt, y, **fit_params)

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\pipeline.pyc in _fit(self, X, y, **fit_params)
    211                 Xt, fitted_transformer = fit_transform_one_cached(
    212                     cloned_transformer, None, Xt, y,
--> 213                     **fit_params_steps[name])
    214                 # Replace the transformer of the step with the fitted
    215                 # transformer. This is necessary when loading the transformer

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\externals\joblib\memory.pyc in __call__(self, *args, **kwargs)
    360
    361     def __call__(self, *args, **kwargs):
--> 362         return self.func(*args, **kwargs)
    363
    364     def call_and_shelve(self, *args, **kwargs):

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\pipeline.pyc in _fit_transform_one(transformer, weight, X, y, **fit_params)
    579                        **fit_params):
    580     if hasattr(transformer, 'fit_transform'):
--> 581         res = transformer.fit_transform(X, y, **fit_params)
    582     else:
    583         res = transformer.fit(X, y, **fit_params).transform(X)

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\feature_extraction\text.pyc in fit_transform(self, raw_documents, y)
    867
    868         vocabulary, X = self._count_vocab(raw_documents,
--> 869                                           self.fixed_vocabulary_)
    870
    871         if self.binary:

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\feature_extraction\text.pyc in _count_vocab(self, raw_documents, fixed_vocab)
    790         for doc in raw_documents:
    791             feature_counter = {}
--> 792             for feature in analyze(doc):
    793                 try:
    794                     feature_idx = vocabulary[feature]

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\feature_extraction\text.pyc in <lambda>(doc)
    264
    265             return lambda doc: self._word_ngrams(
--> 266                 tokenize(preprocess(self.decode(doc))), stop_words)
    267
    268         else:

C:\Users\mcichonski\AppData\Local\Continuum\anaconda3\envs\py27\lib\site-packages\sklearn\feature_extraction\text.pyc in <lambda>(x)
    230
    231         if self.lowercase:
--> 232             return lambda x: strip_accents(x.lower())
    233         else:
    234             return strip_accents

AttributeError: 'NoneType' object has no attribute 'lower'

Here is the code:

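from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# tokenizeText, CleanTextTransformer, train and test are defined elsewhere.

def printNMostInformative(vectorizer, clf, N):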
    feature_names = vectorizer.get_feature_names()
    coefs_with_fns = sorted(zip(clf.coef_[0], feature_names))
    topClass1 = coefs_with_fns[:N]
    topClass2 = coefs_with_fns[:-(N + 1):-1]
    print("Class 1 best: ")
    for feat in topClass1:
        print(feat)
    print("Class 2 best: ")
    for feat in topClass2:
        print(feat)

vectorizer = CountVectorizer(tokenizer=tokenizeText, ngram_range=(1,1))
clf = LinearSVC()

pipe = Pipeline([('cleanText', CleanTextTransformer()), ('vectorizer', vectorizer), ('clf', clf)])

# data
train1 = train['Title'].tolist()
labelsTrain1 = train['Conference'].tolist()

test1 = test['Title'].tolist()
labelsTest1 = test['Conference'].tolist()

# train
pipe.fit(train1, labelsTrain1)

# test
preds = pipe.predict(test1)
print("accuracy:", accuracy_score(labelsTest1, preds))
print("Top 10 features used to predict: ")

printNMostInformative(vectorizer, clf, 10)
pipe = Pipeline([('cleanText', CleanTextTransformer()), ('vectorizer', vectorizer)])
transform = pipe.fit_transform(train1, labelsTrain1)

vocab = vectorizer.get_feature_names()
for i in range(len(train1)):
    s = ""
    indexIntoVocab = transform.indices[transform.indptr[i]:transform.indptr[i+1]]
    numOccurences = transform.data[transform.indptr[i]:transform.indptr[i+1]]
    for idx, num in zip(indexIntoVocab, numOccurences):
        s += str((vocab[idx], num))

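It seems to be related to the train1 data. I'm not sure how to fix this.

This happens after cleaning the data, when I try to use the printNMostInformative function to print the most informative features, i.e. the ones with the highest coefficients.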

Answer 1

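For anyone looking for more context: this is based on the tutorial at https://towardsdatascience.com/machine-learning-for-text-classification-using-spacy-in-python-b276b4051a49 and I ran into the same error.

The cause is the cleanText() function, which does not return anything for the pipeline to use. Every document therefore reaches the vectorizer as None, which is why the traceback ends with a NoneType object: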

def cleanText(text):
    text = text.strip().replace("\n", " ").replace("\r", " ")
    text = text.lower()
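
A quick way to confirm this diagnosis, as a minimal sketch in plain Python: a function that ends without a return statement hands None back to its caller, and calling .lower() on that None raises the same AttributeError that the vectorizer reports. The name cleanTextNoReturn and the sample string are made up for illustration and are not part of the tutorial.

```python
def cleanTextNoReturn(text):
    # Hypothetical copy of the buggy function: the cleaned string is
    # computed but never returned, so the caller receives None.
    text = text.strip().replace("\n", " ").replace("\r", " ")
    text = text.lower()


result = cleanTextNoReturn("  Deep Learning\nfor Text  ")
print(result is None)

try:
    # This mirrors what CountVectorizer's preprocessor does with each document.
    result.lower()
except AttributeError as err:
    error_message = str(err)
    print(error_message)
```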

If you add return text, it should fix your error:

def cleanText(text):
    text = text.strip().replace("\n", " ").replace("\r", " ")
    text = text.lower()    
    return text
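
To check the fix before refitting the whole pipeline, something along these lines may help. It is only a sketch: findNoneOutputs and sampleTitles are hypothetical names introduced here, and the helper simply lists the positions of documents for which a cleaning function returns None.

```python
def cleanText(text):
    # Fixed version from the answer: the cleaned string is returned.
    text = text.strip().replace("\n", " ").replace("\r", " ")
    text = text.lower()
    return text


def findNoneOutputs(cleaner, documents):
    # Hypothetical helper: indices of the documents for which
    # `cleaner` returns None instead of a string.
    return [i for i, doc in enumerate(documents) if cleaner(doc) is None]


# Made-up sample titles, standing in for a few entries of train1.
sampleTitles = ["  Deep Learning\nfor Text ", "Graph Kernels\n"]

cleaned = [cleanText(title) for title in sampleTitles]
print(cleaned)
print(findNoneOutputs(cleanText, sampleTitles))
```

An empty list from findNoneOutputs means every document comes back as a string, so the vectorizer's call to .lower() has something to work on.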
