
Question

I have posted sample train data, test data, and my code below. I am trying to train a model using the Naive Bayes algorithm.

However, in `reviews` I am getting a list of lists, and I think that is why my code fails with the following error:

return lambda x: strip_accents(x.lower())
AttributeError: 'list' object has no attribute 'lower'
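The traceback line above comes from CountVectorizer's default preprocessor, which calls `.lower()` on every document it receives. A minimal sketch of why that fails here, assuming the comma-separated line format of the sample files: `load_data` appends `line[0].split()`, which is a list of words, not a string. The variable names below are illustrative only.

```python
line = "Colors & clarity is superb,positive"  # first data row of train.txt
fields = line.strip().split(',')

as_tokens = fields[0].split()  # what the question's load_data appends: a list of words
as_text = fields[0]            # what CountVectorizer expects: one string per review

print(as_text.lower())         # strings have a .lower() method

try:
    as_tokens.lower()          # the same call the default preprocessor makes on each document
except AttributeError as err:
    print(err)                 # 'list' object has no attribute 'lower'
```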

As I am new to Python, can any of you help me?

train.txt:

review,label
Colors & clarity is superb,positive
Sadly the picture is not nearly as clear or bright as my 40 inch Samsung,negative

test.txt:

review,label
The picture is clear and beautiful,positive
Picture is not clear,negative

My code:

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import confusion_matrix
from sklearn.feature_extraction.text import CountVectorizer

def load_data(filename):

    reviews = list()
    labels = list()
    with open(filename) as file:
        file.readline()
        for line in file:
            line = line.strip().split(',')
            labels.append(line[1])
            reviews.append(line[0].split())


    return reviews, labels

X_train, y_train = load_data('/Users/7000015504/Desktop/Sep_10/sample_train.csv')
X_test, y_test = load_data('/Users/7000015504/Desktop/Sep_10/sample_test.csv')


clf = CountVectorizer()
X_train_one_hot =  clf.fit(X_train)
X_test_one_hot = clf.transform(X_test)

bnbc = BernoulliNB(binarize=None)
bnbc.fit(X_train_one_hot, y_train)

score = bnbc.score(X_test_one_hot, y_test)
print("score of Naive Bayes algo is :" , score)
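If you would rather keep the word lists that `load_data` builds, one alternative (a sketch, assuming a reasonably recent scikit-learn) is to pass CountVectorizer a callable `analyzer`. scikit-learn applies a callable analyzer to each raw document directly, so the default lowercasing preprocessor is never called. The helper name `lowercase_tokens` is hypothetical. Note that, unlike the default token pattern, this keeps one-character tokens such as "&".

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB


def lowercase_tokens(tokens):
    """Hypothetical analyzer for documents that are already lists of words."""
    return [token.lower() for token in tokens]


# Word lists shaped like the output of the question's load_data
X_train = [review.split() for review in [
    "Colors & clarity is superb",
    "Sadly the picture is not nearly as clear or bright as my 40 inch Samsung",
]]
y_train = ["positive", "negative"]
X_test = [review.split() for review in ["Picture is not clear"]]

vec = CountVectorizer(analyzer=lowercase_tokens)
X_train_counts = vec.fit_transform(X_train)  # learn vocabulary and count in one call
X_test_counts = vec.transform(X_test)        # reuse the training vocabulary

clf = MultinomialNB()
clf.fit(X_train_counts, y_train)
print(clf.predict(X_test_counts))
```

A named function is used instead of a lambda so the fitted vectorizer can still be pickled.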

Answer 1

I made a few modifications to your code. The version posted below works; I added comments explaining how to debug the code you posted above.

# These three are not used, so do not import them
# from sklearn.preprocessing import MultiLabelBinarizer 
# from sklearn.model_selection import train_test_split 
# from sklearn.metrics import confusion_matrix

# This performs the classification task that you want with your input data in the format provided
from sklearn.naive_bayes import MultinomialNB 

from sklearn.feature_extraction.text import CountVectorizer

def load_data(filename):
    """ This function works, but you have to modify the second-to-last line from
    reviews.append(line[0].split()) to reviews.append(line[0]).
    CountVectorizer will perform the splits by itself as it sees fit, trust him :)"""
    reviews = list()
    labels = list()
    with open(filename) as file:
        file.readline()
        for line in file:
            line = line.strip().split(',')
            labels.append(line[1])
            reviews.append(line[0])

    return reviews, labels

X_train, y_train = load_data('train.txt')
X_test, y_test = load_data('test.txt')

vec = CountVectorizer() 
# Notice: clf means classifier, not vectorizer. 
# While it is syntactically correct, it's bad practice to give misleading names to your objects. 
# Replace "clf" with "vec" or something similar.

# Important! you called only the fit method, but did not transform the data 
# afterwards. The fit method does not return the transformed data by itself. You 
# either have to call .fit() and then .transform() on your training data, or just fit_transform() once.

X_train_transformed = vec.fit_transform(X_train)

X_test_transformed = vec.transform(X_test)

clf = MultinomialNB()
clf.fit(X_train_transformed, y_train)

score = clf.score(X_test_transformed, y_test)
print("score of Naive Bayes algo is :" , score)
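To make the fit/transform comment above concrete, here is a small sketch using the two training reviews: `fit()` returns the vectorizer object itself, not a document-term matrix, while `fit()` followed by `transform()` and a single `fit_transform()` call produce the same counts. The variable names are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_reviews = [
    "Colors & clarity is superb",
    "Sadly the picture is not nearly as clear or bright as my 40 inch Samsung",
]

vec = CountVectorizer()
fitted = vec.fit(train_reviews)  # learns the vocabulary and returns the vectorizer
print(fitted is vec)             # fit() does not hand back transformed data

X_a = vec.transform(train_reviews)                     # option 1: fit(), then transform()
X_b = CountVectorizer().fit_transform(train_reviews)   # option 2: fit_transform() once

print(X_a.shape)
print((X_a.toarray() == X_b.toarray()).all())  # both options give identical counts
```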

The output of this code is:

score of Naive Bayes algo is : 0.5
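For scikit-learn classifiers, `score()` reports mean accuracy on the given test data. With only two test reviews, 0.5 means one of the two was labelled correctly. The sketch below reimplements that calculation in plain Python; the predictions are hypothetical, since the post does not say which review was misclassified.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


y_test = ["positive", "negative"]  # labels from test.txt
y_pred = ["positive", "positive"]  # hypothetical predictions, for illustration only
print(accuracy(y_test, y_pred))    # 0.5
```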

Answer 2

You need to iterate over each element in the list.

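for i, item in enumerate(my_list):  # my_list: your list of strings
    my_list[i] = item.lower()       # write the lowercased string back into the list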

Note: this only applies when you are iterating over a list of strings (dtype = str).
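In the question, `reviews` is a list of lists of words, so a single loop still hits a list. A sketch of the two usual options, using sample rows from the question's data: iterate two levels deep to lowercase every word, or join each word list back into one string, which is the form CountVectorizer expects.

```python
reviews = [
    ["Colors", "&", "clarity", "is", "superb"],
    ["Picture", "is", "not", "clear"],
]

# Two levels of iteration: over reviews, then over the words in each review.
lowered_tokens = [[token.lower() for token in review] for review in reviews]

# Or rebuild one string per review for CountVectorizer.
review_strings = [" ".join(review) for review in reviews]

print(lowered_tokens[1])  # ['picture', 'is', 'not', 'clear']
print(review_strings[0])  # Colors & clarity is superb
```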
