AttributeError: 'int' object has no attribute 'lower' in TFIDF and CountVectorizer

Time: 2018-12-31 10:03:35

Tags: python machine-learning scikit-learn tf-idf

I am trying to predict the category of incoming messages, and I am working in Persian. I use Tfidf and Naive-Bayes to classify the input data. Here is my code:

 InvalidArgumentError: You must feed a value for placeholder tensor 'conv2d_1_input' with dtype float and shape [?,48,48,1]
 [[{{node conv2d_1_input}} = Placeholder[dtype=DT_FLOAT, shape=[?,48,48,1], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
 [[{{node dense_2/bias/read/_407}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_335_dense_2/bias/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

But when I run the code above, it throws the following exception, whereas I expect the class "ads" in the output:

 Traceback (most recent call last):
   File ".../multiclass-main.py", line 27
     X_train_counts = cv.fit_transform(X_train)
   File "...\sklearn\feature_extraction\text.py", line 1012, in fit_transform
     self.fixed_vocabulary_)
   File "...sklearn\feature_extraction\text.py", line 922, in _count_vocab
     for feature in analyze(doc):
   File "...sklearn\feature_extraction\text.py", line 308
     tokenize(preprocess(self.decode(doc))), stop_words)
   File "...sklearn\feature_extraction\text.py", line 256
     return lambda x: strip_accents(x.lower())
 AttributeError: 'int' object has no attribute 'lower'

How can I use Tfidf and CountVectorizer in this project?

2 answers:

Answer 0 (score: 3)

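As you can see, the error is AttributeError: 'int' object has no attribute 'lower', which means an integer cannot be lowercased. Somewhere in your code, something is trying to lowercase an integer object.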

Why does this happen?

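The CountVectorizer constructor has a parameter lowercase, which defaults to True (the same applies to TfidfVectorizer, which is built on CountVectorizer). When you call .fit_transform(), it tries to lowercase every document in your input. More specifically, one of the items in your input data is an integer object. For example, your list contains data similar to the following: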

 corpus = ['sentence1', 'sentence 2', 12930, 'sentence 100']

Passing the list above to CountVectorizer raises exactly this exception.
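Before changing the vectorizer, it can help to locate the offending items. The following is a minimal sketch using the example corpus above: it tries the same `x.lower()` call that appears at the bottom of the traceback on each document and records the ones that fail. `find_non_lowercasable` is a hypothetical helper name, not part of scikit-learn.

```python
corpus = ['sentence1', 'sentence 2', 12930, 'sentence 100']


def find_non_lowercasable(docs):
    """Return (index, value) pairs for documents that have no .lower() method."""
    bad = []
    for i, doc in enumerate(docs):
        try:
            doc.lower()  # the same call as in the traceback: x.lower()
        except AttributeError:
            bad.append((i, doc))
    return bad


print(find_non_lowercasable(corpus))  # [(2, 12930)]
```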

How to fix it?

Here are two ways to avoid the problem:

1) Convert every item in the corpus to a string:

 corpus = ['sentence1', 'sentence 2', 12930, 'sentence 100']
 corpus = [str(item) for item in corpus]
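Applied to the setup described in the question (TF-IDF features fed to Naive Bayes), fix 1 might look like the sketch below. It assumes scikit-learn's `TfidfVectorizer`, `MultinomialNB` and `make_pipeline`. The training texts and the `ads`/`work` labels are made-up English placeholders, not the asker's Persian data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data; note the stray integer in X_train.
X_train = ['buy cheap shoes', 'cheap sale today', 'meeting at noon',
           'project meeting notes', 12930]
y_train = ['ads', 'ads', 'work', 'work', 'ads']

# Fix 1: convert every document to str before vectorizing.
X_train = [str(doc) for doc in X_train]

# TfidfVectorizer lowercases by default, so it would fail on the raw integer.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

print(model.predict(['cheap shoes sale']))  # ['ads']
```

If `X_train` is a pandas Series, `X_train.astype(str)` performs the same conversion. Apply the same conversion to any messages passed to `predict`.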

2) Remove the integers from the corpus:

 corpus = ['sentence1', 'sentence 2', 12930, 'sentence 100']
 corpus = [item for item in corpus if not isinstance(item, int)]
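Fix 2 has two caveats, illustrated in the sketch below. It assumes the labels are stored in a separate list, and `keep_text_only` is a hypothetical helper name. First, `isinstance(item, int)` only catches integers. A missing value read by pandas typically arrives as a float NaN, which passes this filter and still breaks the vectorizer, so keeping only `str` items is safer. Second, in a classification task the matching labels must be dropped too. Otherwise the texts and labels fall out of alignment.

```python
import math

X = ['sentence1', 'sentence 2', 12930, 'sentence 100', math.nan]
y = ['a', 'b', 'c', 'd', 'e']


def keep_text_only(texts, labels):
    """Drop every (text, label) pair whose text is not a str."""
    pairs = [(text, label) for text, label in zip(texts, labels)
             if isinstance(text, str)]
    kept_texts = [text for text, _ in pairs]
    kept_labels = [label for _, label in pairs]
    return kept_texts, kept_labels


X_clean, y_clean = keep_text_only(X, y)
print(X_clean)  # ['sentence1', 'sentence 2', 'sentence 100']
print(y_clean)  # ['a', 'b', 'd']
```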

Answer 1 (score: 0)

You can set lowercase=False:

 from sklearn.feature_extraction.text import CountVectorizer
 cv = CountVectorizer(lowercase=False)
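Setting `lowercase=False` only skips the `.lower()` call. The integer then reaches the default regex tokenizer, which raises a `TypeError`, so this setting alone does not make integer documents work. The sketch below assumes a recent scikit-learn. It shows that behaviour and one alternative that does not come from the original answer: passing a custom `preprocessor`, which replaces the lowercasing/accent-stripping stage, and converting each document to `str` there. `fails_with` and `to_lower_text` are hypothetical helper names.

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ['sentence1', 'sentence 2', 12930, 'sentence 100']


def fails_with(vectorizer, docs):
    """Return the name of the exception raised by fit_transform, or None."""
    try:
        vectorizer.fit_transform(docs)
    except Exception as exc:  # broad on purpose: only the type name is reported
        return type(exc).__name__
    return None


def to_lower_text(doc):
    """Hypothetical preprocessor: coerce to str, then lowercase."""
    return str(doc).lower()


print(fails_with(CountVectorizer(), corpus))                   # AttributeError
print(fails_with(CountVectorizer(lowercase=False), corpus))    # TypeError

cv = CountVectorizer(preprocessor=to_lower_text)
cv.fit_transform(corpus)
print(sorted(cv.vocabulary_))  # ['100', '12930', 'sentence', 'sentence1']
```

The single-character token "2" is missing from the vocabulary because the default `token_pattern` only keeps tokens of two or more word characters.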