Text classification with Keras

Time: 2018-03-07 15:23:06

Tags: r keras text-classification

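I am getting started with Keras in R and want to build a text classification model. However, I am running into an error, most likely because of my limited understanding of deep learning and Keras. Any help would be appreciated. The code is shared below. The data in the snippet is deliberately small so that the problem can be reproduced quickly.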

library(keras)
library(tm)

data <- data.frame("Id" = 1:10, "Text" = c("the cat was mewing","the cat was black in color","the dog jumped over the wall","cat cat cat everywhere","dog dog cat play style","cat is white yet it is nice","dog is barking","cat sweet","angry dog","cat is nice nice nice"), "Label" = c(1,1,2,1,2,1,2,1,2,1))
corpus <- VCorpus(VectorSource(data$Text))
tdm <- DocumentTermMatrix(corpus, list(removePunctuation = TRUE, stopwords = TRUE,removeNumbers = TRUE))
data_t <- as.matrix(tdm)
data <- cbind(data_t,data$Label) 
dimnames(data) = NULL
#Normalize data
data[,1:(ncol(data)-1)] = normalize(data[,1:(ncol(data)-1)])
data[,ncol(data)] = as.numeric(data[,ncol(data)]) - 1
set.seed(123)
ind = sample(2,nrow(data),replace = T,prob = c(0.8,0.2))
training = data[ind==1,1:(ncol(data)-1)]
test = data[ind==2,1:(ncol(data)-1)]
traintarget = data[ind==1,ncol(data)]
testtarget = data[ind==2,ncol(data)]
# One hot encoding
trainLabels = to_categorical(traintarget)
testLabels = to_categorical(testtarget)
print(testLabels)
#Create sequential model
model = keras_model_sequential()
model %>% 
  layer_dense(units=8,activation='relu',input_shape=c(16)) 
summary(model)
model %>%
  compile(loss='categorical_crossentropy',optimizer='adam',metrics='accuracy')
history = model %>%
  fit(training,
      trainLabels,
      epochs=200,
      batch_size=2,
      validation_split=0.2)

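One-hot encoding may not be needed in this example, and I may have gone wrong in a few other places as well. However, the last statement of the code (the call to fit) gives me a shape error. Since my data has 16 columns, I used 16 as the input shape.

The error I get is

Error in py_call_impl(callable, dots$args, dots$keywords) :   ValueError: Error when checking target: expected dense_32 to have shape (None, 8) but got array with shape (7, 2)

Any guidance would be very helpful.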

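1 Answer:

Answer 0: (score: 1)

This is because your first layer is also your output layer. The output layer should have as many units as the number of classes you are trying to predict. Here it has 8 neurons, while you only have 2 classes (trainLabels has two columns). In your case, you can edit the model like this: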

model %>% 
  layer_dense(units = 8, activation = 'relu', input_shape = 16) %>%
  layer_dense(units = 2, activation = 'softmax')
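
To see where the (None, 8) versus (7, 2) mismatch comes from, it can help to compute both sizes from the data rather than hard-coding them. The following is a minimal sketch in base R only, so it runs without Keras. The names `one_hot`, `features`, `n_features` and `n_outputs` are illustrative and not part of the original code, `diag()` is used here as a stand-in for `to_categorical()`, and the zero-filled `features` matrix is a placeholder for the 16-column document-term matrix from the question.

```r
# Labels from the question, shifted to start at 0 as the question's code does.
labels <- c(1, 1, 2, 1, 2, 1, 2, 1, 2, 1) - 1

# Base-R stand-in for to_categorical(): one column per class.
n_classes <- length(unique(labels))
one_hot <- diag(n_classes)[labels + 1, , drop = FALSE]

# Placeholder feature matrix (assumption): 16 columns, matching the
# column count the question reports for its document-term matrix.
features <- matrix(0, nrow = length(labels), ncol = 16)

# Sizes the network has to agree with.
n_features <- ncol(features)  # what input_shape should be
n_outputs  <- ncol(one_hot)   # what units of the last layer should be

print(dim(one_hot))
print(c(n_features = n_features, n_outputs = n_outputs))
```

With the real data, the same idea would presumably translate to `input_shape = ncol(training)` for the first layer and `units = ncol(trainLabels)` for the final softmax layer. Since `model %>% layer_dense(...)` adds layers to the existing model object, it is probably safest to start again from a fresh `keras_model_sequential()` before applying the corrected definition.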