Neural network with character data

Time: 2017-06-18 06:30:32

Tags: r neural-network

I am running a basic sentiment-analysis experiment with the neuralnet package.

The structure of my data is as follows:
'data.frame':   4442 obs. of  2 variables:
 $ comment_text: chr  "really briliant app\tit's intuitive and informative giving all the information you could need and seemingly very accurate." "will not connect to gps\tapp does not connect to gps no matter how long i have it on. i have gps set on high ac"| __truncated__ "wish this would interest more with google now to provide weekly or monthly summaries." "useless\tdoes not talk to gps on the phone. 20 minute run no data." ...
 $ rating      : int  5 1 5 1 4 5 4 3 4 5 ...

I split this data into training and test sets and ran the neural network as follows:

senti_train <- nnsenti[1:3499, ]
senti_test <- nnsenti[3500:nrow(nnsenti), ]  # the data has 4442 rows; indexing up to 4443 would add an all-NA row
library(neuralnet)
neuralmodel <- neuralnet(rating ~ comment_text, data=senti_train)
plot(neuralmodel)

Running it gives me this error:

Error in neurons[[i]] %*% weights[[i]] : 
requires numeric/complex matrix/vector arguments

How can I solve this? The text is the important part of the data.

I have since tokenized the text, done some text cleaning with the tm package, and updated my code as follows:

nnsenti$comment_text <- VCorpus(VectorSource(nnsenti$comment_text))


#Text Cleaning
nnsenti$comment_text <- tm_map(nnsenti$comment_text,content_transformer(tolower))
nnsenti$comment_text <- tm_map(nnsenti$comment_text, removeNumbers)
nnsenti$comment_text <- tm_map(nnsenti$comment_text, removePunctuation)
nnsenti$comment_text <- tm_map(nnsenti$comment_text, removeWords,stopwords('english'))
nnsenti$comment_text <- tm_map(nnsenti$comment_text, removeWords,c('please','sad')) #Additional words
nnsenti$comment_text <- tm_map(nnsenti$comment_text, stripWhitespace)
senti_train <- nnsenti[1:3499, ]
senti_test <- nnsenti[3500:nrow(nnsenti), ]  # the data has 4442 rows; indexing up to 4443 would add an all-NA row

library(neuralnet)
neuralmodel <- neuralnet(rating ~ comment_text, data=senti_train)

Now I get this error:

Error in model.frame.default(formula.reverse, data) : 
  invalid type (list) for variable 'comment_text'

1 Answer:

Answer 0 (score: 0)

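It seems you did not normalize your data. The data must at least be fed to the neural network in numeric form, and preferably scaled to a fixed range (usually [-1, 1] or [0, 1]).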
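
As a minimal sketch of the range scaling described here, the base R snippet below rescales a numeric vector to a chosen interval. The helper name rescale_range is hypothetical (it is not part of neuralnet or tm), and the ratings are just the first ten values shown in the question's str() output.

```r
# Hypothetical helper: linearly rescale a numeric vector to [lo, hi] (default [0, 1]).
rescale_range <- function(x, lo = 0, hi = 1) {
  rng <- range(x, na.rm = TRUE)
  # Constant input: return the lower bound to avoid dividing by zero.
  if (rng[1] == rng[2]) return(rep(lo, length(x)))
  lo + (x - rng[1]) / (rng[2] - rng[1]) * (hi - lo)
}

# First ten ratings from the question's str() output.
rating <- c(5, 1, 5, 1, 4, 5, 4, 3, 4, 5)

rating01 <- rescale_range(rating)         # scaled to [0, 1]
rating11 <- rescale_range(rating, -1, 1)  # scaled to [-1, 1]
```

Whichever range is used, the same minimum and maximum taken from the training data should also be applied to the test data, so that both sets share one scale.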

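You can turn the text into numbers with one-hot encoding. Numeric values such as the rating can be normalized by dividing them by their maximum: if the maximum rating is 10, divide every rating by 10 (the ratings shown in the question do not exceed 5, so 5 may be the right divisor there).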
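
One way to read this advice is sketched below in base R: each distinct word becomes a 0/1 column, the rating is divided by an assumed maximum, and the result is a purely numeric data frame that neuralnet can accept. The toy data frame docs, the w_ column prefix, the max_rating value of 5, and hidden = 2 are all illustrative assumptions, not taken from the original post.

```r
# Toy data standing in for nnsenti (texts shortened from the question's sample).
docs <- data.frame(
  comment_text = c("really briliant app",
                   "will not connect to gps",
                   "wish this would interest more",
                   "useless does not talk to gps"),
  rating = c(5, 1, 5, 1),
  stringsAsFactors = FALSE
)

# Split each comment into lowercase word tokens and collect the vocabulary.
tokens <- strsplit(tolower(docs$comment_text), "[^a-z]+")
vocab  <- sort(unique(unlist(tokens)))
vocab  <- vocab[nzchar(vocab)]  # drop empty strings left by the split

# One 0/1 column per vocabulary word: 1 if the word occurs in the comment.
onehot <- t(vapply(tokens,
                   function(tk) as.numeric(vocab %in% tk),
                   numeric(length(vocab))))
colnames(onehot) <- paste0("w_", vocab)  # prefix keeps names such as "if" valid in a formula

# Scale the target by the assumed maximum rating (adjust to the real scale).
max_rating <- 5
train <- data.frame(onehot, rating = docs$rating / max_rating)

# Build the formula explicitly from the column names.
f <- as.formula(paste("rating ~", paste(colnames(onehot), collapse = " + ")))

# Fit only if neuralnet is installed; hidden = 2 is an arbitrary choice for this sketch.
if (requireNamespace("neuralnet", quietly = TRUE)) {
  set.seed(1)
  neuralmodel <- neuralnet::neuralnet(f, data = train, hidden = 2)
}
```

With the cleaned tm corpus from the question, DocumentTermMatrix() followed by as.matrix() should yield a comparable numeric matrix (word counts rather than 0/1 flags). On the full data the vocabulary would probably need pruning to keep the number of inputs manageable.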