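I need to create a function in R that splits sentences into words and then matches those words against the words in a positive and a negative lexicon. The result should be a sentiment score per word: a positive word found in a sentence counts as 1, and a negative word counts as -1. The desired output looks like this: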
Product_ID  Sentence  Attribute  SentimentScore
1111111     1         graphics    1
1111111     1         windows     1
1111111     2         loads      -1
2222222     1         laptops    -1
2222222     2         design      1
The first sentence for product 1111111 might read something like: ...this product... beautiful graphics... runs well on my windows
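E.g. the lexicon with positive words (pos.txt) looks like: a+ abound abounds abundance abundant accessable accessible acclaim acclaimed ... etc.
And the lexicon with negative words (neg.txt) looks like: 2-faced 2-faces abnormal abolish abominable abominably abominate abomination abort aborted aborts ... etc.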
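For reference, lexicon files like these can be read into character vectors with base R. The following is only a minimal sketch: it assumes each file holds words separated by whitespace or newlines with no header, the temporary files merely stand in for the real pos.txt and neg.txt, and `read_lexicon` is a hypothetical helper name.

```r
# Hypothetical helper: read a lexicon file into a lower-case character vector.
# Assumes words are separated by whitespace or newlines and there is no header.
read_lexicon <- function(path) {
  words <- scan(path, what = character(), quote = "", quiet = TRUE)
  unique(tolower(words))
}

# Stand-in files so the sketch runs on its own; replace with pos.txt / neg.txt.
pos_file <- tempfile(fileext = ".txt")
neg_file <- tempfile(fileext = ".txt")
writeLines(c("a+", "abound", "abounds", "abundant"), pos_file)
writeLines(c("2-faced", "abnormal", "abolish"), neg_file)

pos <- read_lexicon(pos_file)
neg <- read_lexicon(neg_file)
print(pos)
print(neg)
```

If the real files start with comment lines, those would need to be skipped first (for example via the `comment.char` argument of `scan()`).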
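I saw a function called score.sentiment on GitHub, but it scores each sentence as a whole, using the difference between the number of positive and negative words in it. I need something very similar, but scored per word.
I would greatly appreciate any help. Thanks in advance.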
Answer 0: (score: 0)
Does this meet your needs?
pos <- c("abound", "abounds", "abundant")
neg <- c("2-face", "abnormal")
sent <- "abundant abnormal activity was due to 2-face people"
# Count the positive terms that occur anywhere in the sentence.
# Note: grepl() matches substrings, so "abound" would also match "abounds".
p <- 0
for (i in seq_along(pos)) {
  if (grepl(pos[i], sent, ignore.case = TRUE)) p <- p + 1
}
# Count the negative terms the same way.
n <- 0
for (i in seq_along(neg)) {
  if (grepl(neg[i], sent, ignore.case = TRUE)) n <- n + 1
}
print(p)
print(n)
print(paste("Overall sentence sentiment score = ", p - n))
Result: positive 1, negative 2, overall -1
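One caveat: grepl() matches substrings, so with "abound" in the list, a sentence containing "abounds" would be counted twice. A minimal sketch of whole-word counting follows, assuming words are separated by single spaces and that case should be ignored; the variable names simply mirror the ones above.

```r
pos <- c("abound", "abounds", "abundant")
neg <- c("2-face", "abnormal")
sent <- "abundant abnormal activity was due to 2-face people"

# Split the sentence into lower-case tokens on single spaces.
tokens <- strsplit(tolower(sent), " ", fixed = TRUE)[[1]]

# Count tokens that appear in each lexicon as whole words.
p <- sum(tokens %in% pos)
n <- sum(tokens %in% neg)

print(p - n)  # overall score for this sentence: 1 - 2 = -1
```

Punctuation attached to words (e.g. "people.") would not match under this assumption and would have to be stripped beforehand.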
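Answer 1: (score: 0)
A brute-force approach. It is not optimal because it uses too many loops, but it seems to do what you need. Hopefully this works for your application. You can rearrange things or store the results in another variable so that the output reads [1] [1], etc.
Code: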
library(stringr)
sent <- data.frame(Sentences = c("abundant bad abnormal activity was due to 2-face people",
                                 "strange exciting activity was due to 2-face people"),
                   user = c(1, 2))
pos <- c("abound", "abounds", "abundant", "exciting")
neg <- c("2-face", "abnormal", "strange", "bad", "weird")
# One character vector of words per sentence.
words <- str_split(as.character(sent$Sentences), " ")
tmp <- data.frame()
for (i in 1:nrow(sent)) {
  # Loop over every word of sentence i (not over the number of sentences).
  for (j in seq_along(words[[i]])) {
    # Record a row with score 1 for each positive match.
    for (k in seq_along(pos)) {
      if (words[[i]][j] == pos[k]) {
        print(paste(i, words[[i]][j], 1))
        tmn <- cbind(i, words[[i]][j], 1)
        tmp <- rbind(tmp, tmn)
      }
    }
    # Record a row with score -1 for each negative match.
    for (m in seq_along(neg)) {
      if (words[[i]][j] == neg[m]) {
        print(paste(i, words[[i]][j], -1))
        tmn <- cbind(i, words[[i]][j], -1)
        tmp <- rbind(tmp, tmn)
      }
    }
  }
}
print(tmp)
Result:
  i       V2 V3
1 1 abundant  1
2 1      bad -1
3 1 abnormal -1
4 1   2-face -1
5 2  strange -1
6 2 exciting  1
7 2   2-face -1
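Since the loops are the admitted weak point here, the same per-word scoring can also be sketched without explicit loops by looking each word up in a named vector. This is an illustrative alternative rather than the code above: `score_words` is a hypothetical helper name, and it assumes words are separated by single spaces and that no word appears in both lists.

```r
# Hypothetical helper: score every lexicon word in every sentence.
score_words <- function(sentences, ids, pos, neg) {
  # Named lookup vector: word -> +1 or -1.
  lexicon <- c(setNames(rep(1, length(pos)), pos),
               setNames(rep(-1, length(neg)), neg))
  # One character vector of tokens per sentence (split on single spaces).
  tokens <- strsplit(as.character(sentences), " ", fixed = TRUE)
  out <- data.frame(
    Id = rep(ids, lengths(tokens)),
    Word = unlist(tokens),
    stringsAsFactors = FALSE
  )
  # Words missing from the lexicon get NA and are dropped.
  out$Score <- unname(lexicon[out$Word])
  out <- out[!is.na(out$Score), ]
  rownames(out) <- NULL
  out
}

sentences <- c("abundant bad abnormal activity was due to 2-face people",
               "strange exciting activity was due to 2-face people")
pos <- c("abound", "abounds", "abundant", "exciting")
neg <- c("2-face", "abnormal", "strange", "bad", "weird")
res <- score_words(sentences, ids = c(1, 2), pos, neg)
print(res)
```

Every word of each sentence is checked, so a lexicon word is scored wherever it occurs in the sentence.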
Answer 2: (score: 0)
sent1 = data.frame(Sentences=c("abundant bad abnormal activity was due to 2-face people","strange exciting activity was due to great 2-face people"), user = c(1,2))
pos1 = c("abound" , "abounds", "abundant", "exciting", "great")
neg1 = c("2-face","abnormal", "strange", "bad", "weird")
Then I used:
# str_split() comes from the stringr package.
words = (str_split(unlist(sent1$Sentences), " "))
tmp <- data.frame()
tmn <- data.frame()
for (i in 1:nrow(sent1)) {
  # Note: length(words) is the number of sentences (2), so only the
  # first two words of each sentence are checked here.
  for (j in 1:length(words)) {
    for (k in 1:length(pos1)) {
      if (words[[i]][j] == pos1[k]) {
        print(paste(i, words[[i]][j], 1))
        tmn <- cbind(i, words[[i]][j], 1)
        tmp <- rbind(tmp, tmn)
      }
    }
    for (m in 1:length(neg1)) {
      if (words[[i]][j] == neg1[m]) {
        print(paste(i, words[[i]][j], -1))
        tmn <- cbind(i, words[[i]][j], -1)
        tmp <- rbind(tmp, tmn)
      }
    }
  }
}
This resulted in:
print(tmp)
  i       V2 V3
1 1 abundant  1
2 1      bad -1
3 2  strange -1
4 2 exciting  1
If I do it this way instead:
sent1$Sentences <- as.character(sent1$Sentences)
List <- strsplit(sent1$Sentences, " ")
a <- data.frame(Id=rep(sent1$user, sapply(List, length)), Words=unlist(List))
a$Words <- as.character(a$Words)
a[a$Words %in% pos1,]
it gives the correct positive words:
Id Words
1 abundant
2 exciting
2 great
and for the negative words, a[a$Words %in% neg1,] gives:
Id Words
1 bad
1 abnormal
1 2-face
2 strange
2 2-face
But I still need to add a value of 1 for the positive words and -1 for the negative words.
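One way to get there from the data frame above is to add a score column with nested ifelse() calls. This is only a sketch: it assumes a word never appears in both lists, and the `Score` and `scored` names are illustrative choices, not part of the original code.

```r
sent1 <- data.frame(
  Sentences = c("abundant bad abnormal activity was due to 2-face people",
                "strange exciting activity was due to great 2-face people"),
  user = c(1, 2),
  stringsAsFactors = FALSE
)
pos1 <- c("abound", "abounds", "abundant", "exciting", "great")
neg1 <- c("2-face", "abnormal", "strange", "bad", "weird")

# One row per word, tagged with the id of the sentence it came from.
List <- strsplit(sent1$Sentences, " ")
a <- data.frame(Id = rep(sent1$user, sapply(List, length)),
                Words = unlist(List),
                stringsAsFactors = FALSE)

# +1 for positive words, -1 for negative words, 0 for everything else.
a$Score <- ifelse(a$Words %in% pos1, 1, ifelse(a$Words %in% neg1, -1, 0))

# Keep only the words that matched one of the lexicons.
scored <- a[a$Score != 0, ]
print(scored)
```

Summing `Score` per `Id` (for example with `aggregate(Score ~ Id, data = scored, FUN = sum)`) would give back a per-sentence total if that is needed as well.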