How to extract individual words from a sentence and match them against words in pos and neg dictionaries in R

Asked: 2015-01-21 16:40:38

Tags: r

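I need to create a function in R that splits sentences into words and then matches those words against the words in the pos and neg dictionaries. This should yield a sentiment score: each positive word in a sentence counts as 1 and each negative word counts as -1.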

Product_ID        Sentence        Attribute        SentimentScore
1111111              1            graphics                1
1111111              1            windows                 1
1111111              2            loads                  -1
2222222              1            laptops                -1
2222222              2            design                  1

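The first sentence for product 1111111 might be: ... this product ... beautiful graphics ... runs well on my windows

E.g. the dictionary of positive words (pos.txt) looks like: a+ abound abounds abundance abundant accessable accessible acclaim acclaimed ... and so on

and the dictionary of negative words (neg.txt) looks like: 2-faced 2-faces abnormal abolish abominable abominably abominate abomination abort aborted aborts ... and so on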
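For illustration only, here is a minimal sketch of loading such dictionaries into character vectors. It assumes each file holds one word per line, which the post does not confirm. The temporary files and the `read_lexicon` helper are hypothetical stand-ins so the snippet runs on its own.

```r
# Hypothetical stand-ins for pos.txt and neg.txt, written to temporary files
# so the example is self-contained; "one word per line" is an assumption.
pos_file <- tempfile(fileext = ".txt")
neg_file <- tempfile(fileext = ".txt")
writeLines(c("abound", "abounds", "abundant"), pos_file)
writeLines(c("2-face", "abnormal"), neg_file)

# Read a word list: trim whitespace, lower-case, and drop empty lines.
read_lexicon <- function(path) {
  w <- tolower(trimws(readLines(path)))
  w[nzchar(w)]
}

pos <- read_lexicon(pos_file)
neg <- read_lexicon(neg_file)
```

With real files, the two `tempfile()`/`writeLines()` steps would be replaced by the actual paths to pos.txt and neg.txt.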

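I saw a function called score.sentiment on GitHub, but it scores each whole sentence using the difference between its numbers of pos and neg words. I need something very similar, but at the level of individual words.

I would greatly appreciate any help. Thanks in advance.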

3 Answers:

Answer 0: (score: 0)

Does this fit your needs?

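pos <- c("abound", "abounds", "abundant")
neg <- c("2-face", "abnormal")

sent <- "abundant abnormal activity was due to 2-face people"

# Count how many positive dictionary entries occur in the sentence.
# Note: grepl() matches substrings, so "abound" would also match "abounds".
p <- 0
for (i in seq_along(pos)) {
  if (grepl(pos[i], sent, ignore.case = TRUE)) p <- p + 1
}

# Count how many negative dictionary entries occur in the sentence.
n <- 0
for (i in seq_along(neg)) {
  if (grepl(neg[i], sent, ignore.case = TRUE)) n <- n + 1
}

print(p)
print(n)
print(paste("Overall sentence sentiment score = ", p - n))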

Result: 1 positive, 2 negative, overall -1
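Because `grepl()` looks for the dictionary entry anywhere in the sentence, a short entry can match inside a longer word. A possible variant, sketched here under the assumption that words are separated by single spaces, splits the sentence first and compares whole words with `%in%`:

```r
pos <- c("abound", "abounds", "abundant")
neg <- c("2-face", "abnormal")
sent <- "abundant abnormal activity was due to 2-face people"

# Split on single spaces and lower-case, then compare whole words only.
words <- tolower(strsplit(sent, " ", fixed = TRUE)[[1]])

p <- sum(words %in% pos)  # number of words found in the positive dictionary
n <- sum(words %in% neg)  # number of words found in the negative dictionary
score <- p - n
```

For this sentence the counts agree with the loop version (1 positive, 2 negative). The two approaches differ when a word is repeated: this variant counts every occurrence, whereas the loop counts each dictionary entry at most once.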

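Answer 1: (score: 0)

A brute-force approach. It is not optimal because it uses too many loops, but it seems to do what you need. Hopefully it suits your application. You can rearrange things or store the result in another variable so that the output looks like [1] [1] and so on.

Code: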

library(stringr)  # provides str_split()

sent <- data.frame(Sentences = c("abundant bad abnormal activity was due to 2-face people",
                                 "strange exciting activity was due to 2-face people"),
                   user = c(1, 2), stringsAsFactors = FALSE)
pos <- c("abound", "abounds", "abundant", "exciting")
neg <- c("2-face", "abnormal", "strange", "bad", "weird")

# One character vector of words per sentence
words <- str_split(sent$Sentences, " ")

tmp <- data.frame()

for (i in seq_len(nrow(sent))) {
  # Loop over every word of sentence i. The original looped over
  # 1:length(words), i.e. the number of sentences, so only the first
  # two words of each sentence were checked.
  for (j in seq_along(words[[i]])) {
    for (k in seq_along(pos)) {
      if (words[[i]][j] == pos[k]) {
        tmp <- rbind(tmp, data.frame(i = i, word = words[[i]][j], score = 1))
      }
    }
    for (m in seq_along(neg)) {
      if (words[[i]][j] == neg[m]) {
        tmp <- rbind(tmp, data.frame(i = i, word = words[[i]][j], score = -1))
      }
    }
  }
}

print(tmp)

Result:

  i     word score
1 1 abundant     1
2 1      bad    -1
3 1 abnormal    -1
4 1   2-face    -1
5 2  strange    -1
6 2 exciting     1
7 2   2-face    -1

Answer 2: (score: 0)

sent1 = data.frame(Sentences=c("abundant bad abnormal activity was due to 2-face people","strange exciting activity was due to great 2-face people"), user = c(1,2)) 
pos1 = c("abound" , "abounds", "abundant", "exciting", "great")
neg1 = c("2-face","abnormal", "strange", "bad", "weird")

Then I used:

library(stringr)  # provides str_split()

words = (str_split(unlist(sent1$Sentences)," "))

tmp <- data.frame()
tmn <- data.frame()

for (i in 1:nrow(sent1)) {
  # Note: length(words) is the number of sentences (2), not the number of
  # words, so only the first two words of each sentence are checked here.
  for (j in 1:length(words)) {
    for (k in 1:length(pos1)){
      if (words[[i]][j] == pos1[k]) {
        print(paste(i,words[[i]][j],1))
        tmn <- cbind(i,words[[i]][j],1)
        tmp <- rbind(tmp,tmn)
      }
    }
    for (m in 1:length(neg1)){
      if (words[[i]][j] == neg1[m]) {
        print(paste(i,words[[i]][j],-1))
        tmn <- cbind(i,words[[i]][j],-1)
        tmp <- rbind(tmp,tmn)
      }
    }
  }
}

This resulted in:

print(tmp)
  i       V2 V3
1 1 abundant  1
2 1      bad -1
3 2  strange -1
4 2 exciting  1

Whereas if I do it this way:

sent1$Sentences <- as.character(sent1$Sentences)
List <- strsplit(sent1$Sentences, " ")
a <- data.frame(Id=rep(sent1$user, sapply(List, length)),    Words=unlist(List))
a$Words <- as.character(a$Words)
a[a$Words %in% pos1,]

it gives the correct positive words:

Id    Words
1 abundant
2 exciting
2    great

and for the negative words:     a[a$Words %in% neg1,]

Id    Words
1      bad
1 abnormal
1   2-face
2  strange
2   2-face

But I need to add the value 1 for the positive words and -1 for the negative words.
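One possible way to attach those values, sketched here rather than offered as a definitive solution, is to add a column with nested `ifelse()` calls on the same one-row-per-word data frame. The column name `Score` and the variable `scored` are illustrative choices, not part of the original post.

```r
sent1 <- data.frame(
  Sentences = c("abundant bad abnormal activity was due to 2-face people",
                "strange exciting activity was due to great 2-face people"),
  user = c(1, 2),
  stringsAsFactors = FALSE
)
pos1 <- c("abound", "abounds", "abundant", "exciting", "great")
neg1 <- c("2-face", "abnormal", "strange", "bad", "weird")

# One row per word, tagged with the id of the sentence it came from.
List <- strsplit(sent1$Sentences, " ")
a <- data.frame(Id = rep(sent1$user, lengths(List)),
                Words = unlist(List),
                stringsAsFactors = FALSE)

# 1 for positive words, -1 for negative words, 0 for everything else.
a$Score <- ifelse(a$Words %in% pos1, 1,
                  ifelse(a$Words %in% neg1, -1, 0))

# Keep only the words found in either dictionary.
scored <- a[a$Score != 0, ]
print(scored)
```

If a per-sentence total is also wanted, something like `tapply(scored$Score, scored$Id, sum)` could be applied to the same data frame.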