How to add a tag to negated words in R up to the next punctuation mark

Posted: 2014-08-25 22:46:15

Tags: r sentiment-analysis negation

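I need some help figuring out how to reproduce, in R, a solution that adds the tag "NOT_" to every word that follows a negation word, up to the next punctuation mark.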

A Python solution is given in the question How to add tags to negated words in strings that follow "not", "no" and "never".

I have the following solution, which adds the tag "NOT_" to the single word that follows one of the negation words not, n't, never, without and unlikely to:

str_negate <- function(x) {
  gsub("not ","not NOT_",
            gsub("n't ","n't NOT_",
            gsub("never ","never NOT_",
            gsub("without ","without NOT_",
            gsub("unlikely to ","unlikely to NOT_",x)))))
}

str_negate(FeedbackCommentsVectorProc$Sentences)
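The nested gsub() calls above can be collapsed into one regular expression built from a vector of cues. This is only a sketch: `negation_cues` and `tag_next_word()` are hypothetical names, and the cues are pasted into the pattern unescaped, so it assumes none of them contain regex metacharacters. Like the nested version, it still tags only the one word after each cue.

```r
# Hypothetical vector of negation cues (the same cues as the nested gsub() calls)
negation_cues <- c("not", "n't", "never", "without", "unlikely to")

# Hypothetical helper: tags only the single word that follows each cue
tag_next_word <- function(x, cues = negation_cues) {
  # Build one alternation, e.g. "(not|n't|never|without|unlikely to) "
  # (assumes the cues contain no regex metacharacters)
  pattern <- paste0("(", paste(cues, collapse = "|"), ") ")
  # Keep the cue and its trailing space, then insert "NOT_" before the next word
  gsub(pattern, "\\1 NOT_", x)
}

tag_next_word("He did not play well and didn't care.")
# "He did not NOT_play well and didn't NOT_care."
```

As with the original, word boundaries are not checked, so "cannot play" also becomes "cannot NOT_play".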

But I need to adapt it so that the tag "NOT_" is added to every word up to the next punctuation mark.

Any help is much appreciated!

1 answer:

Answer (score: 2)

Edit

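After working on this problem, this is the simplest solution I could come up with. Note: it fails when a string contains more than one negation word before a punctuation mark.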

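library(gsubfn)
str_negate <- function(x) {
   # Step 1: prefix the word right after each negation cue with "NOT_"
   x1 <- gsub("(not|n't|never|without|unlikely to) (\\w+)", '\\1 NOT_\\2', x)
   # Step 2: for each "NOT_" run up to the next punctuation mark, prefix every word in it;
   # inside the formula, x is the captured text, not the function argument
   x2 <- gsubfn('NOT_([^[:punct:]]+)', ~ gsub('(\\w+)', 'NOT_\\1', x), x1)
   x2
}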
x <- "It was never going to work, he thought. He did not play so well, so he had to practice some more."
str_negate(x)
## [1] "It was never NOT_going NOT_to NOT_work, he thought. He did not NOT_play NOT_so NOT_well, so he had to practice some more."
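To see why this first version breaks when a clause contains two cues (the note above), the following base-R trace reproduces its two steps on a two-cue sentence without calling gsubfn. The variable names `s`, `span` and `inner` are illustrative only. The key detail is that `_` belongs to `[:punct:]`, so the run captured after the first `NOT_` stops at the underscore of the second tag.

```r
s <- "It was unlikely to work and it seems like it never was going to end."

# Step 1 of the first version: tag the word right after each cue
x1 <- gsub("(not|n't|never|without|unlikely to) (\\w+)", "\\1 NOT_\\2", s)

# Step 2 matches "NOT_" followed by non-punctuation; "_" is punctuation,
# so the first span ends at the "NOT" of the second tag
span <- regmatches(x1, regexpr("NOT_([^[:punct:]]+)", x1))

# What step 2's inner gsub() makes of the captured text
inner <- gsub("(\\w+)", "NOT_\\1", sub("^NOT_", "", span))

span   # "NOT_work and it seems like it never NOT"
inner  # "NOT_work NOT_and NOT_it NOT_seems NOT_like NOT_it NOT_never NOT_NOT"
```

Substituting `inner` for `span` leaves `NOT_never NOT_NOT_was` in the result, and the rest of the clause ("going to end") is never tagged because no further `NOT_` follows.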

If there can be more than one negation word before a punctuation mark, this version handles that case:

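str_negate <- function(x) {
   # Step 1: insert "NOT_" after each negation cue (\K drops the cue and space from the match)
   x1 <- gsub("(not|n't|never|without|unlikely to) \\K", 'NOT_', x, perl = TRUE)
   # Step 2: in each run of letters, underscores and spaces starting at "NOT_", prefix every
   # word that does not start with a cue; the case-insensitive lookahead also skips an
   # existing "NOT_" tag (caveat: it skips words such as "nothing" as well)
   x2 <- gsubfn('NOT_([a-zA-Z_ ]+)', ~ gsub("\\b(?!(?i:not|n't|never|without|unlikely to))(?=\\w+)", 'NOT_', x, perl = TRUE), x1)
   x2
}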
x <- 'It was unlikely to work and it seems like it never was going to end.'
str_negate(x)
## [1] "It was unlikely to NOT_work NOT_and NOT_it NOT_seems NOT_like NOT_it never NOT_was NOT_going NOT_to NOT_end."
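If you would rather not depend on gsubfn, the same rule can be written as a small state machine in base R: split each string into word, whitespace and punctuation tokens, switch a flag on at a negation cue and off at punctuation, and prefix the words seen while the flag is on. This is a sketch under assumptions that are not part of the original answer: `tag_negation_scope()` is a hypothetical name, the cues mirror the answer's list (not, never, without, any word ending in n't, and "to" right after "unlikely"), apostrophes count as part of a word, and the input is assumed to be ASCII text. Like the second version above, a cue inside the scope is left untagged and restarts the scope.

```r
# Hypothetical base-R alternative (no gsubfn): prefix every word after a
# negation cue with "NOT_" until the next punctuation mark.
tag_negation_scope <- function(x, cues = c("not", "never", "without")) {
  tag_one <- function(s) {
    if (is.na(s)) return(NA_character_)
    # Tokens: words (incl. apostrophes), whitespace runs, single punctuation
    # marks, and any other single character as a fallback
    tokens <- regmatches(s, gregexpr("[\\w']+|\\s+|[[:punct:]]|.", s,
                                     perl = TRUE))[[1]]
    in_scope <- FALSE
    prev_word <- ""
    for (i in seq_along(tokens)) {
      tok <- tokens[i]
      if (grepl("^[\\w']+$", tok, perl = TRUE)) {
        word <- tolower(tok)
        # Single-word cues, any word ending in "n't", or "to" after "unlikely"
        is_cue <- word %in% cues || grepl("n't$", word) ||
          (word == "to" && prev_word == "unlikely")
        if (is_cue) {
          in_scope <- TRUE                    # a cue (re)opens the scope, untagged
        } else if (in_scope) {
          tokens[i] <- paste0("NOT_", tok)    # word inside the scope
        }
        prev_word <- word
      } else if (grepl("^[[:punct:]]$", tok)) {
        in_scope <- FALSE                     # punctuation closes the scope
        prev_word <- ""
      }
    }
    paste(tokens, collapse = "")
  }
  vapply(x, tag_one, character(1), USE.NAMES = FALSE)
}

x <- c("It was never going to work, he thought. He did not play so well, so he had to practice some more.",
       "It was unlikely to work and it seems like it never was going to end.")
tag_negation_scope(x)
```

On the two example sentences this reproduces the outputs shown above. Because cues are matched as whole words, a word that merely starts with a cue (for example "nothing") is tagged rather than skipped.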