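I need some help figuring out how to reproduce in R the solution that adds the tag "NOT_" to every word following a negation word, up to the next punctuation mark.
A Python solution can be found at How to add tags to negated words in strings that follow "not", "no" and "never".
I have the following solution, which adds the tag "NOT_" only to the next word after one of these negation words: not, n't, never, without, unlikely to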
str_negate <- function(x) {
  # Nested calls: each gsub() tags only the word right after one negation cue
  gsub("not ", "not NOT_",
    gsub("n't ", "n't NOT_",
      gsub("never ", "never NOT_",
        gsub("without ", "without NOT_",
          gsub("unlikely to ", "unlikely to NOT_", x)))))
}
str_negate(FeedbackCommentsVectorProc$Sentences)
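But I need to adapt it so that the tag "NOT_" is added to every word up to the next punctuation mark.
Any help is greatly appreciated!
Answer 0: (score: 2)
After trying to work this out, this is the simplest solution I was able to come up with. Note: it fails if the string has more than one negation word before the punctuation mark.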
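library(gsubfn)
str_negate <- function(x) {
  # Step 1: tag the first word after each negation cue
  x1 <- gsub("(not|n't|never|without|unlikely to) (\\w+)", '\\1 NOT_\\2', x)
  # Step 2: extend the tag to every word up to the next punctuation mark
  x2 <- gsubfn('NOT_([^[:punct:]]+)', ~ gsub('(\\w+)', 'NOT_\\1', x), x1)
  x2
}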
x <- "It was never going to work, he thought. He did not play so well, so he had to practice some more."
str_negate(x)
## [1] "It was never NOT_going NOT_to NOT_work, he thought. He did not NOT_play NOT_so NOT_well, so he had to practice some more."
If there is more than one negation word before the punctuation mark, this version handles that case:
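str_negate <- function(x) {
  # Step 1: insert NOT_ right after each negation cue (\\K keeps the cue itself)
  x1 <- gsub("(not|n't|never|without|unlikely to) \\K", 'NOT_', x, perl = TRUE)
  # Step 2: prefix every following word with NOT_, skipping words that start with a cue or an existing NOT_ tag
  x2 <- gsubfn('NOT_([a-zA-Z_ ]+)', ~ gsub("\\b(?!(?i:not|n't|never|without|unlikely to))(?=\\w+)", 'NOT_', x, perl = TRUE), x1)
  x2
}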
x <- 'It was unlikely to work and it seems like it never was going to end.'
str_negate(x)
## [1] "It was unlikely to NOT_work NOT_and NOT_it NOT_seems NOT_like NOT_it never NOT_was NOT_going NOT_to NOT_end."
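If you would rather avoid nested regular expressions, the same idea can be written as a token loop in base R. This is only a minimal sketch, not a drop-in replacement for the functions above: the names tag_negated and tag_negated_one are made up for illustration, I am assuming that only the characters . , ; : ! ? end a negation scope, and "unlikely to" is handled as a special two-token case.

```r
# Hypothetical helper: tag one string, word by word.
tag_negated_one <- function(s) {
  # Tokens are words (letters, digits, apostrophes) or single scope-ending marks
  m <- gregexpr("[[:alnum:]']+|[.,;:!?]", s)
  orig <- regmatches(s, m)[[1]]
  tok <- orig
  negating <- FALSE
  for (i in seq_along(tok)) {
    # A cue is a single negation word, a word ending in n't, or the "to" of "unlikely to"
    is_cue <- grepl("^(not|never|without)$|n't$", orig[i], ignore.case = TRUE) ||
      (i > 1 && tolower(orig[i]) == "to" && tolower(orig[i - 1]) == "unlikely")
    if (grepl("^[.,;:!?]$", orig[i])) {
      negating <- FALSE                  # punctuation closes the negation scope
    } else if (is_cue) {
      negating <- TRUE                   # the cue itself is left untagged
    } else if (negating) {
      tok[i] <- paste0("NOT_", orig[i])  # inside a scope: tag the word
    }
  }
  # Write the (possibly tagged) tokens back, keeping the original spacing
  regmatches(s, m) <- list(tok)
  s
}

# Hypothetical wrapper: apply the helper to each element of a character vector
tag_negated <- function(x) {
  vapply(x, tag_negated_one, character(1), USE.NAMES = FALSE)
}

tag_negated("It was unlikely to work and it seems like it never was going to end.")
```

Unlike the regex versions, this matches cues as whole words, so a word such as "knot" is not treated as a negation. On the two example sentences above it should give the same tagged strings.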