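I am running sentiment analysis on financial articles. To improve the accuracy of my naive Bayes classifier, I would like to implement negation handling.
Specifically, I want to add the prefix "not_" to the word that follows a "not" or an "n't".
So if my corpus contains something like this: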
x <- "They didn't sell the company."
I would like to get the following:
"they didn't not_sell the company."
(The stop word "didn't" will be removed later.)
The only function I could find is gsub(), but it does not seem to work for this task.
Any help would be greatly appreciated. Thanks!
Answer 0 (score: 1)
"Specifically, I want to add the prefix "not_" to the word that follows a "not" or an "n't"."
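str_negate <- function(x) {
  # Prefix the word after a contraction ending in "n't" (e.g. "didn't") ...
  x <- gsub("n't ", "n't not_", x, fixed = TRUE)
  # ... then the word after a standalone "not".
  gsub("not ", "not not_", x, fixed = TRUE)
}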
Or I suppose you could use strsplit:
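str_negate <- function(x) {
  # Split the sentence into whitespace-separated tokens.
  tokens <- unlist(strsplit(x, split = " ", fixed = TRUE))
  # Flag "not" and contractions ending in "n't"; the anchors keep words
  # such as "nothing" or "notes" from being flagged.
  is_negative <- grepl("^not$|n't$", tokens, ignore.case = TRUE)
  # Shift the flags one position right so they mark the word after each negation.
  negate_me <- head(c(FALSE, is_negative), -1)
  tokens[negate_me] <- paste0("not_", tokens[negate_me])
  paste(tokens, collapse = " ")
}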
Either way works:
> str_negate("They didn't sell the company")
[1] "They didn't not_sell the company"
> str_negate("They did not sell the company")
[1] "They did not not_sell the company"
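If you need to run this over a whole corpus rather than one sentence at a time, a single vectorised gsub() call with a backreference might be enough. The following is only a sketch: negate_next_word and the corpus vector are hypothetical names that do not appear in the question, and it assumes the text is lower-cased first, as in the desired output shown in the question.

```r
# Hypothetical helper (not from the question or the answer above): prefixes
# the word following "not" or a contraction ending in "n't" with "not_".
negate_next_word <- function(x) {
  # \\b(not|\\w+n't) captures the negation word as a whole word,
  # \\s+ consumes the whitespace after it, and the lookahead (?=\\w)
  # requires that another word actually follows.
  gsub("\\b(not|\\w+n't)\\s+(?=\\w)", "\\1 not_", x,
       perl = TRUE, ignore.case = TRUE)
}

# Made-up example corpus; gsub() is vectorised over its input.
corpus <- c("They didn't sell the company.",
            "They did not sell the company.",
            "Nothing was sold.")

negate_next_word(tolower(corpus))
# "they didn't not_sell the company."
# "they did not not_sell the company."
# "nothing was sold."
```

Because of the word boundary, "nothing" is left alone. Note that "cannot" is not treated as a negation here; if you want it handled, it would have to be added to the alternation.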