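In a text-mining task, I want to strip everything from certain words through the end of the text (to remove email signatures). To do this, I am using stringr::str_locate to find the position of a given keyword. It works when I pass the keywords one at a time, but it fails when I pass them all at once in a loop.
Here is my script: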
library(stringr)
txt <- c("Hello.\r\ncorrections have been done now.\r\nCheers, Peik Niemi\r\ncheers, Peik\r\n\r\nBest Regards,\r\nAngelo Javier\r\n------------------- Original Message -------------------\r")
salt <- c("NOTICE:", "Many thanks", "Sincerely", "With gratitude", "rgds", "tks", "cheers","tc", "disclaimer", "kind regards","best regards","thanks and regards","Sent from my","Outlook for Android","[\n\r].*--","warm regards","thanks & regards","regards","\\*\\*")
names(salt) <- salt[]
Salute <- function(txt){
for(i in salt[,]){
txt1 <- tolower(txt)
assign(salt1, names(salt[i]))
# salt1 = salt[i]
dis_loc = as.data.frame(str_locate(as.character(txt1, pattern=fixed(salt1))))[1,1]
}
if(is.na(dis_loc)){ct = txt}
if(is.na(dis_loc)==F){ct = (substr(txt,1, (dis_loc-1)))}
substr(txt,1, (dis_loc-1))
ct <- as.data.table(ct)
return(ct)
}
txtClean <- lapply(txt,Salute)
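Error: Error in type(pattern) : argument "pattern" is missing, with no default
Expected output: "Dear Murray, Time for a band meeting"
Please help me pass the list to str_locate the right way. Thanks in advance!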
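Answer 0 (score: 1)
The following code removes everything from the first occurrence of any of the keywords (the keyword itself included) through the end of the text: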
str_replace(txt,paste0("(?i)(",paste(salt,collapse="|"),")(?s).*"),"")
#[1] "Dear Murray,\nTime for a band meeting\n"
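To make the one-liner above easier to read, here is a minimal sketch that builds the same kind of pattern step by step. The shortened keyword list and the sample message (including the sign-off name) are made up for illustration and are not taken from the question.

```r
library(stringr)

# Shortened keyword list, for illustration only
keywords <- c("cheers", "best regards", "regards")

# (?i)   case-insensitive matching from this point on
# (...)  the keywords joined with "|", so any one of them can match
# (?s)   from this point on "." also matches line breaks,
#        so the trailing ".*" runs to the end of the text
pattern <- paste0("(?i)(", paste(keywords, collapse = "|"), ")(?s).*")

# Made-up sample message
msg <- "Dear Murray,\nTime for a band meeting\nCheers,\nJohn"

# Replace the keyword and everything after it with an empty string
cleaned <- str_replace(msg, pattern, "")
cleaned
```

Because str_replace is vectorized over its first argument, the same call should also work on a character vector of messages without an explicit loop.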
txt <- "Hello.\r\ncorrections have been done now.\r\nCheers, Peik Niemi\r\ncheers, Peik\r\n\r\nBest Regards,\r\nAngelo Javier\r\n------------------- Original Message -------------------\r"
str_replace(txt,paste0("(?i)(",paste(salt,collapse="|"),")(?s).*"),"")
#[1] "Hello.\r\ncorrections have been done now.\r\n"
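If you would rather keep the str_locate approach from the question, note that in the original call the pattern argument sits inside as.character(...), so str_locate itself never receives a pattern, which is consistent with the reported error. Below is a minimal sketch of a corrected version. The helper name cut_at_first_keyword and the shortened inputs are hypothetical, and it uses regex(ignore_case = TRUE) rather than fixed(), on the assumption that entries such as "[\n\r].*--" are meant as regular expressions.

```r
library(stringr)

# Hypothetical helper: cut `text` just before the earliest keyword match
cut_at_first_keyword <- function(text, keywords) {
  # Start position of each keyword's first match (NA when it does not occur)
  starts <- vapply(
    keywords,
    function(kw) str_locate(text, regex(kw, ignore_case = TRUE))[1, "start"],
    numeric(1)
  )
  if (all(is.na(starts))) {
    return(text)  # no keyword found: leave the text unchanged
  }
  substr(text, 1, min(starts, na.rm = TRUE) - 1)
}

# Shortened inputs, for illustration only
salt <- c("cheers", "best regards", "regards")
msgs <- c(
  "Hello.\r\ncorrections have been done now.\r\nCheers, Peik Niemi\r\n\r\nBest Regards,\r\nAngelo Javier",
  "No sign-off here"
)

# Apply the helper to every message; unname() drops the names vapply adds
cleaned <- unname(vapply(msgs, cut_at_first_keyword, character(1), keywords = salt))
cleaned
```

Taking the minimum over all start positions matters here: cutting at the first keyword in the list instead of the earliest match in the text could leave part of the signature behind.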