Replacing values in selected columns of a data frame using RegExp

Time: 2016-06-09 14:22:59

Tags: regex r replace dataframe

Suppose I have a data frame (simplified here to a character vector):

mydata <- c("10 stack", " 10 stack and x", "10 stack / dd", " 10 stackxx")
R> mydata
[1] "10 stack"
[2] " 10 stack and x"
[3] "10 stack / dd"
[4] " 10 stackxx"

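What I want to do is replace every term that starts with 10 stack[anything] with some other word throughout the data frame, without deleting the rest of the string. I would also like to replace the slash (/) with "and" or a comma. The desired output: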

[1] " new"
[2] " new and x" 
[3] " new  and dd"   
[4] " new"

My code is

mydata[mydata == "10 stack"] <- "new"  # only replaces elements that equal "10 stack" exactly; I need a faster, more general operation
mydata[mydata == "///"] <- "and"       # intended to replace the slash with "and"; also matches whole elements only

I found another approach that might solve the problem

mydata <- as.data.frame(sapply(mydata, gsub, pattern = "/", replacement = ","))
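For an actual data frame, one possible way to restrict the replacement to selected columns is to apply gsub() over just those columns with lapply(). This is only a sketch: the data frame `df` and its column names `item`, `note`, and `id` are hypothetical and do not come from the question.

```r
# Hypothetical data frame; the column names are made up for illustration
df <- data.frame(
  item = c("10 stack", " 10 stack and x", "10 stack / dd", " 10 stackxx"),
  note = c("a / b", "c", "d / e", "f"),
  id   = 1:4,
  stringsAsFactors = FALSE
)

cols <- c("item", "note")  # only these columns are rewritten

# lapply() returns a list of columns, so assigning back keeps df a data frame;
# fixed = TRUE treats "/" as a literal string rather than a regex
df[cols] <- lapply(df[cols], function(x) gsub("/", ",", x, fixed = TRUE))

print(df)
```

Assigning into `df[cols]` leaves the other columns (here `id`) untouched and avoids the matrix conversion that sapply() performs.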

2 answers:

Answer 0: (score: 3)

Try

library(stringi) 
stri_replace_all_regex(mydata, c("10 stack", "\\/"), c("new", "and"), vectorize_all=FALSE)

which gives:

#[1] "new"        " new and x" "new and dd" " newxx"  

As @rock321987 mentioned in the comments, if you want to replace 10 stack[anything], you can use the pattern \\b10 stack[^\\s]* instead:

stri_replace_all_regex(mydata, c("\\b10 stack[^\\s]*", "\\/"), c("new", "and"), 
                       vectorize_all=FALSE)

which gives:

#[1] "new"        " new and x" "new and dd" " new"  
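If adding stringi as a dependency is not desirable, roughly the same two-step replacement can be sketched in base R with gsub(). This is an alternative to the answer's approach, not part of it; the intermediate names `step1` and `result` are illustrative.

```r
mydata <- c("10 stack", " 10 stack and x", "10 stack / dd", " 10 stackxx")

# Step 1: replace "10 stack" plus any trailing non-whitespace characters
step1 <- gsub("\\b10 stack\\S*", "new", mydata, perl = TRUE)

# Step 2: replace the literal slash; fixed = TRUE avoids any regex escaping
result <- gsub("/", "and", step1, fixed = TRUE)

print(result)
```

The replacements are applied one after the other, which mirrors what vectorize_all=FALSE does in the stringi call.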

Answer 1: (score: 2)

You need to use the sub() function, which matches the pattern and replaces the first match in each element with the replacement.

sub("10 stack", " new", mydata)
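Note that sub() replaces only the first match in each string, while gsub() replaces every match. A small sketch of the difference, using a made-up input string `x`:

```r
x <- "10 stack / 10 stack"  # hypothetical input containing two matches

first_only <- sub("10 stack", "new", x)   # replaces only the first match
all_hits   <- gsub("10 stack", "new", x)  # replaces every match

print(first_only)
print(all_hits)
```

For the data in the question each element contains "10 stack" only once, so both functions would behave the same there; the leading space in the replacement " new" is inserted literally into the result.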