R - Sequentially replace strings using a data frame of strings

Asked: 2016-12-29 12:01:51

Tags: r regex

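I am trying to build a function F that performs replacements in a target string 'str' using a data frame of strings 'df', column by column and row by row, with the column names as the substrings to be replaced and the column values as the replacements. The result is a character vector of length 'rownum' (one string per row of 'df'), with 'colnum' replacements applied to each string.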

An example illustrates it best:

str <- "Hi, I am name and I am age years old! - said name "

df <- data.frame(name = c('John', 'Richard','Edward'), age =c('10','26','12'))

F(str,df)

"Hi, I am John and I am 10 years old! - said John "

"Hi, I am Richard and I am 26 years old! - said Richard "

"Hi, I am Edward and I am 12 years old! - said Edward "

I wrote a function for this job:

F <- function(str,df)
{
  x <- str
  for(i in names(df)){
    x <- unname(mapply(gsub,i,df[[i]],x))
  }
  return(x)
}

It seems to work, but my impression is that it is neither efficient nor elegant.

  1. Is there a way to avoid the loop?
  2. Is it necessary?
  3. Can it be used when 'str' is a multi-line text, and not just a single line?

Thanks for your help.

4 Answers:

Answer 0 (score: 2)

Perhaps another option, which "hides" the for loop:

library(stringi)
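# Each row of df supplies the replacements; names(df) are the fixed patterns.
# vectorize_all=FALSE applies every pattern/replacement pair to the same string.
f <- function(str, df) 
  apply(df, 1, stri_replace_all, str=str, fixed=names(df), merge=TRUE, vectorize_all=FALSE)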
f("Hi, I am name and I am age years old! - said name ", df)
# [1] "Hi, I am John and I am 10 years old! - said John "      
# [2] "Hi, I am Richard and I am 26 years old! - said Richard "
# [3] "Hi, I am Edward and I am 12 years old! - said Edward "

str <- "Hi, I am name and I am age years old! - said name\n
Hi, I am name and I am age years old! - said name"
f(str, df)
# [1] "Hi, I am John and I am 10 years old! - said John\n\nHi, I am John and I am 10 years old! - said John"            
# [2] "Hi, I am Richard and I am 26 years old! - said Richard\n\nHi, I am Richard and I am 26 years old! - said Richard"
# [3] "Hi, I am Edward and I am 12 years old! - said Edward\n\nHi, I am Edward and I am 12 years old! - said Edward"
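
For readers without stringi, the same row-wise substitution can be sketched in base R. This is only an illustration under a few assumptions: `replace_by_df` is a hypothetical helper name, the column names are treated as literal (fixed) substrings, and `Reduce` simply folds over the columns instead of spelling out the loop.

```r
# Example data from the question
df <- data.frame(name = c("John", "Richard", "Edward"),
                 age  = c("10", "26", "12"),
                 stringsAsFactors = FALSE)

# replace_by_df is a hypothetical helper name, not part of any package.
replace_by_df <- function(template, df) {
  # Start from one copy of the template per row of df
  start <- rep(template, nrow(df))
  # Fold over the column names; each step substitutes one column's values
  Reduce(function(acc, col) {
    mapply(gsub, col, as.character(df[[col]]), acc,
           MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
  }, names(df), start)
}

# Multi-line templates work the same way, since gsub does not care about "\n"
out <- replace_by_df("Hi, I am name and I am age years old! - said name\nHi, I am name", df)
print(out)
```

Because the columns are substituted one after another, a value inserted for an earlier column (for example a name that contains "age") could itself be matched by a later column name. Unambiguous placeholders avoid that.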

Answer 1 (score: 1)

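Mustache is an excellent solution for this kind of template-based string manipulation. For simple strings/templates I would also use sprintf. For more complex templates, I would definitely use Mustache.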

The R implementation of Mustache is the whisker package.

In your case, this could be done, for example, as follows:

#install.packages("whisker")
library(whisker)
template <- 
"Hi, I am {{name}} and I am {{age}} years old! - 
said {{name}}"

df <- data.frame(name = c('John', 'Richard','Edward'), age =c('10','26','12'))

out <- apply(df, 1, function(x) whisker.render(template, x))

which gives you:

[1] "Hi, I am John and I am 10 years old! -\nsaid John"      
[2] "Hi, I am Richard and I am 26 years old! -\nsaid Richard"
[3] "Hi, I am Edward and I am 12 years old! -\nsaid Edward" 

Note that the line break (\n) from the template is present in the output.

You can also read the template in beforehand with readLines instead of hard-coding it in the code.
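
A minimal sketch of the readLines variant follows. The temporary file and its `.mustache` extension are assumptions made only to keep the example self-contained. Since readLines returns one element per line, the sketch joins the lines back into a single string before rendering, and the rendering step only runs if whisker is installed.

```r
# Write the template to a temporary file so the sketch is self-contained;
# in practice this would be an existing template file on disk.
template_path <- tempfile(fileext = ".mustache")
writeLines(c("Hi, I am {{name}} and I am {{age}} years old! -",
             "said {{name}}"), template_path)

# readLines() returns one element per line, so join them back together
template <- paste(readLines(template_path), collapse = "\n")

df <- data.frame(name = c("John", "Richard", "Edward"),
                 age  = c("10", "26", "12"),
                 stringsAsFactors = FALSE)

# Render one string per row, but only if whisker is available
if (requireNamespace("whisker", quietly = TRUE)) {
  out <- apply(df, 1, function(x) whisker::whisker.render(template, as.list(x)))
  print(out)
}
```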

Answer 2 (score: 1)

The simplest way (suggested by @RomanLustrik in the comments):

str <- "Hi, I am %s and I am %s years old! - said %s "
sprintf(str, df$name, df$age, df$name)

Result:

[1] "Hi, I am John and I am 10 years old! - said John "      
[2] "Hi, I am Richard and I am 26 years old! - said Richard "
[3] "Hi, I am Edward and I am 12 years old! - said Edward "  
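
As a small variation, sprintf also accepts positional references such as `%1$s`, so a value that appears twice does not have to be passed twice. A sketch using the same data (the variable names `fmt` and `out` are just for illustration):

```r
df <- data.frame(name = c("John", "Richard", "Edward"),
                 age  = c("10", "26", "12"),
                 stringsAsFactors = FALSE)

# %1$s refers to the first argument after the format, %2$s to the second
fmt <- "Hi, I am %1$s and I am %2$s years old! - said %1$s "
out <- sprintf(fmt, df$name, df$age)
print(out)
```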

Answer 3 (score: 0)

We can do this programmatically (inspired by @RomanLustrik's idea):

do.call(sprintf, c(cbind(df, name2=df$name), fmt = gsub("name|age", "%s", str)))
#[1] "Hi, I am John and I am 10 years old! - said John "    
#[2] "Hi, I am Richard and I am 26 years old! - said Richard "
#[3] "Hi, I am Edward and I am 12 years old! - said Edward "
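
The one-liner above hard-codes the column names and the repeated `name` column. A possible generalization, offered only as a sketch, looks up the placeholders in their order of appearance and passes the matching columns to sprintf. `fill_template` is a hypothetical helper name, and the sketch assumes the column names contain no regex metacharacters.

```r
df <- data.frame(name = c("John", "Richard", "Edward"),
                 age  = c("10", "26", "12"),
                 stringsAsFactors = FALSE)

# fill_template is a hypothetical helper name, not part of any package.
fill_template <- function(template, df) {
  # Alternation of all column names, e.g. "name|age"
  pattern <- paste(names(df), collapse = "|")
  # Placeholders in the order they occur in the template
  hits <- regmatches(template, gregexpr(pattern, template))[[1]]
  # Escape literal percent signs, then turn each placeholder into %s
  fmt <- gsub("%", "%%", template, fixed = TRUE)
  fmt <- gsub(pattern, "%s", fmt)
  # One argument per placeholder occurrence, taken from the matching column
  args <- lapply(hits, function(h) as.character(df[[h]]))
  do.call(sprintf, c(list(fmt = fmt), args))
}

out <- fill_template("Hi, I am name and I am age years old! - said name ", df)
print(out)
```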