R - Replace part of a string using a for loop and gsub

Time: 2019-02-28 13:41:06

Tags: r string loops replace gsub

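I have two data frames (df1 and df2), and I want to replace part of the strings in df1 with the corresponding strings from df2.

For example, the result should be df3: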

a <- c("extra text test-ID 1", "extra text test-ID 2", "extra text test-ID 3", "extra text test-ID 4")
b <- c("experiment 5","experiment 6","experiment 7","experiment 8") 
c <- c("exercise 9","exercise 10","exercise 11","exercise 12")

df1 <- data.frame(a,b,c)
names(df1) <- c('a','b','c')

d <- c("test-ID 1", "test-ID 2", "test-ID 4")
e <- c("test-ID 1098", "test-ID 245", "test-ID 77")

df2 <- data.frame(d,e)
names(df2) <- c('a','b')

df1
df2

f <- c("extra text test-ID 1098", "extra text test-ID 245", "extra text test-ID 3", "extra text test-ID 77")
g <- c("experiment 5","experiment 6","experiment 7","experiment 8") 
h <- c("exercise 9","exercise 10","exercise 11","exercise 12")

df3 <- data.frame(f,g,h)
names(df3) <- c('a','b','c')
df3

I would like to do this with a function.

replacefunction <- function(x) {
  cat(paste("searching for ", x, "\n"))
  for (i in seq_along(df2$a)) {
    old <- df2$a[i]
    new <- df2$b[i]
    if (grepl(old, x)) { 
      cat(paste0('found ', '"', old, '"', "\n"))
      return(gsub(old, new, x))
    }
  }
}

However, this gives a warning:

df4 <- replacefunction(df1$a)

Warning message:
In if (grepl(old, x)) { :
  the condition has length > 1 and only the first element will be used

Only the first entry in column df1$a is changed. Why does this happen?
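For reference, here is a minimal sketch of what the condition in the function evaluates to, using the same values as df1$a and the first pattern from df2. grepl is vectorized over its second argument, so it returns one logical value per element, while if expects a single value. Only the first element is consulted, and the function then returns from inside the loop after the first pattern. Note that from R 4.2.0 onward a condition of length greater than one raises an error rather than a warning.

```r
# Same values as df1$a in the question
x <- c("extra text test-ID 1", "extra text test-ID 2",
       "extra text test-ID 3", "extra text test-ID 4")
old <- "test-ID 1"   # first pattern from df2

hits <- grepl(old, x)   # one TRUE/FALSE per element of x
print(hits)             # TRUE FALSE FALSE FALSE
print(length(hits))     # 4, but if() expects a length-1 condition

# Because hits[1] is TRUE, the function returns right here, so only
# the first pattern is ever applied to the whole vector:
first_pass <- gsub(old, "test-ID 1098", x)
print(first_pass)
```

The remaining patterns ("test-ID 2", "test-ID 4") are never reached, which would be consistent with only the first entry changing.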

1 answer:

Answer 0 (score: 0)

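Here is a base R approach that relies mainly on the apply family of functions. It uses them almost throughout, except that I had to resort to an explicit for loop inside the innermost sapply call. This loop iterates over all rows of the pattern/replacement data frame df2 and attempts the replacement on each element of the input data frame df1.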

d <- c("test-ID 1", "test-ID 2", "test-ID 4")
d <- paste0("\\b", d, "\\b")
e <- c("test-ID 1098", "test-ID 245", "test-ID 77")

df2 <- data.frame(d,e)
names(df2) <- c('a','b')

df1
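df1[] <- lapply(df1, function(x) {    # apply function(x) to each column of df1
    sapply(x, function(y) {           # then to each element y of that column
        for (i in seq_len(nrow(df2))) {
            # try every pattern/replacement pair from df2 in turn
            y <- gsub(df2[i, "a"], df2[i, "b"], y)
        }
        return(y)
    })
})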

df1

                     a            b           c
1 extra text test-ID 1 experiment 5  exercise 9
2 extra text test-ID 2 experiment 6 exercise 10
3 extra text test-ID 3 experiment 7 exercise 11
4 extra text test-ID 4 experiment 8 exercise 12
                        a            b           c
1 extra text test-ID 1098 experiment 5  exercise 9
2  extra text test-ID 245 experiment 6 exercise 10
3    extra text test-ID 3 experiment 7 exercise 11
4   extra text test-ID 77 experiment 8 exercise 12

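I used gsub to handle the replacement logic, and since it works with regex patterns, I wrapped the target patterns in word boundaries. That is, I match \btest-ID 1\b rather than test-ID 1; the latter would also match when the term appears as a substring of other text, such as test-ID 10.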
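The effect of the word boundaries can be checked with a small sketch. The "test-ID 10" input is made up for illustration and does not appear in the question's data. The second half shows one possible vectorized variant of the asker's function: since gsub already works on a whole character vector, the per-element if test can be dropped. The helper name replace_all_patterns is hypothetical and not part of the original answer.

```r
# Hypothetical input: the second element contains "test-ID 10", not "test-ID 1".
x <- c("extra text test-ID 1", "extra text test-ID 10")

# Without boundaries the pattern also matches inside "test-ID 10".
unbounded <- gsub("test-ID 1", "test-ID 1098", x)
print(unbounded)  # "extra text test-ID 1098" "extra text test-ID 10980"

# With \\b on both sides only the whole term "test-ID 1" matches.
bounded <- gsub("\\btest-ID 1\\b", "test-ID 1098", x)
print(bounded)    # "extra text test-ID 1098" "extra text test-ID 10"

# Hypothetical helper: apply every pattern/replacement pair to the whole
# vector, letting gsub do the element-wise work.
replace_all_patterns <- function(x, patterns, replacements) {
  x <- as.character(x)  # also covers factor columns
  for (i in seq_along(patterns)) {
    x <- gsub(patterns[i], replacements[i], x)
  }
  x
}

patterns <- paste0("\\b", c("test-ID 1", "test-ID 2", "test-ID 4"), "\\b")
replacements <- c("test-ID 1098", "test-ID 245", "test-ID 77")
a <- c("extra text test-ID 1", "extra text test-ID 2",
       "extra text test-ID 3", "extra text test-ID 4")

result <- replace_all_patterns(a, patterns, replacements)
print(result)
```

With the question's data, a call such as df1$a <- replace_all_patterns(df1$a, patterns, replacements) should give the same column a as df3. Replacements are applied one after another, so this sketch assumes that no replacement string is itself matched by a later pattern.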