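I have two data frames (df1 and df2), and I want to replace parts of the strings in df1 with the corresponding strings from df2.
For example, the result should be df3: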
a <- c("extra text test-ID 1", "extra text test-ID 2", "extra text test-ID 3", "extra text test-ID 4")
b <- c("experiment 5","experiment 6","experiment 7","experiment 8")
c <- c("exercise 9","exercise 10","exercise 11","exercise 12")
df1 <- data.frame(a,b,c)
names(df1) <- c('a','b','c')
d <- c("test-ID 1", "test-ID 2", "test-ID 4")
e <- c("test-ID 1098", "test-ID 245", "test-ID 77")
df2 <- data.frame(d,e)
names(df2) <- c('a','b')
df1
df2
f <- c("extra text test-ID 1098", "extra text test-ID 245", "extra text test-ID 3", "extra text test-ID 77")
g <- c("experiment 5","experiment 6","experiment 7","experiment 8")
h <- c("exercise 9","exercise 10","exercise 11","exercise 12")
df3 <- data.frame(f,g,h)
names(df3) <- c('a','b','c')
df3
I want to do this with a function:
replace_values <- function(x) {
  cat(paste("searching for ", x, "\n"))
  # walk through every pattern/replacement pair in df2
  for (i in seq_along(df2$a)) {
    old <- df2$a[i]
    new <- df2$b[i]
    if (grepl(old, x)) {
      cat(paste0('found ', '"', old, '"', "\n"))
      return(gsub(old, new, x))
    }
  }
}
However, this gives a warning:
df4 <- replace_values(df1$a)
Warning message:
In if (grepl(old, x)) { :
the condition has length > 1 and only the first element will be used
Only the first entry in column df1$a was changed. Why does this happen?
Answer 0 (score: 0)
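Here is a base R approach that relies mostly on the apply family of functions. Almost everything is done with those functions, except that I had to resort to an explicit for loop inside the innermost sapply call. This loop iterates over all rows of the df2 pattern/replacement data frame and attempts each replacement on every element of the input data frame df1.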
d <- c("test-ID 1", "test-ID 2", "test-ID 4")
d <- paste0("\\b", d, "\\b")
e <- c("test-ID 1098", "test-ID 245", "test-ID 77")
df2 <- data.frame(d,e)
names(df2) <- c('a','b')
df1
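df1[] <- lapply(df1, function(x) {   # function(x) is applied to each column of df1
  sapply(x, function(y) {            # function(y) is applied to each element of that column
    # try every pattern/replacement pair from df2 on this element
    for (i in seq_len(nrow(df2))) {
      y <- gsub(df2[i, "a"], df2[i, "b"], y)
    }
    return(y)
  })
})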
df1
Before:
                     a            b           c
1 extra text test-ID 1 experiment 5  exercise 9
2 extra text test-ID 2 experiment 6 exercise 10
3 extra text test-ID 3 experiment 7 exercise 11
4 extra text test-ID 4 experiment 8 exercise 12

After:
                        a            b           c
1 extra text test-ID 1098 experiment 5  exercise 9
2  extra text test-ID 245 experiment 6 exercise 10
3    extra text test-ID 3 experiment 7 exercise 11
4   extra text test-ID 77 experiment 8 exercise 12
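As for the warning in the question: grepl() is vectorized, so calling it on the whole column df1$a returns one logical value per element, while if() expects a single TRUE or FALSE. Only the first value was used, which is why only the first entry changed (in R 4.2.0 and later this is an error rather than a warning). The sketch below illustrates this under the assumption that the column is a plain character vector; replace_all is a hypothetical helper name, not something from the question.

```r
x <- c("extra text test-ID 1", "extra text test-ID 2",
       "extra text test-ID 3", "extra text test-ID 4")
patterns <- paste0("\\b", c("test-ID 1", "test-ID 2", "test-ID 4"), "\\b")
replacements <- c("test-ID 1098", "test-ID 245", "test-ID 77")

# grepl() returns one logical per element of x, not a single value,
# so it cannot be used directly as the condition of if()
hits <- grepl(patterns[1], x)
print(hits)  # TRUE FALSE FALSE FALSE

# replace_all is a hypothetical helper: gsub() is already vectorized over x,
# so no if() is needed; only the pattern/replacement pairs are looped over
replace_all <- function(x, patterns, replacements) {
  for (i in seq_along(patterns)) {
    x <- gsub(patterns[i], replacements[i], x)
  }
  x
}

result <- replace_all(x, patterns, replacements)
print(result)
```

Because gsub() leaves non-matching elements unchanged, the grepl() check can be dropped entirely in this variant.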
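I use gsub to handle the replacement logic, and because it works with regular expression patterns, I wrapped the target patterns in word boundaries. That is, I match \btest-ID 1\b rather than test-ID 1; the latter would also match where the term appears as a substring of other text, for example inside test-ID 10.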
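A minimal sketch of the difference the word boundaries make. The value "test-ID 10" is an assumed example that does not appear in the original data; it is only there to show the substring problem.

```r
# "test-ID 10" is a made-up value used only to show the substring problem
ids <- c("test-ID 1", "test-ID 10")

# without word boundaries the pattern also matches inside "test-ID 10"
plain <- gsub("test-ID 1", "test-ID 1098", ids)
print(plain)    # "test-ID 1098" "test-ID 10980"

# with \\b on both sides only the complete term "test-ID 1" is replaced
bounded <- gsub("\\btest-ID 1\\b", "test-ID 1098", ids)
print(bounded)  # "test-ID 1098" "test-ID 10"
```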