This is a follow-up to the question "replace missing values with a value from another column", which has been fully answered. My question is about multiple matching columns.
Example dataset:
s <- data.frame(ID=c(191, 282, 202, 210),
Group.1=c(NA, "A", NA, "B"),
Back.1=c("DD", "AA", "DD", "BB"),
Group.2=c("D","A", NA, "B"),
Back.2=c("DD", "BB", "CC", "AA"),
stringsAsFactors=FALSE)
ID Group.1 Back.1 Group.2 Back.2
1 191 <NA> DD D DD
2 282 A AA A BB
3 202 <NA> DD <NA> CC
4 210 B BB B AA
If I want to replace the NAs with the value from the matching 'Back' column, I would use:
s$Group.1 <- ifelse(test = !is.na(s$Group.1), yes = s$Group.1, no = s$Back.1)
s$Group.2 <- ifelse(test = !is.na(s$Group.2), yes = s$Group.2, no = s$Back.2)
s
ID Group.1 Back.1 Group.2 Back.2
1 191 DD DD D DD
2 282 A AA A BB
3 202 DD DD CC CC
4 210 B BB B AA
As posted by akrun, another approach is:
library(data.table)
setDT(s)[is.na(Group.1), Group.1:= Back.1]
setDT(s)[is.na(Group.2), Group.2:= Back.2]
So, if I have many matching columns, I would like to be able to map, loop, or apply across them. Attempting a loop produces:
for (i in 1:2){
s[paste0("Group.", i)] <- ifelse(test = !is.na(s[paste0("Group.", i)]),
yes = s[paste0("Group.", i)],
no = s[paste0("Back.", i)])
}
Warning messages:
1: In `[<-.data.frame`(`*tmp*`, paste0("Group.", i), value = list(c("DD", :
provided 4 variables to replace 1 variables
2: In `[<-.data.frame`(`*tmp*`, paste0("Group.", i), value = list(c("D", :
provided 4 variables to replace 1 variables
> s
ID Group.1 Back.1 Group.2 Back.2
1 191 DD DD D DD
2 282 AA AA A BB
3 202 DD DD <NA> CC
4 210 BB BB B AA
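This seems to work for Group.1 and Back.1 but not for Group.2, and I find the warning messages hard to interpret.
If someone could help with an appropriate loop, that would be much appreciated. Even more helpful would be a way to generalise to other named columns, so that any column whose number matches a Back.x column can also have its missing values imputed from Back.x, i.e.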
s <- data.frame(ID=c(191, 282, 202, 210),
Group.1=c(NA, "A", NA, "B"),
Back.1=c("DD", "AA", "DD", "BB"),
Group.2=c("D","A", NA, "B"),
Back.2=c("DD", "BB", "CC", "AA"),
Donk.1 =c("PP", "ZZ", NA, "QQ"),
stringsAsFactors=FALSE)
Answer 0 (score: 1)
We can use
gr1 <- grep("Group", names(s), value = TRUE)
bc1 <- grep("Back", names(s), value = TRUE)
setDT(s)
for(j in seq_along(gr1)){
s[is.na(get(gr1[j])), (gr1[j]) := get(bc1[j])]
}
s
# ID Group.1 Back.1 Group.2 Back.2
#1: 191 DD DD D DD
#2: 282 A AA A BB
#3: 202 DD DD CC CC
#4: 210 B BB B AA
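As for why the loop in the question misbehaves: `s[paste0("Group.", i)]` returns a one-column data frame rather than a vector, so `ifelse()` ends up building a list of four whole columns and the assignment keeps only the first of them (hence "provided 4 variables to replace 1 variables"). That is why Group.1 was overwritten wholesale by Back.1 while Group.2 was left unchanged. Below is a minimal base R sketch of the same loop using `[[` to extract plain vectors. The names `grp` and `bck` are introduced here for illustration only.

```r
# Rebuild the example data from the question as a plain data.frame
s <- data.frame(ID = c(191, 282, 202, 210),
                Group.1 = c(NA, "A", NA, "B"),
                Back.1 = c("DD", "AA", "DD", "BB"),
                Group.2 = c("D", "A", NA, "B"),
                Back.2 = c("DD", "BB", "CC", "AA"),
                stringsAsFactors = FALSE)

for (i in 1:2) {
  grp <- paste0("Group.", i)  # column that may contain NA
  bck <- paste0("Back.", i)   # column supplying the fallback values
  # `[[` returns an atomic vector, so ifelse() works element-wise
  s[[grp]] <- ifelse(!is.na(s[[grp]]), s[[grp]], s[[bck]])
}
s
```

This produces the same filled Group.1 and Group.2 columns as the data.table loop above, without converting `s` by reference.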
For the updated dataset
gr1 <- names(s)[seq(2, ncol(s), by = 2)]
bc1 <- names(s)[seq(3, ncol(s), by = 2)]
setDT(s)
for(j in seq_along(gr1)){
s[is.na(get(gr1[j])), (gr1[j]) := get(bc1[j])][]
}
s
# ID Group.1 Back.1 Group.2 Back.2 Donk.1 Back.1.1
#1: 191 DD DD D DD PP DD
#2: 282 A AA A BB ZZ AA
#3: 202 DD DD CC CC DD DD
#4: 210 B BB B AA QQ BB
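Data used for the updated example (build it before running the loop above; data.frame renames the duplicated Back.1 column to Back.1.1):
s <- data.frame(ID=c(191, 282, 202, 210),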
Group.1=c(NA, "A", NA, "B"),
Back.1=c("DD", "AA", "DD", "BB"),
Group.2=c("D","A", NA, "B"),
Back.2=c("DD", "BB", "CC", "AA"),
Donk.1 =c("PP", "ZZ", NA, "QQ"),
Back.1=c("DD", "AA", "DD", "BB"),
stringsAsFactors=FALSE)
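The positional pairing used for the updated dataset depends on the columns strictly alternating and on Back.1 being duplicated. As an alternative, here is a hedged base R sketch that pairs columns by their numeric suffix, so Donk.1 is filled from the single existing Back.1. It assumes every fallback column is named `Back.<n>` and every target column ends in `.<n>`. The variable names `back_cols`, `target_cols`, `suffix`, and `back` are illustrative, not part of the original answer.

```r
# Updated data from the question, without the duplicated Back.1 column
s <- data.frame(ID = c(191, 282, 202, 210),
                Group.1 = c(NA, "A", NA, "B"),
                Back.1 = c("DD", "AA", "DD", "BB"),
                Group.2 = c("D", "A", NA, "B"),
                Back.2 = c("DD", "BB", "CC", "AA"),
                Donk.1 = c("PP", "ZZ", NA, "QQ"),
                stringsAsFactors = FALSE)

# Fallback columns, and every other column that carries a numeric suffix
back_cols   <- grep("^Back\\.", names(s), value = TRUE)
target_cols <- setdiff(grep("\\.[0-9]+$", names(s), value = TRUE), back_cols)

for (col in target_cols) {
  suffix <- sub("^.*\\.", "", col)   # "Group.1" gives "1"
  back   <- paste0("Back.", suffix)  # matching fallback column name
  if (back %in% names(s)) {
    missing <- is.na(s[[col]])
    # Overwrite only the missing entries with the fallback values
    s[[col]][missing] <- s[[back]][missing]
  }
}
s
```

Matching on the suffix rather than on column position means the column order no longer matters, and target columns with no corresponding Back.x are simply left untouched.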