Question

I have a dataset that looks like this:

Col1    Col2    Col3     Col4   Last_Col1    Last_Col2   Last_Col3    Last_Col4
  NA       1       4        7           9           10          11           12
  NA      NA       4       NA          NA            9          NA           10
   8      NA       9       10          11           12          20           49
   9       7      NA       NA          34            2           3           50

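How can I replace the NA values in the columns whose names do not start with Last_ with the values from the corresponding Last_ columns? After that, I want to drop the columns that have Last_ in their names.

Expected final output: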

Col1    Col2    Col3     Col4   
   9       1       4        7    
  NA       9       4       10    
   8      12       9       10    
   9       7       3       50

Any help would be great, thanks!

Answer 1

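Because the columns correspond one to one, take the names of the columns whose NA values should be replaced ('nm1') and the names of the columns prefixed with 'Last_' ('nm2'). Then build a logical matrix ('i1') that marks the NA positions in the first set, and use it to assign the values found at the same positions in the second set.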

nm1 <- names(df1)[1:4]
nm2 <- names(df1)[5:8]

or select the names by prefix instead of by position:

nm1 <- names(df1)[startsWith(names(df1), "Col")]
nm2 <- names(df1)[startsWith(names(df1), "Last_")]
i1 <- is.na(df1[nm1])
df1[nm1][i1] <- df1[nm2][i1] 
newdf <- df1[nm1]
newdf
#  Col1 Col2 Col3 Col4
#1    9    1    4    7
#2   NA    9    4   10
#3    8   12    9   10
#4    9    7    3   50
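
The logical-matrix assignment above only pairs the right values if 'nm1' and 'nm2' list the columns in the same order. As a minimal sketch, assuming every target column has a partner named Last_ plus its own name (true for the sample data, not guaranteed elsewhere), 'nm2' can be derived from 'nm1' so that the pairing is by name rather than by position:

```r
# Sample data copied from the question
df1 <- data.frame(
  Col1 = c(NA, NA, 8L, 9L), Col2 = c(1L, NA, NA, 7L),
  Col3 = c(4L, 4L, 9L, NA), Col4 = c(7L, NA, 10L, NA),
  Last_Col1 = c(9L, NA, 11L, 34L), Last_Col2 = c(10L, 9L, 12L, 2L),
  Last_Col3 = c(11L, NA, 20L, 3L), Last_Col4 = c(12L, 10L, 49L, 50L)
)

# Target columns: everything that does not start with "Last_"
nm1 <- names(df1)[!startsWith(names(df1), "Last_")]
# Derive each partner name from its target so the order cannot drift
nm2 <- paste0("Last_", nm1)
# Assumption: one partner per target; stop early if any is missing
stopifnot(all(nm2 %in% names(df1)))

i1 <- is.na(df1[nm1])          # logical matrix marking the NAs to fill
df1[nm1][i1] <- df1[nm2][i1]   # take the same positions from the partners
newdf <- df1[nm1]
newdf
```

The stopifnot() check is there so that a missing Last_ column produces an error instead of a silently misaligned fill.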

Data

df1 <- structure(list(Col1 = c(NA, NA, 8L, 9L), Col2 = c(1L, NA, NA, 
7L), Col3 = c(4L, 4L, 9L, NA), Col4 = c(7L, NA, 10L, NA), Last_Col1 = c(9L, 
 NA, 11L, 34L), Last_Col2 = c(10L, 9L, 12L, 2L), Last_Col3 = c(11L, 
 NA, 20L, 3L), Last_Col4 = c(12L, 10L, 49L, 50L)), 
 class = "data.frame", row.names = c(NA, -4L))

Answer 2

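Split the data into two data.frames, find the missing values in the Col columns, and overwrite them with the values from the data.frame whose columns start with Last_.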

xy <- read.table(text = "Col1    Col2    Col3     Col4   Last_Col1    Last_Col2   Last_Col3    Last_Col4
  NA       1       4        7           9           10          11           12
  NA      NA       4       NA          NA            9          NA           10
   8      NA       9       10          11           12          20           49
   9       7      NA       NA          34            2           3           50", header = TRUE)

xy1 <- xy[, grepl("^Col\\d+$", names(xy))]
xy2 <- xy[, grepl("^Last_Col\\d+$", names(xy))]

xy1[is.na(xy1)] <- xy2[is.na(xy1)]

> xy1
  Col1 Col2 Col3 Col4
1    9    1    4    7
2   NA    9    4   10
3    8   12    9   10
4    9    7    3   50
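
The same fill can also be done one column pair at a time, which some readers find easier to follow than matrix indexing. This is only a sketch: it assumes xy1 and xy2 hold their columns in the same order, and fill_na is a hypothetical helper name, not part of the original answer.

```r
xy <- read.table(text = "Col1 Col2 Col3 Col4 Last_Col1 Last_Col2 Last_Col3 Last_Col4
  NA  1  4  7  9 10 11 12
  NA NA  4 NA NA  9 NA 10
   8 NA  9 10 11 12 20 49
   9  7 NA NA 34  2  3 50", header = TRUE)

xy1 <- xy[, grepl("^Col\\d+$", names(xy))]
xy2 <- xy[, grepl("^Last_Col\\d+$", names(xy))]

# fill_na (hypothetical helper): keep x where it is present, otherwise take y
fill_na <- function(x, y) ifelse(is.na(x), y, x)

# Map() walks both data.frames column by column;
# assigning into xy1[] keeps the data.frame structure and column names
xy1[] <- Map(fill_na, xy1, xy2)
xy1
```

Where the Last_ value is itself NA (row 2 of Col1), the result stays NA, which matches the expected output in the question.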

Answer 3

A data.table solution (which will be very fast on larger datasets):

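library(data.table)
dt <- as.data.table(df1)  # assumption: df1 is the data.frame from Answer 1's Data section

ourcols <- paste0("Col", 1:4)
for (col in ourcols) {
  # rows where this column is NA
  rows <- which(is.na(dt[[col]]))
  # overwrite them by reference with the values from the matching Last_ column
  set(x = dt, i = rows, j = col, value = dt[rows, get(paste0("Last_", col))])
}
dt[, ..ourcols]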
   Col1 Col2 Col3 Col4
1:    9    1    4    7
2:   NA    9    4   10
3:    8   12    9   10
4:    9    7    3   50
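
As a possible variation on the loop above, data.table's fcoalesce() returns the first non-NA value element-wise, so the row lookup can be dropped. This is a sketch that assumes a data.table version recent enough to provide fcoalesce() and that each column pair has the same type (both are integer here); the table is rebuilt from the question's data so the block runs on its own.

```r
library(data.table)

# Sample data copied from the question
dt <- data.table(
  Col1 = c(NA, NA, 8L, 9L), Col2 = c(1L, NA, NA, 7L),
  Col3 = c(4L, 4L, 9L, NA), Col4 = c(7L, NA, 10L, NA),
  Last_Col1 = c(9L, NA, 11L, 34L), Last_Col2 = c(10L, 9L, 12L, 2L),
  Last_Col3 = c(11L, NA, 20L, 3L), Last_Col4 = c(12L, 10L, 49L, 50L)
)

ourcols <- paste0("Col", 1:4)
for (col in ourcols) {
  # Take the column's own value where present, otherwise the Last_ partner's value,
  # and write the whole column back by reference
  set(dt, j = col, value = fcoalesce(dt[[col]], dt[[paste0("Last_", col)]]))
}
dt[, ..ourcols]
```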
