Question

Here is my data frame:

col_1 <- c(1,2,NA,4,5)
temp_col_1 <-c(12,2,2,3,4)
col_2 <- c(1,23,423,NA,23)
temp_col_2 <-c(1,2,23,4,5)

df_test<-data.frame(col_1,temp_col_1,col_2, temp_col_2)

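In column col_1, I want to replace the NA with the corresponding value from temp_col_1, and do the same for col_2 and temp_col_2.

I know how to do this manually with ifelse statements. The problem is that I have a large number of columns following the pattern col_name and temp_col_name, and I would like to know how to automate it.

I tried different things such as df_test[, paste('temp_', 'col_1')], but nothing worked. Any suggestions?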

Answer 1

This should give you what you want.

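# loop over every column whose name starts with "temp_"
lapply(names(df_test)[grepl("^temp_", names(df_test))],
       function(tc){
         col <- sub("^temp_", "", tc)                    # name of the paired column
         row_to_replace <- which(is.na(df_test[[col]]))  # rows where it is NA
         # <<- modifies df_test outside the function's own scope
         df_test[[col]][row_to_replace] <<- df_test[[tc]][row_to_replace]
       })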

I had fun with this question. The lapply call above is the most compact approach I came up with, but it relies on the not-so-popular <<- operator.
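
For readers who would rather avoid <<-, the same idea can be written as a plain for loop, since a loop body assigns in the calling environment. This is only a sketch under the question's naming convention. The check that skips temp_ columns without a partner is an added assumption, not part of the original answer.

```r
# Rebuild the example data so the snippet runs on its own
df_test <- data.frame(
  col_1      = c(1, 2, NA, 4, 5),
  temp_col_1 = c(12, 2, 2, 3, 4),
  col_2      = c(1, 23, 423, NA, 23),
  temp_col_2 = c(1, 2, 23, 4, 5)
)

# A for loop modifies df_test directly, so <<- is not needed
for (tc in grep("^temp_", names(df_test), value = TRUE)) {
  col <- sub("^temp_", "", tc)          # name of the paired column
  if (!col %in% names(df_test)) next    # assumption: skip unpaired temp_ columns
  na_rows <- is.na(df_test[[col]])      # rows that need a replacement
  df_test[[col]][na_rows] <- df_test[[tc]][na_rows]
}

df_test
```

The temp_ columns are left untouched, and the column order of df_test does not change.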

Answer 2

# list of columns we need to check for NA's
col.to.check <- colnames(df_test)[!grepl("^temp", colnames(df_test))]
# these columns need not be checked
col.to.keep <- colnames(df_test)[grepl("^temp", colnames(df_test))]

func <- function(x){ 
  y <- which(is.na(df_test[[x]]))       # which position do NA's exist
  z <- df_test[[paste0("temp_", x)]][y] # which to be used to replace
  df_test[[x]][y] = z                   # replace them
  return(df_test[[x]])
  }

df = data.frame(lapply(col.to.check, func))
colnames(df) = col.to.check
cbind(df, df_test[col.to.keep])

#  col_1 col_2 temp_col_1 temp_col_2
#1     1     1         12          1
#2     2    23          2          2
#3     2   423          2         23
#4     4     4          3          4
#5     5    23          4          5

Answer 3

If the columns come in pairs in a consistent order, as in your example, you can try:

1a

ix <- which(is.na(df_test), arr.ind = TRUE)
ix2 <- ix
ix2[ , 2] <- ix2[ , 2] + 1
df_test[ix] <- df_test[ix2]
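
To see why the snippet above works, it can help to look at the index matrices before the assignment. The following is a small sketch on the question's data. The variable name replacement is introduced here purely for illustration.

```r
# Rebuild the example data so the snippet runs on its own
df_test <- data.frame(
  col_1      = c(1, 2, NA, 4, 5),
  temp_col_1 = c(12, 2, 2, 3, 4),
  col_2      = c(1, 23, 423, NA, 23),
  temp_col_2 = c(1, 2, 23, 4, 5)
)

# One line per NA: first column is the row index, second the column index
ix <- which(is.na(df_test), arr.ind = TRUE)

# Same rows, but pointing one column to the right (the temp_ partner)
ix2 <- ix
ix2[, 2] <- ix2[, 2] + 1

# Indexing a data frame with a two-column matrix picks single cells
replacement <- df_test[ix2]
replacement
```

Here the NAs sit at (row 3, column 1) and (row 4, column 3), so ix2 points at (3, 2) and (4, 4), and replacement holds the values 2 and 4.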

Or:

1b

d1 <- df_test[ , c(TRUE, FALSE)]
d2 <- df_test[ , c(FALSE, TRUE)]
d1[is.na(d1)] <- d2[is.na(d1)]
d1

If you have paired columns, but not necessarily in the order shown above, sort the columns by name first:

df <- data.frame(temp_col_2, col_1, col_2, temp_col_1)
df <- df[ , order(names(df))]

2a

ix <- which(is.na(df), arr.ind = TRUE)
ix2 <- ix
ix2[ , 2] <- ix2[ , 2] + ncol(df) / 2
df[ix] <- df[ix2]

Or:

2b

d1 <- df[ , 1:(ncol(df)/2)]
d2 <- df[ , (ncol(df)/2 + 1):ncol(df)]

Then proceed as in 1b.
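
The variants above all rely on column position. If that feels fragile, the two halves can also be picked by name, which works regardless of column order. This is a minimal sketch, assuming every non-temp column has a temp_ counterpart. The names base_cols and temp_cols are made up for illustration.

```r
# Example data with the columns deliberately out of order
df <- data.frame(
  temp_col_2 = c(1, 2, 23, 4, 5),
  col_1      = c(1, 2, NA, 4, 5),
  col_2      = c(1, 23, 423, NA, 23),
  temp_col_1 = c(12, 2, 2, 3, 4)
)

# Columns without the prefix, and the names of their temp_ partners
base_cols <- grep("^temp_", names(df), value = TRUE, invert = TRUE)
temp_cols <- paste0("temp_", base_cols)

# Assumption: every base column has a partner; stop early otherwise
stopifnot(all(temp_cols %in% names(df)))

d1 <- df[base_cols]   # columns to fill
d2 <- df[temp_cols]   # columns to take values from, in matching order

# Same replacement step as in 1b
d1[is.na(d1)] <- d2[is.na(d1)]
d1
```

Because d2 is built from the names in base_cols, its columns line up with d1 by construction, so no sorting step is needed.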

In general, I tend to think that working with data in long format is more convenient. Here is one possibility using data.table functions.

library(data.table)

# melt data to long format
d <- melt(setDT(df_test), measure = patterns("^col", "^temp"), value.name = c("col", "temp"))

# update relevant values of 'col'
d[is.na(col), col := temp]

# if desired, cast back to wide format 
dcast(d, rowid(variable) ~ variable, value.var = c("col", "temp"))

Update pairs of columns based on a pattern in their names
