Question

我有两个数据框：

     dput (df1)
     structure(list(ID = c(1, 2, 3, 4, 5), Age = c(29, 40, 60, 30, 
     39), BMI = c(27, 26, 31, 33, 26), A.1 = c(1L, 0L, 1L, 1L, 0L), 
     A.2 = c(0L, 0L, 0L, 0L, 0L), A.3 = c(0L, 1L, 0L, 0L, 1L), 
     A.4 = c(0L, 1L, 0L, 0L, 1L), B.1 = c(1L, 1L, 1L, 0L, 0L), 
     B.2 = c(1L, 1L, 1L, 1L, 1L), B.3 = c(1L, 0L, 1L, 0L, 0L), 
     B.4 = c(1L, 1L, 0L, 1L, 1L)), .Names = c("ID", "Age", "BMI", 
     "A.1", "A.2", "A.3", "A.4", "B.1", "B.2", "B.3", "B.4"), row.names = c(NA, 
     5L), class = "data.frame")

  df1
    ID Age BMI A.1 A.2 A.3 A.4 B.1 B.2 B.3 B.4
  1  1  29  27   1   0   0   0   1   1   1   1
  2  2  40  26   0   0   1   1   1   1   0   1
  3  3  60  31   1   0   0   0   1   1   1   0
  4  4  30  33   1   0   0   0   0   1   0   1
  5  5  39  26   0   0   1   1   0   1   0   1

    dput (df2)
   structure(list(ID = c(1, 2, 3, 4, 5), outcome = c(NA, NA, 2, 
   NA, NA)), .Names = c("ID", "outcome"), row.names = c(NA, -5L), class = "data.frame")


 df2
   ID outcome
  1  1      NA
  2  2      NA
  3  3       2
  4  4      NA
  5  5      NA

我的任务是根据df2的结果值更改df1中的数据。如果df2中的结果值为n，则具有＆gt; n后缀的df1列中的所有值应更改为相应ID的NA：

     df1
      ID Age BMI A.1 A.2 A.3 A.4 B.1 B.2 B.3 B.4
    1  1  29  27   1   0   0   0   1   1   1   1
    2  2  40  26   0   0   1   1   1   1   0   1
    3  3  60  31   1   0   NA  NA  1   1   NA  NA
    4  4  30  33   1   0   0   0   0   1   0   1
    5  5  39  26   0   0   1   1   0   1   0   1

请你帮我找一个解决方案，我真的很难受。

Answer 1

假设两个数据集按ID

排序

indx1 <- grep("\\.\\d+", colnames(df1))
Suf <- as.numeric(gsub('.*\\.', "", colnames(df1)[indx1]))
indx2 <- !is.na(df2$outcome)
df1[,indx1][indx2,Suf>df2$outcome[indx2]] <- NA
df1
#   ID Age BMI A.1 A.2 A.3 A.4 B.1 B.2 B.3 B.4
#1  1  29  27   1   0   0   0   1   1   1   1
#2  2  40  26   0   0   1   1   1   1   0   1
#3  3  60  31   1   0  NA  NA   1   1  NA  NA
#4  4  30  33   1   0   0   0   0   1   0   1
#5  5  39  26   0   0   1   1   0   1   0   1

或者

df1[indx1] <- Map(function(x,y,z) {x[y >z] <- NA; x } ,
                      df1[indx1], Suf, list(df2$outcome))

按行和列名称后缀更改数据框

1 个答案: