Compare one variable against other variables by group in R

Asked: 2019-03-04 15:16:34

Tags: r dataframe compare

I have the following data frame:

data.frame(id = c("a", "a", "a", "d", "d"),
           value = c(5, 46, 12, 14, 32),
           low = c(46, 8, NA, 0, 34),
           high = c(56, 20, NA, 12, 60))

  id value low high
1  a     5  46   56
2  a    46   8   20
3  a    12  NA   NA
4  d    14   0   12
5  d    32  34   60

I need to set a new variable to TRUE if value falls outside the interval defined by low and high for every row with the same id.
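To make the rule concrete, here is a small sketch for the first row only. The variable names are just for illustration, and it assumes that a row with NA bounds defines no interval.

```r
# Illustration only: check row 1 (value 5) against all intervals of id "a"
value <- 5
low   <- c(46, 8, NA)
high  <- c(56, 20, NA)

# One comparison per interval; the NA row yields NA
inside <- value >= low & value <= high      # FALSE FALSE NA

# TRUE because 5 lies in none of the known intervals
outside_all <- !any(inside, na.rm = TRUE)   # TRUE
```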

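The data frame I want is:

  id value low high result
1  a     5  46   56   TRUE
2  a    46   8   20  FALSE
3  a    12  NA   NA  FALSE
4  d    14   0   12   TRUE
5  d    32  34   60   TRUE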

How can I do this in base R? I am working in a restricted environment where I only have access to base R.

3 Answers:

Answer 0: (score: 1)

Without the apply, sapply, or map functions:

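isInDataframe <- function(data, value = "value", from = "low", to = "high", id = "id") {
  result <- logical(nrow(data))
  for (i in seq_len(nrow(data))) {
    # All rows that share the id of row i
    deeta <- data[data[[id]] == as.character(data[[id]][i]), ]
    subresult <- logical(nrow(deeta))
    for (j in seq_len(nrow(deeta))) {
      # Is value i inside interval j? NA when the interval bounds are NA
      subresult[j] <- data[[value]][i] >= deeta[[from]][j] &
                      data[[value]][i] <= deeta[[to]][j]
    }
    # TRUE only when the value lies in none of the group's intervals
    result[i] <- !any(subresult, na.rm = TRUE)
  }
  data$result <- result
  return(data)
}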

# `data` is the data frame from the question
isInDataframe(data = data, value = "value", from = "low", to = "high", id = "id")
  id value low high result
1  a     5  46   56   TRUE
2  a    46   8   20  FALSE
3  a    12  NA   NA  FALSE
4  d    14   0   12   TRUE
5  d    32  34   60   TRUE

Answer 1: (score: 0)

I came up with an ugly and unoptimized solution, but it works! Here is the code:

df <- data.frame(id = c("a", "a", "a", "d", "d"),
       value = c(5, 46, 12, 14, 32),
       low = c(46, 8, NA, 0, 34),
       high = c(56, 20, NA, 12, 60))

list.inter <- list()

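for (i in 1:nrow(df)) {
  # Rows with a missing bound define no interval
  if (is.na(df$low[i]) || is.na(df$high[i])) {
    list.inter[[i]] <- NA
  } else {
    # seq() steps by 1 from low, so only values on that grid (e.g. whole numbers) can match
    list.inter[[i]] <- seq(from = df$low[i], to = df$high[i])
  }
}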

result <- c()
for(i in 1:nrow(df)){
  result[i] <-  ! df$value[i] %in% unlist(list.inter[which(df$id[i]==df$id)])
}

df$result <- result

I hope this helps, and I am curious to see some more optimized code from other users!
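One possible more compact variant is sketched below. It compares each value with the bounds directly instead of enumerating the intervals with seq(), so non-integer values should work as well. The helper name outside_all_intervals is hypothetical, the column names id, value, low, and high are taken from the question, and rows with NA bounds are assumed to define no interval.

```r
# Hypothetical helper: flags values that lie outside every interval of their id group
outside_all_intervals <- function(df) {
  result <- logical(nrow(df))
  # split() gives the row indices that belong to each id
  for (rows in split(seq_len(nrow(df)), df$id)) {
    # inside[k, j] is TRUE when value k lies within interval j of the same group
    inside <- outer(df$value[rows], df$low[rows], ">=") &
              outer(df$value[rows], df$high[rows], "<=")
    # NA entries (from NA bounds) are ignored, i.e. treated as "no interval"
    result[rows] <- rowSums(inside, na.rm = TRUE) == 0
  }
  result
}

df <- data.frame(id = c("a", "a", "a", "d", "d"),
                 value = c(5, 46, 12, 14, 32),
                 low = c(46, 8, NA, 0, 34),
                 high = c(56, 20, NA, 12, 60))

df$result <- outside_all_intervals(df)
# df$result is TRUE FALSE FALSE TRUE TRUE
```

This still loops over the groups, but the comparisons within each group are vectorised and no apply-family function is needed.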

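Answer 2: (score: 0)

For this analysis, I ended up splitting the data into one data frame holding id and value, and another holding id, low, and high.

However, here is a solution inspired by the solutions suggested for this new approach: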

df <- data.frame(id = c("a", "a", "a", "d", "d"),
                 value = c(5, 46, 12, 14, 32),
                 low = c(46, 8, NA, 0, 34),
                 high = c(56, 20, NA, 12, 60))

temp <- merge(x = df[c("id",
                       "value")],
              y = df[c("id",
                       "low",
                       "high")])

temp$result <- temp$value < temp$low | temp$value > temp$high

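# The formula is passed unnamed: the first argument of aggregate()'s formula
# method was renamed from `formula` to `x` in R 4.2.0, so naming it is not portable
merge(x = df,
      y = aggregate(result ~ id + value,
                    data = temp,
                    FUN = all))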

  id value low high result
1  a    12  NA   NA  FALSE
2  a    46   8   20  FALSE
3  a     5  46   56   TRUE
4  d    14   0   12   TRUE
5  d    32  34   60   TRUE