Keep a row in R if any variable in that row is at or above a given value

Time: 2018-06-08 21:59:59

Tags: r subset

I have code that works:

minimum_ozone <- 65

ozone <- subset(ozone, ozone[2] >= minimum_ozone | ozone[3] >= minimum_ozone | ozone[4] >= minimum_ozone | ozone[5] >= minimum_ozone | ozone[6] >= minimum_ozone | ozone[7] >= minimum_ozone
             | ozone[8] >= minimum_ozone | ozone[9] >= minimum_ozone | ozone[10] >= minimum_ozone | ozone[11] >= minimum_ozone | ozone[12] >= minimum_ozone | ozone[13] >= minimum_ozone
             | ozone[14] >= minimum_ozone | ozone[15] >= minimum_ozone | ozone[16] >= minimum_ozone | ozone[17] >= minimum_ozone | ozone[18] >= minimum_ozone | ozone[19] >= minimum_ozone
             | ozone[20] >= minimum_ozone | ozone[21] >= minimum_ozone | ozone[22] >= minimum_ozone | ozone[23] >= minimum_ozone | ozone[24] >= minimum_ozone | ozone[25] >= minimum_ozone)

However, this code looks clumsy. Is there a shorter way to write it, or one that runs faster?

2 answers:

Answer 0: (score: 1)

set.seed(1)

ozone <- as.data.frame(matrix(sample(40:70, 50, replace=TRUE), 10))
ozone
#    V1 V2 V3 V4 V5
# 1  48 46 68 54 65
# 2  51 45 46 58 60
# 3  57 61 60 55 64
# 4  68 51 43 45 57
# 5  46 63 48 65 56
# 6  67 55 51 60 64
# 7  69 62 40 64 40
# 8  60 70 51 43 54
# 9  59 51 66 62 62
# 10 41 64 50 52 61

minimum_ozone <- 65
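# use >= (not >) so that a row whose maximum equals the threshold is kept,
# matching the comparisons in the question
ozone[which(apply(ozone, 1, max) >= minimum_ozone), ]
#   V1 V2 V3 V4 V5
# 1 48 46 68 54 65
# 4 68 51 43 45 57
# 5 46 63 48 65 56
# 6 67 55 51 60 64
# 7 69 62 40 64 40
# 8 60 70 51 43 54
# 9 59 51 66 62 62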
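
The question compares only columns 2 onward, which suggests column 1 may not hold an ozone reading. A minimal sketch of the same row-maximum idea under that assumption follows; the data frame `ozone_demo` and its column names are hypothetical. It matters because `apply()` converts the data frame to a matrix, and a non-numeric column turns that into a character matrix.

```r
# Hypothetical data: an identifier column followed by two reading columns
ozone_demo <- data.frame(
  station = c("a", "b", "c"),
  h1 = c(40, 66, 50),
  h2 = c(65, 41, 64),
  stringsAsFactors = FALSE
)
minimum_ozone <- 65

# Select only the reading columns before calling apply()
measure_cols <- 2:ncol(ozone_demo)
row_max <- apply(ozone_demo[measure_cols], 1, max)

# Keep rows whose largest reading is at or above the threshold
kept <- ozone_demo[row_max >= minimum_ozone, ]
kept$station
```

For the data in the question, `measure_cols` would presumably be `2:25`.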

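Answer 1: (score: 1)

You should be able to do this in a vectorized way that runs quickly, without typing out each comparison explicitly. Here are some options. They use columns 2:5 to match the test data further down; for the data in the question the range would be 2:25.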

# compare the selected columns to min and sum the logical values in each row
res1 <- ozone[rowSums(ozone[2:5] >= minimum_ozone) > 0,]

# use pmax to get the row maximum and then compare to min
res2 <- ozone[do.call(pmax, ozone[2:5]) >= minimum_ozone,]

# use Reduce and | (or) to do the same process you wrote out long-hand
res3 <- ozone[Reduce(`|`, lapply(ozone[2:5], `>=`, minimum_ozone)),]
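
If the filter is needed in more than one place, the first option can be wrapped in a small function. This is only a sketch: the function name `keep_rows_any_at_least`, its arguments, and the toy data are all made up for illustration.

```r
# Hypothetical helper: keep rows where any of the chosen columns is >= threshold
keep_rows_any_at_least <- function(df, cols, threshold) {
  hits <- rowSums(df[cols] >= threshold) > 0
  df[hits, , drop = FALSE]
}

# Hypothetical toy data: column 1 is an identifier, columns 2:4 are readings
toy <- data.frame(
  id = 1:4,
  a = c(10, 70, 20, 65),
  b = c(64, 30, 40, 10),
  c = c(50, 20, 30, 5)
)

result <- keep_rows_any_at_least(toy, 2:4, 65)
result$id
```

Passing the column range as an argument avoids hard-coding `2:5` or `2:25` in the filter itself.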

Testing it with some example data (define the data first, then run the three options above on it):

# example data
minimum_ozone <- 65
set.seed(1)
ozone <- data.frame(replicate(5, sample(1:100,5)))
names(ozone) <- paste0("v",1:5)

# long-hand solution
out <- subset(
    ozone,
    ozone[2] >= minimum_ozone |
    ozone[3] >= minimum_ozone |
    ozone[4] >= minimum_ozone |
    ozone[5] >= minimum_ozone
)

identical(out, res1)
#[1] TRUE
identical(out, res2)
#[1] TRUE
identical(out, res3)
#[1] TRUE
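
One caveat the test data does not exercise is missing values. The three options treat `NA` differently, and an `NA` in a logical row index produces a row of `NA`s rather than dropping the row. A rough sketch, assuming a made-up data frame `ozone_na` and assuming missing readings should simply be ignored:

```r
minimum_ozone <- 65

# Hypothetical data with missing readings in columns 2:3
ozone_na <- data.frame(
  id = 1:3,
  v2 = c(70, NA, 10),
  v3 = c(NA, 20, 30)
)
vals <- ozone_na[2:3]

# rowSums: na.rm = TRUE counts only the comparisons that are TRUE
keep_sum <- rowSums(vals >= minimum_ozone, na.rm = TRUE) > 0

# pmax: pass na.rm = TRUE as an extra argument through do.call
keep_max <- do.call(pmax, c(vals, na.rm = TRUE)) >= minimum_ozone

# Reduce with |: TRUE | NA is TRUE, but FALSE | NA stays NA
keep_or <- Reduce(`|`, lapply(vals, `>=`, minimum_ozone))

# which() drops the NA positions, so only definite matches are kept
res_na <- ozone_na[which(keep_or), ]
```

Whether a row with missing readings should be kept or dropped depends on the analysis, so treat the `na.rm = TRUE` choice here as one possibility rather than the right answer.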