Get the top and bottom x% and assign numbers in R

Time: 2018-12-16 14:20:14

Tags: r

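I am trying to find the values that mark off the top 10% and the bottom 10% of my observations. When plotting the density, I want to identify the bottom 10% and the top 10% of the observations. I then want to assign "-1" to all observations in the bottom 10%, "+1" to all observations in the top 10%, and "0" to everything else.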

plot(density(as.numeric(test$pr.lm)))

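This code does what I want, but it only splits on whether an observation is less than 0.5 or greater than 0.5. I would like to convert it to use percentages (percentiles) instead.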

test$pred.lm <- ifelse(test$pr.lm < 0.5, "-1",
                       ifelse(test$pr.lm > 0.5, "1", "0"))

Data:

pr.lm <- c(`2018-10-03` = 0.423462496856153, `2018-10-04` = 0.427913898978011, 
`2018-10-05` = 0.404934696139578, `2018-10-08` = 0.446317322918278, 
`2018-10-09` = 0.497167887579, `2018-10-10` = 0.483608339493601, 
`2018-10-11` = 0.506296752048131, `2018-10-12` = 0.620769108097577, 
`2018-10-15` = 0.641401086662484, `2018-10-16` = 0.647211253697089, 
`2018-10-17` = 0.624948223534579, `2018-10-18` = 0.706720641849297, 
`2018-10-19` = 0.678927972325959, `2018-10-22` = 0.594686934902609, 
`2018-10-23` = 0.586573168581061, `2018-10-24` = 0.481744214817579, 
`2018-10-25` = 0.501879874108935, `2018-10-26` = 0.638941662227341, 
`2018-10-29` = 0.533530225556122, `2018-10-30` = 0.520026314139557, 
`2018-10-31` = 0.55841571603097, `2018-11-01` = 0.681757510274823, 
`2018-11-02` = 0.59654572803471, `2018-11-05` = 0.626287514663055, 
`2018-11-06` = 0.714443802319515, `2018-11-07` = 0.67080600584018, 
`2018-11-08` = 0.59281752403647, `2018-11-09` = 0.563390754546873, 
`2018-11-12` = 0.518030212097214, `2018-11-13` = 0.669092984178484, 
`2018-11-14` = 0.637525191976898, `2018-11-15` = 0.49706914674227, 
`2018-11-16` = 0.541251316928707, `2018-11-19` = 0.604244652770604, 
`2018-11-20` = 0.684593935690332, `2018-11-21` = 0.720398651972747, 
`2018-11-23` = 0.653974914464049, `2018-11-26` = 0.574402370856118, 
`2018-11-27` = 0.614833371923479, `2018-11-28` = 0.715942039198248, 
`2018-11-29` = 0.711536503476983, `2018-11-30` = 0.621089259799182
)

1 answer:

Answer 0 (score: 3)

It sounds like you are looking for quantile(), perhaps in combination with cut():

cut(x, c(-Inf, quantile(x, c(0.1, 0.9)), Inf))
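
To make the boundary behaviour concrete, here is a minimal sketch on a toy vector (the integers 1 to 20, which are an assumption for illustration and not part of the question's data). It relies on the defaults of both functions: quantile() interpolates between order statistics, and cut() builds right-closed intervals, so a value exactly equal to a cutoff would fall into the lower bin.

```r
# Toy input (an assumption for illustration): the integers 1 to 20.
x <- 1:20

# Default quantile() (type = 7) interpolates between order statistics.
qs <- quantile(x, c(0.1, 0.9))

# cut() intervals are right-closed by default: (lower, upper].
bins <- cut(x, c(-Inf, qs, Inf))

# Number of values in the bottom, middle and top bin.
counts <- as.integer(table(bins))
print(qs)
print(counts)
```

With these defaults the cutoffs come out as 2.9 and 18.1, so two values land in each tail and sixteen in the middle.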

For example, in your case (adding the labels argument to cut(), as Ben Bolker pointed out):

cuts <- c(-Inf, quantile(test$pr.lm, c(0.1, 0.9)), Inf)
test$pred.lm <- cut(test$pr.lm, cuts, labels = c(-1, 0, 1))
# and if we want to keep it as integer:
test$pred.lm <- as.integer(as.character(test$pred.lm))
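
As a self-contained check, the same steps can be run on a small data frame. The sketch below assumes `test` is a data frame with a `pr.lm` column and, to stay short, uses only the first ten values from the question's data. With so few rows, only the single smallest and single largest value fall outside the 10% and 90% cutoffs.

```r
# Assumption: `test` is a data frame holding the predictions in `pr.lm`.
# Only the first ten values from the question are used here.
test <- data.frame(pr.lm = c(
  0.423462496856153, 0.427913898978011, 0.404934696139578,
  0.446317322918278, 0.497167887579,    0.483608339493601,
  0.506296752048131, 0.620769108097577, 0.641401086662484,
  0.647211253697089
))

# Break points: everything up to the 10% quantile, the middle, the rest.
cuts <- c(-Inf, quantile(test$pr.lm, c(0.1, 0.9)), Inf)

# Label the three bins -1, 0, 1; the result is a factor.
test$pred.lm <- cut(test$pr.lm, cuts, labels = c(-1, 0, 1))

# Go through character first: as.integer() on a factor alone would
# return the level codes 1, 2, 3 instead of the labels.
test$pred.lm <- as.integer(as.character(test$pred.lm))

print(test)
```

Here the third row (the minimum) gets -1, the tenth row (the maximum) gets 1, and the rest get 0.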

A small example:

x <- rnorm(100)
qs <- quantile(x, c(0.1, 0.9))
bins <- cut(x, c(-Inf, qs, Inf))

Output:

> qs
      10%       90% 
-1.418241  1.278333 
> head(bins)
[1] (-1.42,1.28] (-1.42,1.28] (-1.42,1.28]
[4] (1.28, Inf]  (1.28, Inf]  (-Inf,-1.42]
3 Levels: (-Inf,-1.42] ... (1.28, Inf]
> table(bins)
bins
(-Inf,-1.42] (-1.42,1.28]  (1.28, Inf] 
          10           80           10
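
If you would rather stay close to the nested ifelse() from the question, the quantiles can be used directly as thresholds. This is only a sketch of an alternative, not part of the original answer. The seed is arbitrary, and `<=` is used for the lower cutoff so that the result matches the right-closed intervals cut() produces by default.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable
x <- rnorm(100)
qs <- quantile(x, c(0.1, 0.9))

# Nested ifelse(), mirroring the structure of the question's code.
via_ifelse <- ifelse(x <= qs[["10%"]], -1L,
                     ifelse(x > qs[["90%"]], 1L, 0L))

# The cut() approach from the answer, converted to integer.
via_cut <- as.integer(as.character(
  cut(x, c(-Inf, qs, Inf), labels = c(-1, 0, 1))
))

print(identical(via_ifelse, via_cut))
print(table(via_ifelse))
```

Both versions should agree element by element. For 100 distinct values, 10 end up in each tail and 80 in the middle, as in the table above.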