I have this dataset:
dput(head(data,20))
structure(list(Date = structure(c(1495722600, 1495723500, 1495724400,
1495725300, 1495726200, 1495727100, 1495728000, 1495728900, 1495729800,
1495730700, 1495731600, 1495732500, 1495733400, 1495734300, 1495735200,
1495736100, 1495737000, 1495737900, 1495738800, 1495739700), class = c("POSIXct",
"POSIXt"), tzone = ""), JVM_CPU = c(1.07500004768372, 1.75, 10.6979999542236,
2.40000009536743, 2.42400002479553, 5.80000019073486, 6.80000019073486,
1.85000002384186, 8.52499961853027, 0.800000011920929, 12.7740001678467,
0.174999997019768, 0.499000012874603, 0.248999997973442, 6.82499980926514,
1.125, 0.949000000953674, 0.874000012874603, 6.55000019073486,
0.248999997973442)), .Names = c("Date", "JVM_CPU"), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
I need to subset it so that it contains no outliers.
I can do this to remove the outliers from the vector data$JVM_CPU:
data_cpu$JVM_CPU[!data_cpu$JVM_CPU %in% boxplot.stats(data_cpu$JVM_CPU)$out]
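But that only returns the filtered vector. I need to remove the rows containing outliers from the data frame itself. Any ideas on how I can do this?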
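Answer 0: (score: 1)
You can first determine which rows of the data frame to keep (i.e. those that are not outliers), and then subset the data frame with that logical vector.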
keep <- !data_cpu$JVM_CPU %in% boxplot.stats(data_cpu$JVM_CPU)$out
data_cpu[keep, ]
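As a quick check of the approach above, here is a minimal sketch on hypothetical toy data (the data frame toy and its values are assumptions for illustration, not the asker's data). It uses the default coef = 1.5 of boxplot.stats(), which flags values more than 1.5 times the hinge spread beyond the hinges.

```r
# Hypothetical toy data (not the asker's): five ordinary readings and one extreme one.
toy <- data.frame(id = 1:6, JVM_CPU = c(1, 2, 3, 4, 5, 100))

# Values that boxplot.stats() flags as outliers with its default coef = 1.5.
out_vals <- boxplot.stats(toy$JVM_CPU)$out

# TRUE for rows to keep, FALSE for rows whose value was flagged.
keep <- !toy$JVM_CPU %in% out_vals

# Subset by row; every column of the surviving rows is retained.
toy_clean <- toy[keep, ]
print(toy_clean)
```

Because the subsetting is done on rows, the other columns (here id, in the question Date) stay aligned with the remaining JVM_CPU values.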
Answer 1: (score: 1)
Use this to find the row indices of the outliers and drop those rows.
data_cpu[-which(data_cpu$JVM_CPU %in% boxplot.stats(data_cpu$JVM_CPU)$out), ]
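Alternatively, the expression in your example already returns a logical vector that is TRUE for the rows you want to keep and FALSE for the others, so you can use it directly as the row index.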
data_cpu[!data_cpu$JVM_CPU %in% boxplot.stats(data_cpu$JVM_CPU)$out, ]
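One difference between the two forms is worth knowing: when no values are flagged, which() returns an empty integer vector, and a negative index built from it selects zero rows rather than all of them. The sketch below illustrates this on hypothetical toy data (the names toy, is_out, by_position and by_logical are assumptions for illustration).

```r
# Hypothetical toy data with no outliers under the default coef = 1.5.
toy <- data.frame(id = 1:5, JVM_CPU = c(1, 2, 3, 4, 5))

# Logical flag per row: TRUE where the value was reported as an outlier.
is_out <- toy$JVM_CPU %in% boxplot.stats(toy$JVM_CPU)$out

# which() gives integer(0) here, and -integer(0) is still integer(0),
# so the positional form selects zero rows instead of all of them.
by_position <- toy[-which(is_out), ]

# The logical form is all TRUE here, so every row is kept.
by_logical <- toy[!is_out, ]

print(nrow(by_position))
print(nrow(by_logical))
```

For that reason the logical index is the safer choice when the data may contain no outliers at all.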