How do I remove outliers from a data frame?

Asked: 2017-06-01 15:10:20

Tags: r

I have this data set:

dput(head(data,20))
structure(list(Date = structure(c(1495722600, 1495723500, 1495724400, 
1495725300, 1495726200, 1495727100, 1495728000, 1495728900, 1495729800, 
1495730700, 1495731600, 1495732500, 1495733400, 1495734300, 1495735200, 
1495736100, 1495737000, 1495737900, 1495738800, 1495739700), class = c("POSIXct", 
"POSIXt"), tzone = ""), JVM_CPU = c(1.07500004768372, 1.75, 10.6979999542236, 
2.40000009536743, 2.42400002479553, 5.80000019073486, 6.80000019073486, 
1.85000002384186, 8.52499961853027, 0.800000011920929, 12.7740001678467, 
0.174999997019768, 0.499000012874603, 0.248999997973442, 6.82499980926514, 
1.125, 0.949000000953674, 0.874000012874603, 6.55000019073486, 
0.248999997973442)), .Names = c("Date", "JVM_CPU"), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

I need to subset it so that it contains no outliers.

I can remove the outliers from the JVM_CPU column on its own like this:

data_cpu$JVM_CPU[!data_cpu$JVM_CPU %in% boxplot.stats(data_cpu$JVM_CPU)$out]

But that only gives me a vector. I need to drop the outlier rows from the data frame itself. Any ideas on how I can do this?

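2 answers:

Answer 0: (score: 1)

You can first work out which rows of the data frame to keep (i.e. the rows whose JVM_CPU value is not an outlier), then subset the data frame with that logical vector.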

keep <- !data_cpu$JVM_CPU %in% boxplot.stats(data_cpu$JVM_CPU)$out
data_cpu[keep, ]
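
As a minimal sketch of the same idea on a small made-up data frame (the `toy` data and its values are hypothetical, chosen so that one value is an obvious outlier; they are not the asker's data):

```r
# Hypothetical toy data: six ordinary readings and one extreme one.
toy <- data.frame(
  id      = 1:7,
  JVM_CPU = c(1, 2, 2, 3, 3, 4, 100)
)

# Values that boxplot.stats() flags as lying beyond the whiskers
# (default coef = 1.5 times the hinge spread).
outliers <- boxplot.stats(toy$JVM_CPU)$out

# Logical vector: TRUE for rows to keep, FALSE for outlier rows.
keep <- !toy$JVM_CPU %in% outliers

# Subset rows, keeping every column.
clean <- toy[keep, ]
print(clean)
```

Because `keep` has one entry per row, the trailing comma in `toy[keep, ]` selects whole rows rather than a single column.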

Answer 1: (score: 1)

Use your expression to find the indices of the outlier rows and drop those rows.

data_cpu[-which(data_cpu$JVM_CPU %in% boxplot.stats(data_cpu$JVM_CPU)$out), ]

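Alternatively, the expression in your example already returns TRUE for the rows you want to keep and FALSE for the rest, so you can use it directly as a row index. This form also behaves correctly when there are no outliers, whereas -which(...) would then select zero rows.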

data_cpu[!data_cpu$JVM_CPU %in% boxplot.stats(data_cpu$JVM_CPU)$out, ]
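
The two forms differ when no value is flagged as an outlier. The following sketch illustrates this on a hypothetical data frame (`toy2` and its values are invented for illustration): `which()` returns an empty integer vector, and negating an empty vector still gives an empty index, so every row is dropped.

```r
# Hypothetical data with no outliers at all.
toy2 <- data.frame(
  id      = 1:5,
  JVM_CPU = c(1, 2, 3, 4, 5)
)

is_outlier <- toy2$JVM_CPU %in% boxplot.stats(toy2$JVM_CPU)$out

# Negative integer indexing: which() is integer(0) here,
# and -integer(0) is still integer(0), so no rows are selected.
by_which <- toy2[-which(is_outlier), ]

# Logical indexing: all entries are TRUE, so every row is kept.
by_logical <- toy2[!is_outlier, ]

nrow(by_which)
nrow(by_logical)
```

For that reason the logical form is the safer default when the code may run on data that happens to contain no outliers.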