x<-matrix(c(0.00009852, -0.00393314, -0.00049056, -0.00117636,
-0.00283716, 0.00136866, -0.00536613, -0.00068090, 0.01528542,
0.01221890, -0.00309366, 0.00379356,-0.00159904, -0.00259300,
-0.00635427, 0.00446363,0.00119367, 0.00079657, 0.00419246,
0.00090068,0.00160321,0.00623682, -0.00010090, -0.00070604),ncol=4)
x<-data.frame(x)
names(x)<-c("active","inactive","injured","rehab")
       active    inactive     injured       rehab
1  0.00009852 -0.00536613 -0.00159904  0.00419246
2 -0.00393314 -0.00068090 -0.00259300  0.00090068
3 -0.00049056  0.01528542 -0.00635427  0.00160321
4 -0.00117636  0.01221890  0.00446363  0.00623682
5 -0.00283716 -0.00309366  0.00119367 -0.00010090
6  0.00136866  0.00379356  0.00079657 -0.00070604
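So I have this dataset called x. I would like to:
1) find the outlier bounds for each column
2) scan each column for any values above the upper bound or below the lower bound
3) move the columns that contain outliers to a new data frame called y.
For task 1, I used the following: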
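# Lower bound as written: Q1 + IQR (algebraically this equals Q3)
quantile1 <- function(k) {
  quantile(k, 0.25) + IQR(k)
}
# Upper bound: Q3 + IQR
quantile3 <- function(k) {
  quantile(k, 0.75) + IQR(k)
}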
lower_outlier<-apply(x, 2, quantile1)
upper_outlier<-apply(x, 2, quantile3)
View(t(lower_outlier))
active inactive injured rehab
-0.000048750 0.010112565 0.001094395 0.003545148
View(t(upper_outlier))
active inactive injured rehab
0.00232446 0.0227156 0.0045333 0.0069408
So now I have an upper and a lower bound for each column. How do I proceed with tasks 2 and 3? I believe one way is something like
x <- x[x <= value]
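but I am not sure. Any suggestions are much appreciated.
Answer 0 (score: 0)
We can use the relational operators to build a logical matrix once the objects being compared have the same length (by replicating the bounds), then take the colSums of that matrix and negate it (!) so that columns with 0 outliers are TRUE and all others are FALSE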
i1 <- !colSums(x > upper_outlier[col(x)] | x < lower_outlier[col(x)])
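To see why indexing the bounds with col(x) lines them up with the data, here is a small sketch on a toy data frame. The names d, upper, idx, expanded and above, and the values in them, are made up for illustration and do not come from the question.

```r
# Toy data: two columns, three rows (hypothetical values)
d <- data.frame(a = c(1, 5, 2), b = c(10, 20, 70))
upper <- c(a = 4, b = 50)   # one illustrative upper bound per column

# col(d) is a matrix of column indices with the same shape as d
idx <- col(d)

# Indexing the bounds by that matrix repeats each bound down its column
expanded <- upper[idx]

# The comparison now pairs every value with the bound of its own column
above <- d > expanded

# Number of values above the bound, per column
colSums(above)
```

Because R stores and recycles values column by column, the first three entries of expanded are compared against column a and the last three against column b.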
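Subset the columns of the dataset based on this index
y <- x[, i1, drop = FALSE]
Since i1 is TRUE for the columns without outliers, x[, !i1, drop = FALSE] gives the columns that do contain outliers.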
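Putting the steps together, the following is a minimal sketch rather than a definitive solution. It assumes the lower bound is meant to be Q1 minus a multiple of the IQR, since quantile(k, 0.25) + IQR(k) as written in the question is simply the third quartile. The helper outlier_cols, its multiplier argument k, and the name x_clean are hypothetical and introduced only for this example.

```r
# Data from the question, built column by column
x <- data.frame(
  active   = c(0.00009852, -0.00393314, -0.00049056, -0.00117636, -0.00283716, 0.00136866),
  inactive = c(-0.00536613, -0.00068090, 0.01528542, 0.01221890, -0.00309366, 0.00379356),
  injured  = c(-0.00159904, -0.00259300, -0.00635427, 0.00446363, 0.00119367, 0.00079657),
  rehab    = c(0.00419246, 0.00090068, 0.00160321, 0.00623682, -0.00010090, -0.00070604)
)

# Hypothetical helper: TRUE for each column holding at least one value
# outside the interval [Q1 - k * IQR, Q3 + k * IQR]
outlier_cols <- function(df, k = 1.5) {
  vapply(df, function(v) {
    lower <- quantile(v, 0.25) - k * IQR(v)
    upper <- quantile(v, 0.75) + k * IQR(v)
    any(v < lower | v > upper)
  }, logical(1))
}

has_outlier <- outlier_cols(x, k = 1)       # k = 1 mirrors the question's multiplier
y <- x[, has_outlier, drop = FALSE]         # columns that contain outliers
x_clean <- x[, !has_outlier, drop = FALSE]  # the remaining columns
```

With k = 1 only the injured column is flagged on this data, because its minimum falls below Q1 - IQR. With the more common multiplier k = 1.5 no column is flagged. Keeping drop = FALSE ensures y stays a data frame even when a single column is selected.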