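I have a data.frame with two columns and am using aq.plot from the mvoutlier package to identify potential outliers in my two-dimensional dataset. The only problem is that I am not happy with the look of the resulting plots, and I would like to get the data being plotted so that I can draw the figure in other software.
In my case, the plot is generated by:
library('mvoutlier')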
data = read.csv(fp, colClasses=c("NULL",NA,NA))
h = aq.plot(data)
The data.frame, data, looks like this:
pr tas
1 5.133207 59.24362
2 20.173075 75.81661
3 24.819054 97.31020
4 35.893467 92.11203
5 27.752425 95.70120
6 25.765618 91.14163
7 20.895360 57.30519
8 8.921513 70.31467
9 36.031261 98.24573
10 27.166213 92.79554
11 8.889431 54.48514
12 59.564447 85.69632
13 43.818336 99.36451
14 43.408963 84.23207
15 22.653269 84.89939
16 21.480331 96.18303
17 22.827370 69.97202
18 23.252464 85.08739
19 14.618731 45.30504
20 40.795519 78.56758
21 37.310456 80.30799
22 31.099105 91.31675
23 33.107472 63.07043
24 9.611930 35.62702
The resulting plot looks like this: [plot image not shown]
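So my question is: how can I get the information that is plotted in the top-right subplot? By information, I mean the x, y coordinates and the number associated with each point. It would also be great if there were a way to get the x values at which the two vertical lines are drawn.
I see that h, the output of the aq.plot() call, gives a boolean array indicating which points are outliers (TRUE) or not (FALSE), but there does not seem to be any access to the underlying components of the plot.
Any help is much appreciated.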
Answer 0 (score: 3)
It is all in the code of aq.plot. Here is the specific code for the top-right plot:
plot(s$x, (1:length(dist))/length(dist), col = 3, xlab = "Ordered squared robust distance",
     ylab = "Cumulative probability", type = "n")
text(s$x, (1:length(dist))/length(dist), as.character(s$ix),
     col = 3, cex = 0.8)
t <- seq(0, max(dist), by = 0.01)
lines(t, pchisq(t, df = ncol(x)), col = 6)
abline(v = delta, col = 5)
text(x = delta, y = 0.4, paste(100 * (pchisq(delta, df = ncol(x))),
     "% Quantile", sep = ""), col = 5, pos = 2, srt = 90,
     cex = 0.8)
xarw <- arw(x, covr$center, covr$cov, alpha = alpha)
if (xarw$cn < Inf) {
    abline(v = xarw$cn, col = 4)
    text(x = xarw$cn, y = 0.4, "Adjusted Quantile", col = 4,
         pos = 4, srt = 90, cex = 0.8)
}
If you look at the code of the function aq.plot, you will see that you can get the x coordinates and the associated observation numbers this way:
covr <- robustbase::covMcd(data, alpha = 1/2)
dist <- mahalanobis(data, center = covr$center, cov = covr$cov)
s <- sort(dist, index = TRUE)
s$x
# 22 4 6 10 21 18 15 5 14
# 0.1152036 0.2181437 0.3148553 0.3255492 0.3752751 0.4076276 0.4661830 0.5299942 0.7093746
# 9 20 3 16 2 13 17 23 12
# 0.7564636 0.7756129 0.8838616 1.0807574 1.3059546 1.4891242 1.8606975 2.9690980 3.9152682
# 8 7 1 11 19
# 4.0283820 5.0767176 7.4233298 7.9488595 10.3217389
And then the y coordinates:
(1:length(dist))/length(dist)
#[1] 0.04347826 0.08695652 0.13043478 0.17391304 0.21739130 0.26086957 0.30434783 0.34782609
#[9] 0.39130435 0.43478261 0.47826087 0.52173913 0.56521739 0.60869565 0.65217391 0.69565217
#[17] 0.73913043 0.78260870 0.82608696 0.86956522 0.91304348 0.95652174 1.00000000
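You can rebuild the plot directly using the code below, which is adapted from the code above. Reading through this code and following along as the plot is built should help you see where to find each piece of information. For the vertical lines, look at the abline calls: the x values are given by qchisq(0.975, df = ncol(data)) and arw(data, covr$center, covr$cov, alpha = 0.05)$cn.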
plot(s$x, (1:length(dist))/length(dist), col = 3, xlab = "Ordered squared robust distance",
     ylab = "Cumulative probability", type = "n")
text(s$x, (1:length(dist))/length(dist), as.character(s$ix),
     col = 3, cex = 0.8)
t <- seq(0, max(dist), by = 0.01)
lines(t, pchisq(t, df = ncol(data)), col = 6)
abline(v = qchisq(0.975, df = ncol(data)), col = 5)
text(x = qchisq(0.975, df = ncol(data)),
     y = 0.4, paste(100 * (pchisq(qchisq(0.975, df = ncol(data)), df = ncol(data))),
     "% Quantile", sep = ""), col = 5, pos = 2, srt = 90,
     cex = 0.8)
xarw <- arw(data, covr$center, covr$cov, alpha = 0.05)
if (xarw$cn < Inf) {
    abline(v = xarw$cn, col = 4)
    text(x = xarw$cn, y = 0.4, "Adjusted Quantile", col = 4,
         pos = 4, srt = 90, cex = 0.8)
}
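Since the goal is to draw the figure in other software, here is a minimal sketch of how the same quantities could be collected into data frames and written to CSV. It assumes the robustbase and mvoutlier packages are installed and reuses the calls shown above. The names points_df and lines_df, the column names, the output file names and the set.seed() call are illustrative choices, not anything taken from aq.plot.

```r
library(robustbase)  # provides covMcd()
library(mvoutlier)   # provides arw()

# The 24 rows shown in the question
data <- data.frame(
  pr = c(5.133207, 20.173075, 24.819054, 35.893467, 27.752425, 25.765618,
         20.895360, 8.921513, 36.031261, 27.166213, 8.889431, 59.564447,
         43.818336, 43.408963, 22.653269, 21.480331, 22.827370, 23.252464,
         14.618731, 40.795519, 37.310456, 31.099105, 33.107472, 9.611930),
  tas = c(59.24362, 75.81661, 97.31020, 92.11203, 95.70120, 91.14163,
          57.30519, 70.31467, 98.24573, 92.79554, 54.48514, 85.69632,
          99.36451, 84.23207, 84.89939, 96.18303, 69.97202, 85.08739,
          45.30504, 78.56758, 80.30799, 91.31675, 63.07043, 35.62702)
)

set.seed(1)  # assumption: fix the seed because covMcd() uses random subsampling
covr <- covMcd(data, alpha = 1/2)
dist <- mahalanobis(data, center = covr$center, cov = covr$cov)
s <- sort(dist, index.return = TRUE)

# One row per label drawn by text(): observation number, x and y
points_df <- data.frame(
  obs = s$ix,
  sq_robust_dist = unname(s$x),
  cum_prob = seq_along(dist) / length(dist)
)

# x positions of the two vertical lines (the second is Inf when no line is drawn)
lines_df <- data.frame(
  line = c("chisq_quantile", "adjusted_quantile"),
  x = c(qchisq(0.975, df = ncol(data)),
        arw(data, covr$center, covr$cov, alpha = 0.05)$cn)
)

# Hypothetical output file names
write.csv(points_df, "aq_points.csv", row.names = FALSE)
write.csv(lines_df, "aq_lines.csv", row.names = FALSE)
```

The two CSV files can then be read by whatever plotting tool you prefer. The theoretical curve is pchisq(t, df = 2) evaluated on a grid of t values, so it can be recomputed there or exported in the same way.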