How to get the data from an adjusted quantile plot in R?

Asked: 2015-06-04 03:40:31

Tags: r plot

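I have a data.frame with two columns and am using aq.plot from the mvoutlier package to identify potential outliers in my two-dimensional dataset. The only problem is that I'm not happy with the "look" of the resulting plots, so I would like to get the data being plotted and draw the plots in other software.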

For my specific case, the plots are generated by:
library('mvoutlier')

data = read.csv(fp, colClasses=c("NULL",NA,NA))

h = aq.plot(data)

The data.frame, data, looks like this:

    pr          tas
1   5.133207    59.24362
2   20.173075   75.81661
3   24.819054   97.31020
4   35.893467   92.11203
5   27.752425   95.70120
6   25.765618   91.14163
7   20.895360   57.30519
8   8.921513    70.31467
9   36.031261   98.24573
10  27.166213   92.79554
11  8.889431    54.48514
12  59.564447   85.69632
13  43.818336   99.36451
14  43.408963   84.23207
15  22.653269   84.89939
16  21.480331   96.18303
17  22.827370   69.97202
18  23.252464   85.08739
19  14.618731   45.30504
20  40.795519   78.56758
21  37.310456   80.30799
22  31.099105   91.31675
23  33.107472   63.07043
24  9.611930    35.62702

The resulting plot looks like this:

[Image: output of aq.plot(data)]

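So my question is: how do I get the information plotted in the upper-right subplot? By information, I mean the x, y coordinates and the number associated with each point. It would also be great if there is a way to get the x values at which the two vertical lines are drawn.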

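I see that h, the output of the aq.plot() call, gives a boolean array stating which points are outliers (TRUE) or not (FALSE), but there doesn't seem to be any access to the underlying components of the plot.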

Any help is greatly appreciated.

1 Answer:

Answer 0 (score: 3)

It's all in the code of aq.plot. Here is the specific code for the upper-right plot:

plot(s$x, (1:length(dist))/length(dist), col = 3, xlab = "Ordered squared robust distance", 
        ylab = "Cumulative probability", type = "n")
    text(s$x, (1:length(dist))/length(dist), as.character(s$ix), 
        col = 3, cex = 0.8)
    t <- seq(0, max(dist), by = 0.01)
    lines(t, pchisq(t, df = ncol(x)), col = 6)
    abline(v = delta, col = 5)
    text(x = delta, y = 0.4, paste(100 * (pchisq(delta, df = ncol(x))), 
        "% Quantile", sep = ""), col = 5, pos = 2, srt = 90, 
        cex = 0.8)
    xarw <- arw(x, covr$center, covr$cov, alpha = alpha)
    if (xarw$cn < Inf) {
        abline(v = xarw$cn, col = 4)
        text(x = xarw$cn, y = 0.4, "Adjusted Quantile", col = 4, 
            pos = 4, srt = 90, cex = 0.8)
    }
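
One way to read that source yourself, as a small sketch: printing a function object without parentheses shows its body, and deparse() turns it into text you can search. The block uses stats::mahalanobis as a stand-in so it runs without mvoutlier installed; with the package attached, the same calls should work on aq.plot (for example, searching for "abline").

```r
# Stand-in function so the sketch runs in base R; swap in aq.plot
# once library(mvoutlier) has been loaded.
f <- stats::mahalanobis

# deparse() returns the function's code as a character vector, one line per element.
src <- deparse(f)
cat(head(src, 5), sep = "\n")

# Search the source for a call of interest (here: where solve() is used).
hits <- grep("solve", src, value = TRUE)
print(hits)
```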

If you look at the code of the function aq.plot, you'll see that you can get the x coordinates and the associated observation numbers this way:

covr <- robustbase::covMcd(data, alpha = 1/2)
dist <- mahalanobis(data, center = covr$center, cov = covr$cov)
s <- sort(dist, index.return = TRUE)  # s$x: sorted distances, s$ix: original row numbers
s$x 
#        22          4          6         10         21         18         15          5         14 
# 0.1152036  0.2181437  0.3148553  0.3255492  0.3752751  0.4076276  0.4661830  0.5299942  0.7093746 
#         9         20          3         16          2         13         17         23         12 
# 0.7564636  0.7756129  0.8838616  1.0807574  1.3059546  1.4891242  1.8606975  2.9690980  3.9152682 
#         8          7          1         11         19 
# 4.0283820  5.0767176  7.4233298  7.9488595 10.3217389 
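
For reference, the x values are squared Mahalanobis distances, (x - center)' cov^-1 (x - center). The sketch below checks that formula by hand on the first five rows of the question's data. It uses the classical mean and covariance as stand-ins for the covMcd estimates (an assumption made only to keep the example in base R), so the numbers will not match the robust distances shown above.

```r
# First five rows of the question's data (pr, tas)
X <- cbind(pr  = c(5.133207, 20.173075, 24.819054, 35.893467, 27.752425),
           tas = c(59.24362, 75.81661, 97.31020, 92.11203, 95.70120))

# Classical estimates, standing in for covr$center and covr$cov
mu <- colMeans(X)
S  <- cov(X)

# Built-in squared Mahalanobis distance
d2 <- mahalanobis(X, center = mu, cov = S)

# The same quantity written out row by row
manual <- apply(X, 1, function(r) t(r - mu) %*% solve(S) %*% (r - mu))

print(all.equal(unname(d2), unname(manual)))
```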

Then the y coordinates:

(1:length(dist))/length(dist)
#[1] 0.04347826 0.08695652 0.13043478 0.17391304 0.21739130 0.26086957 0.30434783 0.34782609
#[9] 0.39130435 0.43478261 0.47826087 0.52173913 0.56521739 0.60869565 0.65217391 0.69565217
#[17] 0.73913043 0.78260870 0.82608696 0.86956522 0.91304348 0.95652174 1.00000000
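
Since the goal is to plot elsewhere, the x, y, and label values can be gathered into one data.frame and written to a file. A minimal sketch: aq_points is a hypothetical helper (not part of mvoutlier), the output file name is arbitrary, and the five distances are the values for observations 1 to 5 from the output above, rounded, so that the block runs on its own. In practice you would pass the full dist vector.

```r
# Hypothetical helper: one row per plotted point.
#   obs = observation number (the label drawn in the plot)
#   x   = ordered squared robust distance
#   y   = cumulative probability, i/n
aq_points <- function(dist) {
  s <- sort(dist, index.return = TRUE)
  n <- length(dist)
  data.frame(obs = s$ix, x = unname(s$x), y = seq_len(n) / n)
}

# Stand-in distances for observations 1..5 (rounded from the output above)
d <- c(7.42, 1.31, 0.88, 0.22, 0.53)
pts <- aq_points(d)
print(pts)

# Export for use in other software
out_file <- file.path(tempdir(), "aq_points.csv")
write.csv(pts, out_file, row.names = FALSE)
```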

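You can rebuild the plot directly with the following code, adapted from the excerpt above. Reading it and following along as the plot is built should help you see where to find each piece of information. For the vertical lines, look at the abline calls: you'll find the values with qchisq(0.975, df = ncol(data)) and arw(data, covr$center, covr$cov, alpha = 0.05)$cn.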

n <- length(dist)
delta <- qchisq(0.975, df = ncol(data))   # x position of the chi-square quantile line

# Empty frame, then the observation numbers at (sorted distance, cumulative probability)
plot(s$x, (1:n)/n, col = 3, xlab = "Ordered squared robust distance",
     ylab = "Cumulative probability", type = "n")
text(s$x, (1:n)/n, as.character(s$ix), col = 3, cex = 0.8)

# Theoretical chi-square CDF
t <- seq(0, max(dist), by = 0.01)
lines(t, pchisq(t, df = ncol(data)), col = 6)

# First vertical line: chi-square quantile
abline(v = delta, col = 5)
text(x = delta, y = 0.4,
     paste(100 * pchisq(delta, df = ncol(data)), "% Quantile", sep = ""),
     col = 5, pos = 2, srt = 90, cex = 0.8)

# Second vertical line: adjusted quantile from mvoutlier's arw(), drawn only if finite
xarw <- arw(data, covr$center, covr$cov, alpha = 0.05)
if (xarw$cn < Inf) {
    abline(v = xarw$cn, col = 4)
    text(x = xarw$cn, y = 0.4, "Adjusted Quantile", col = 4,
         pos = 4, srt = 90, cex = 0.8)
}
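
As a small worked check of the first vertical line: with two columns, df = 2, and the chi-square quantile has the closed form -2 * log(1 - p). The sketch below computes it in base R and compares it with the five largest distances from the s$x output above (values copied from that output). It covers only the fixed quantile line; the adjusted quantile needs arw() from mvoutlier and is not reproduced here.

```r
# 97.5% chi-square quantile for two variables (pr, tas)
delta <- qchisq(0.975, df = 2)
print(delta)

# Closed form for df = 2
print(all.equal(delta, -2 * log(1 - 0.975)))

# Five largest squared robust distances from the output above, named by observation
top <- c("8" = 4.0283820, "7" = 5.0767176, "1" = 7.4233298,
         "11" = 7.9488595, "19" = 10.3217389)

# Observations lying to the right of the quantile line
flagged <- names(top)[top > delta]
print(flagged)
```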