How can I read the numbered extreme points in R diagnostic plots when they are crowded?

Time: 2017-11-07 21:44:09

Tags: r linear-regression

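Is there a way to extract the extreme values shown in the diagnostic plots? As in my example image, the extreme points are sometimes crowded together, which makes their number labels hard to read.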

Tachyons

1 answer:

Answer 0: (score: 1)

Here is an example:

data(mtcars)

model <- lm(wt ~ disp, data = mtcars)
par(mfrow=c(2,2))
plot(model)

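If the goal is only to make the labels readable, the plot itself can be adjusted before extracting anything. The following is a minimal sketch, assuming the same `model` as above: `plot()` for `lm` objects accepts `which` to draw a single panel, `id.n` to set how many extreme points get labelled (3 by default), and `cex.id` to scale the label text. The particular values chosen here are illustrative, not recommendations.

```r
# Sketch: refit the answer's model, then redraw single diagnostic panels
# with a different number of labelled points.
data(mtcars)
model <- lm(wt ~ disp, data = mtcars)

par(mfrow = c(1, 1))  # one panel per page, so labels have more room

# which = 1 is Residuals vs Fitted; id.n = 5 labels the 5 most extreme
# points; cex.id = 0.6 shrinks the label text.
plot(model, which = 1, id.n = 5, cex.id = 0.6)

# which = 5 is Residuals vs Leverage; id.n = 0 turns point labels off.
plot(model, which = 5, id.n = 0)
```

Drawing one panel at a time usually separates the labels more than the 2 x 2 layout does, and the values behind the labels can then be extracted as shown next.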

Extract the 5 largest residuals (by absolute value):

sort(abs(residuals(model)), decreasing = TRUE)[1:5]
# output
 Ford Pantera L      Lotus Europa Hornet Sportabout          Merc 280         Merc 280C 
        0.8904388         0.7534965         0.6835317         0.6652549         0.6652549 

Extract the 5 largest Cook's distances:

sort(cooks.distance(model), decreasing = TRUE)[1:5]
# output
  Chrysler Imperial Lincoln Continental      Ford Pantera L        Lotus Europa   Hornet Sportabout 
          0.1671898           0.1650842           0.1326203           0.1095882           0.0849812 

The 5 points with the highest leverage (hat values):

sort(lm.influence(model, do.coef = FALSE)$hat, decreasing = TRUE)[1:5]
# output
 Cadillac Fleetwood Lincoln Continental   Chrysler Imperial    Pontiac Firebird      Toyota Corolla 
         0.15350324          0.14164508          0.12322550          0.09142639          0.08475684 

The 5 largest standardized residuals (by absolute value):

sort(abs(rstandard(model)), decreasing = TRUE)[1:5]
#output
   Ford Pantera L      Lotus Europa Hornet Sportabout Chrysler Imperial          Merc 280 
     2.009594          1.708056          1.546526          1.542459          1.484079 
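The four measures above can also be gathered into one table, so that a point flagged in one plot can be checked against the others. This is a sketch rather than part of the original answer; the data frame name `diagnostics` and its column names are made up for illustration, and `hatvalues()` is used as the standard accessor for the same hat values.

```r
# Sketch: collect the diagnostic measures for every observation in one table.
data(mtcars)
model <- lm(wt ~ disp, data = mtcars)

diagnostics <- data.frame(
  residual     = residuals(model),       # raw residuals
  std_residual = rstandard(model),       # standardized residuals
  cooks_d      = cooks.distance(model),  # Cook's distance
  leverage     = hatvalues(model)        # hat values (leverage)
)

# Rank observations by absolute standardized residual, largest first.
top <- diagnostics[order(abs(diagnostics$std_residual), decreasing = TRUE), ]
head(top, 5)
```

The row names come from the names of the model's residuals, so each row is labelled with the car it belongs to.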

An example of how to get the row indices:

which(rownames(mtcars) %in% names(sort(abs(rstandard(model)), decreasing = TRUE)[1:5]))
[1]  5 10 17 28 29

mtcars[c(5, 10, 17, 28,29),]
#output
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Lotus Europa      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
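A shorter route to the same rows is `order()`, which returns positions directly and keeps them in ranking order instead of row order. This is a sketch, assuming no rows were dropped when fitting (true for `mtcars`, which has no missing values), so that positions in the residual vector match row numbers in the data; the name `idx` is just for illustration.

```r
# Sketch: get row indices of the most extreme points without hardcoding them.
data(mtcars)
model <- lm(wt ~ disp, data = mtcars)

# Positions of the 5 largest absolute standardized residuals, largest first.
idx <- order(abs(rstandard(model)), decreasing = TRUE)[1:5]
idx

# Use the positions to pull the corresponding rows of the data.
mtcars[idx, ]
```

Because `idx` is computed from the model, the subset stays correct if the model or the data change, which typing the numbers by hand does not guarantee.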