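I am new to R and am working on an assignment in which I am supposed to replicate the results of a linear regression (time-series data with 1360 observations and 52 variables, 11 of which are in the regression model). In the original study, the researchers used the Hadi method to identify outliers. It seems this is best done in R with the mvBACON function; is that correct? I can't seem to find a good answer on how to use it. Can anyone tell me how to use this function to find outliers? (I would really appreciate an answer explained as simply as possible, since R is very new to me.) Thank you very much!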
Answer 0 (score: 3)
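Yes, mvBACON identifies outliers based on a distance measure; by default this is the Mahalanobis distance. The following code walks you through a simple example of using mvBACON to identify outliers on a subset of the mtcars dataset: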
# Use a two-column subset of mtcars and plot it
library(dplyr)  # provides %>% and select()
data <- mtcars %>% select(mpg, disp)
plot(data, main = "mtcars")
# Add some outliers and plot again
data <- rbind(data,
data.frame(mpg = c(1, 80), disp = c(800, 1000)))
plot(data, main = "mtcars")
# Use mvBACON to calculate the distances and get the outliers
# ($subset is TRUE for the points kept as non-outliers)
library(robustX)
distances <- mvBACON(data)
# Plot it again...
plot(data, main = "mtcars")
# ...with highlighting the outliers
points(data[!distances$subset, ], col = "red", pch = 19)
# Some fine-tuning of alpha, since many of the flagged points still look fine for regression
distances <- mvBACON(data, alpha = 0.6)
plot(data, main = "mtcars")
points(data[!distances$subset, ], col = "red", pch = 19)
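For the regression use case in the question, it can help to list the flagged rows by name and refit the model without them. The following is only a minimal sketch on the same toy data, not the original study's procedure: the object names (`dat`, `res`, `fit_all`, `fit_clean`), the row labels `fake1`/`fake2`, and the model `mpg ~ disp` are assumptions made for illustration. It relies on the `subset` component returned by mvBACON, as used above.

```r
library(robustX)

# Same toy data as above: two mtcars columns plus two injected outliers
# (row labels "fake1"/"fake2" are made up for this illustration)
dat <- rbind(mtcars[, c("mpg", "disp")],
             data.frame(mpg = c(1, 80), disp = c(800, 1000),
                        row.names = c("fake1", "fake2")))

# Pass a numeric matrix; verbose = FALSE silences the progress messages
res <- mvBACON(as.matrix(dat), verbose = FALSE)

# $subset is TRUE for rows kept in the clean subset, so negate it for outliers
outlier_rows <- which(!res$subset)
print(rownames(dat)[outlier_rows])

# Fit the regression on all rows and on the clean rows only, then compare
fit_all   <- lm(mpg ~ disp, data = dat)
fit_clean <- lm(mpg ~ disp, data = dat[res$subset, ])
print(rbind(all = coef(fit_all), clean = coef(fit_clean)))
```

With your own data, you would replace `dat` with the numeric columns that enter the regression model (mvBACON does not accept missing values, so drop or impute incomplete rows first) and swap in your own model formula.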