Question

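I'm new to R and just getting started with the outliers package. This may be easy, but can someone tell me how to run several Grubbs tests at once? I have 20 columns and I want to test all of them at the same time. Thanks in advance.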

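Edit: Sorry for not explaining this well; let me try again. I started using R today and learned how to run a Grubbs test with grubbs.test(data$S1, type = 10) (or type = 11, or type = 20), and it works. But I have a table with 20 columns and I want to run the Grubbs test on every column. I could do them one by one, but there must be a faster way. I also ran the code from "How to repeat the Grubbs test and flag the outliers", and it works well, but again I want to apply it to all 20 of my samples. Here is an example of my data: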

 S1  S2  S3  S4  S5  S6  S7
 96  40  99  45  12  16  48
 52  49  11  49  59  77  64
 18  43  11  67   6  97  91
 79  19  39  28  45  44  99
  9  78  88   6  25  43  78
 60  12  29  32   2  68  25
 18  61  60  30  26  51  70
 96  98  55  74  83  17  69
 19   0  17  24   0  75  45
 42  70  71   7  61  82 100
 39  80  71  58   6 100  94
100   5  41  18  33  98  97

Hope this helps.

Answer 1

You can use lapply:

library(outliers)

df <- data.frame(a = runif(20), b = runif(20), c = runif(20))
tests <- lapply(df, grubbs.test)   # one Grubbs test per column
# or with extra arguments, which lapply passes on to grubbs.test:
tests <- lapply(df, grubbs.test, opposite = TRUE)
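As a sketch of the same pattern on the question's own sample (the S1 to S7 table above, typed in with `read.table(text = ...)`), `lapply()` runs one test per column and forwards any extra argument such as `type` to `grubbs.test()`. The names `dat` and `p_values` are illustrative only.

```r
library(outliers)

# The question's example data (12 rows x 7 columns), entered inline
dat <- read.table(header = TRUE, text = "
 S1  S2  S3  S4  S5  S6  S7
 96  40  99  45  12  16  48
 52  49  11  49  59  77  64
 18  43  11  67   6  97  91
 79  19  39  28  45  44  99
  9  78  88   6  25  43  78
 60  12  29  32   2  68  25
 18  61  60  30  26  51  70
 96  98  55  74  83  17  69
 19   0  17  24   0  75  45
 42  70  71   7  61  82 100
 39  80  71  58   6 100  94
100   5  41  18  33  98  97
")

# One Grubbs test per column; type = 10 tests for a single outlier
# (11 and 20 are the two-outlier variants mentioned in the question)
tests <- lapply(dat, grubbs.test, type = 10)

# Named vector of p-values, one per column
p_values <- sapply(tests, function(t) t$p.value)
print(p_values)
```

With 20 columns the code is the same; `lapply()` simply returns a list of 20 test results.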

Result:

> tests
$a

    Grubbs test for one outlier

data:  X[[i]]
G = 1.80680, U = 0.81914, p-value = 0.6158
alternative hypothesis: highest value 0.963759744539857 is an outlier


$b

    Grubbs test for one outlier

data:  X[[i]]
G = 1.53140, U = 0.87008, p-value = 1
alternative hypothesis: highest value 0.975481075001881 is an outlier


$c

    Grubbs test for one outlier

data:  X[[i]]
G = 1.57910, U = 0.86186, p-value = 1
alternative hypothesis: lowest value 0.0136249314527959 is an outlier

You can access the results like this:

> tests$a$statistic
        G         U 
1.8067906 0.8191417
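To see every column side by side instead of one `htest` object at a time, one option is to pull each field out with `sapply()` and bind them into a data frame. This is a minimal sketch that recreates the answer's random example data so it runs on its own; `summary_df` is a hypothetical name, and `set.seed()` is added only for reproducibility.

```r
library(outliers)

set.seed(1)  # reproducible random data
df <- data.frame(a = runif(20), b = runif(20), c = runif(20))
tests <- lapply(df, grubbs.test)

# One row per column: the G and U statistics, the p-value and the alternative text
summary_df <- data.frame(
  column      = names(tests),
  G           = sapply(tests, function(t) unname(t$statistic["G"])),
  U           = sapply(tests, function(t) unname(t$statistic["U"])),
  p.value     = sapply(tests, function(t) t$p.value),
  alternative = sapply(tests, function(t) t$alternative),
  row.names   = NULL
)
print(summary_df)
```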

Hope this helps.

Answer 2

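@Florian's answer can be updated slightly. For example, you can use the purrr package and the tidyverse to get nice, easy-to-read results. This is useful if you are comparing many groups: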

Load the necessary packages:

library(dplyr)
library(purrr)
library(tidyr)
library(outliers)

Create some data. We'll use the data from Florian's answer, but converted to a modern tibble and to long format:

df <- tibble(a = runif(20),
             b = runif(20),
             c = runif(20)) %>%
  # transform to long format: one column of group labels, one of values
  tidyr::gather(letter, value)
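`gather()` still works, but it is superseded in tidyr 1.0.0 and later, where `pivot_longer()` is the recommended replacement. A sketch of the same reshaping with it follows; note that `pivot_longer()` interleaves rows (a, b, c, a, b, c, ...) whereas `gather()` stacks whole columns, which does not matter once the data are grouped by `letter`. The names `df_wide` and `df_long` are illustrative.

```r
library(dplyr)
library(tidyr)

# Same wide data as above: three columns of 20 uniform random numbers
df_wide <- tibble(a = runif(20), b = runif(20), c = runif(20))

# Long format with a "letter" column (group label) and a "value" column
df_long <- df_wide %>%
  pivot_longer(everything(), names_to = "letter", values_to = "value")
```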

Then we can use map and map_dbl from purrr instead of the apply functions:

df %>%
  group_by(letter) %>%
  nest() %>% 
  mutate(n = map_dbl(data, ~ nrow(.x)), # number of entries
         G = map(data, ~ grubbs.test(.x$value)$statistic[[1]]), # G statistic
         U = map(data, ~ grubbs.test(.x$value)$statistic[[2]]), # U statistic
         grubbs = map(data, ~ grubbs.test(.x$value)$alternative), # Alternative hypothesis
         p_grubbs = map_dbl(data, ~ grubbs.test(.x$value)$p.value)) %>% # p-value
  # Let's make the output more fancy
  mutate(G = signif(unlist(G), 3),
         U = signif(unlist(U), 3),
         grubbs = unlist(grubbs),
         p_grubbs = signif(p_grubbs, 3)) %>%
  select(-data) %>% # remove temporary column
  arrange(p_grubbs)
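The pipeline above calls `grubbs.test()` four times per group. A possible variant, sketched below, stores each test result once in a list column and then extracts the fields with typed `map_*()` functions, so no `unlist()` step is needed. The column name `test` and the object name `result` are assumptions for illustration; the output columns match the ones above.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(outliers)

df <- tibble(a = runif(20), b = runif(20), c = runif(20)) %>%
  gather(letter, value)

result <- df %>%
  group_by(letter) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    n        = map_int(data, nrow),                      # number of entries
    test     = map(data, ~ grubbs.test(.x$value)),       # run the test once per group
    G        = map_dbl(test, ~ signif(unname(.x$statistic["G"]), 3)),
    U        = map_dbl(test, ~ signif(unname(.x$statistic["U"]), 3)),
    grubbs   = map_chr(test, ~ .x$alternative),          # alternative hypothesis text
    p_grubbs = map_dbl(test, ~ signif(.x$p.value, 3))
  ) %>%
  select(-data, -test) %>%                               # drop the list columns
  arrange(p_grubbs)

print(result)
```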

The output will look like this:

# A tibble: 3 x 6
  letter     n     G     U grubbs                                        p_grubbs
  <chr>  <dbl> <dbl> <dbl> <chr>                                            <dbl>
1 c         20  1.68 0.843 lowest value 0.0489965472370386 is an outlier     0.84
2 a         20  1.58 0.862 lowest value 0.0174888013862073 is an outlier     1   
3 b         20  1.57 0.863 lowest value 0.0656482006888837 is an outlier     1

Several Grubbs tests at once in R
