How to apply the Shapiro test by group in R?

Date: 2013-08-27 18:51:41

Tags: r aggregate-functions

I have a data frame in which all 90 of my variables contain integer data, laid out like this:

Code | variable1 | variable2 | variable3 | ...
AB   | 2         | 3         | 10        | ...
AH   | 4         | 6         | 8         | ...
BC   | 1         | 5         | 9         | ...
...  | ...       | ...       | ...       | ...

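I would like to apply the Shapiro-Wilk test (shapiro.test {stats}) to each variable of my data frame and write the results into a table like this:

variable_name | W | p-value

Does anyone have a clue?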

3 answers:

Answer 0 (score: 0)

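Using the mtcars data in R:

mydata <- mtcars
# Run shapiro.test() once per column and keep W and the p-value
kk <- Map(function(x) {
  st <- shapiro.test(x)
  cbind(st$statistic, st$p.value)
}, mydata)
library(plyr)
# ldply() stacks the list into a data frame; the list names become the first column
myout <- ldply(kk)
names(myout) <- c("var", "W", "p.value")
myout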
    var         W      p.value
1   mpg 0.9475648 1.228816e-01
2   cyl 0.7533102 6.058378e-06
3  disp 0.9200127 2.080660e-02
4    hp 0.9334191 4.880736e-02
5  drat 0.9458838 1.100604e-01
6    wt 0.9432578 9.265551e-02
7  qsec 0.9732511 5.935208e-01
8    vs 0.6322636 9.737384e-08
9    am 0.6250744 7.836356e-08
10 gear 0.7727857 1.306847e-05
11 carb 0.8510972 4.382401e-04
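The same table can also be built in base R without plyr. The question's data additionally has a non-numeric Code column, which shapiro.test() cannot handle, so it has to be dropped first. A minimal sketch, assuming a small hypothetical data frame df shaped like the one in the question (the values are invented for illustration):

```r
# Hypothetical data frame shaped like the question's: a character Code
# column plus integer variables (values invented for illustration).
df <- data.frame(
  Code      = c("AB", "AH", "BC", "CD", "DE", "EF"),
  variable1 = c(2L, 4L, 1L, 7L, 3L, 5L),
  variable2 = c(3L, 6L, 5L, 2L, 8L, 4L),
  variable3 = c(10L, 8L, 9L, 12L, 7L, 11L),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns; Code is an identifier, not a measurement
num_cols <- df[vapply(df, is.numeric, logical(1))]

# Run shapiro.test() once per column and build one row per variable
results <- do.call(rbind, lapply(names(num_cols), function(v) {
  test <- shapiro.test(num_cols[[v]])
  data.frame(variable_name = v,
             W = unname(test$statistic),
             p.value = test$p.value,
             stringsAsFactors = FALSE)
}))
results
```

shapiro.test() accepts between 3 and 5000 non-missing values and stops with an error if all values are identical, so with 90 columns it may be worth wrapping the call in tryCatch() to skip any column that fails.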

Answer 1 (score: 0)

An example with the mtcars data:

library(tidyverse)
library(broom)

mtcars %>% 
    select(-am, -wt) %>% # Remove unnecessary columns
    gather(key = "variable_name", value = "value") %>%
    group_by(variable_name)  %>% 
    do(broom::tidy(shapiro.test(.$value)))  %>% 
    ungroup()  %>% 
    select(variable_name, W = statistic, `p-value` = p.value)

Answer 2 (score: 0)

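@GegznaV's answer is very good, but the tidyverse has since gained some newer constructs: for example, tidyr::pivot_longer has superseded tidyr::gather, and the tidyverse authors recommend the nest-unnest syntax.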

I also used broom::glance instead of broom::tidy, because it provides more statistics for some models (e.g. aov()).

Below is @GegznaV's example rewritten with the newer tidyverse syntax:

library(tidyverse)
library(broom)

mtcars %>% 
  select(-am, -wt) %>%
  pivot_longer(
    cols = everything(),
    names_to = "variable_name",
    values_to = "value"
  ) %>% 
  nest(data = -variable_name) %>% 
  mutate(
    shapiro = map(data, ~shapiro.test(.x$value)),
    glanced = map(shapiro, glance)
  ) %>% 
  unnest(glanced) %>% 
  select(variable_name, W = statistic, p.value) %>% 
  arrange(variable_name)

This gives the same result:

# A tibble: 9 x 3
  variable_name     W      p.value
  <chr>         <dbl>        <dbl>
1 carb          0.851 0.000438    
2 cyl           0.753 0.00000606  
3 disp          0.920 0.0208      
4 drat          0.946 0.110       
5 gear          0.773 0.0000131   
6 hp            0.933 0.0488      
7 mpg           0.948 0.123       
8 qsec          0.973 0.594       
9 vs            0.632 0.0000000974
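The title mentions groups, while the answers above test each variable (column) separately. If the goal is instead to test one variable within each level of a grouping column, split() does the grouping in base R. A minimal sketch, assuming mpg as the tested variable and cyl as the grouping column (both chosen only for illustration):

```r
# Test mpg separately within each cylinder group of mtcars.
# split() returns a named list of mpg vectors, one per level of cyl.
groups <- split(mtcars$mpg, mtcars$cyl)

# One shapiro.test() per group, stacked into one row per group
by_group <- do.call(rbind, lapply(names(groups), function(g) {
  test <- shapiro.test(groups[[g]])
  data.frame(cyl = g,
             W = unname(test$statistic),
             p.value = test$p.value,
             stringsAsFactors = FALSE)
}))
by_group
```

Each group must still have at least 3 values for shapiro.test() to run.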