Descriptive statistics table in long format using dplyr

Time: 2019-07-29 15:15:55

Tags: r dplyr statistics

I am trying to get a table of descriptive statistics in long format using dplyr. I tried using gather, but it does not work. My example code:

library(dplyr)
library(tidyr)
data(mtcars)

table = mtcars %>%
  summarise_all(funs(mean, sd, median, min, max))
dim(table)

[1]  1 55

table[1:4,1:4]

     mpg_mean cyl_mean disp_mean  hp_mean
1    20.09062   6.1875  230.7219 146.6875
NA         NA       NA        NA       NA
NA.1       NA       NA        NA       NA
NA.2       NA       NA        NA       NA

table2 = mtcars %>%
  gather(stat) %>%
  summarise_all(funs(mean, sd, median, min, max))
dim(table2)
table2[1:4,1:4]

1: In mean.default(stat) :
  argument is not numeric or logical: returning NA
2: In var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
  NAs introduced by coercion
3: In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
  argument is not numeric or logical: returning NA
[1]  1 10
     stat_mean value_mean stat_sd value_sd
1           NA   39.60853      NA 84.20792
NA          NA         NA      NA       NA
NA.1        NA         NA      NA       NA
NA.2        NA         NA      NA       NA

What I have in mind is a result like this, with one column for each statistic:

          mean    
mpg    20.09062   
cyl     6.1875    
disp    230.7219  
hp      146.6875  

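Edit:

Here I add a real example of a data frame. I removed the dots and underscores from the column names, which I think makes the solution below easier to apply: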

ex = data.frame(title_one = c(11,22,34,22,43,454),title.two = c(22,44,343,3434,424,676),title3 = c(6,1,0,1 ,1,1))

names(ex) = gsub(pattern = "_*", replacement = "", x = names(ex)) 
names(ex) = gsub(pattern = ".", replacement = "", x = names(ex), fixed = TRUE)  

 table = ex %>%
   summarise_all(funs( min, max,mean, sd))

  gather(table) %>%
   separate(key, into = c("key1", 'key2')) %>%
   spread(key2, value)

      key1  max       mean min          sd
1   title3    6   1.666667   0    2.160247
2 titleone  454  97.666667  11  174.915599
3 titletwo 3434 823.833333  22 1302.072873

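1 answer:

Answer 0 (score: 0)

We can gather the summary into "long" format, separate the "key" column into two columns (variable and statistic), and then spread it back into "wide" format with one column per statistic: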

library(tidyverse)
gather(table) %>%
    separate(key, into = c("key1", 'key2')) %>%
    spread(key2, value)
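
For reference, the pipeline above can be sketched end to end on mtcars. This is a minimal sketch, not the only way to do it: the object names `wide` and `long` are illustrative, and a named list replaces funs() on the assumption that a newer dplyr release (where funs() is deprecated) is installed.

```r
library(dplyr)
library(tidyr)

# One-row summary whose columns are named <variable>_<statistic>, e.g. mpg_mean.
# Assumption: a named list is used instead of funs(), which newer dplyr deprecates.
wide <- mtcars %>%
  summarise_all(list(mean = mean, sd = sd, median = median, min = min, max = max))

long <- wide %>%
  gather() %>%                                 # two columns: key, value
  separate(key, into = c("key1", "key2")) %>%  # "mpg_mean" -> "mpg", "mean"
  spread(key2, value)                          # one column per statistic

head(long, 4)
```

The default separator of separate() splits on any non-alphanumeric character, which is safe here because no mtcars column name contains an underscore or a dot.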

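For the updated dataset, whose column names contain more than one kind of delimiter, we can use extract with a regular expression that captures the variable name and the statistic: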

gather(table) %>%
    # split at the last underscore, so names such as title_one or title.two stay whole
    extract(key, into = c("key1", "key2"), "^(.*)_([^_]+)$") %>% 
    spread(key2, value)
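
With newer releases of the packages, the same reshaping can probably be done in one step. The sketch below assumes dplyr 1.0 or later (for across()) and tidyr 1.0 or later (for pivot_longer()), both of which postdate this post. The names `stats` and `long` and the column name `variable` are illustrative. The original column names, including the dot and the underscore, are kept unchanged.

```r
library(dplyr)
library(tidyr)

# Data frame from the question, with its original column names.
ex <- data.frame(
  title_one = c(11, 22, 34, 22, 43, 454),
  title.two = c(22, 44, 343, 3434, 424, 676),
  title3    = c(6, 1, 0, 1, 1, 1)
)

# One-row summary; across() names the columns <variable>_<statistic>.
stats <- ex %>%
  summarise(across(everything(), list(min = min, max = max, mean = mean, sd = sd)))

# Split each name at its last underscore: the first part becomes a row label,
# the second part (".value") becomes the name of a statistic column.
long <- stats %>%
  pivot_longer(
    everything(),
    names_to = c("variable", ".value"),
    names_pattern = "^(.*)_([^_]+)$"
  )

long
```

Because the pattern anchors on the last underscore, a column such as title_one_min is split into title_one and min rather than into three pieces.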