I'm trying to get a table of descriptive statistics in long format using dplyr. I did try gather, but it doesn't work... My code example:
library(dplyr)
data(mtcars)
table = mtcars %>%
  summarise_all(funs(mean, sd, median, min, max))
dim(table)
[1] 1 55
table[1:4,1:4]
  mpg_mean cyl_mean disp_mean hp_mean
1 20.09062 6.1875 230.7219 146.6875
NA NA NA NA NA
NA.1 NA NA NA NA
NA.2 NA NA NA NA
library(tidyr)  # for gather()
table2 = mtcars %>%
gather(stat) %>%
summarise_all(funs(mean, sd,median, min, max))
dim(table2)
table2[1:4,1:4]
1: In mean.default(stat) :
argument is not numeric or logical: returning NA
2: In var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
NAs introduced by coercion
3: In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
argument is not numeric or logical: returning NA
[1] 1 10
  stat_mean value_mean stat_sd value_sd
1 NA 39.60853 NA 84.20792
NA NA NA NA NA
NA.1 NA NA NA NA
NA.2 NA NA NA NA
What I have in mind is a result like this for each statistic:
mean
mpg 20.09062
cyl 6.1875
disp 230.7219
hp 146.6875
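One way to get that layout, sketched under the assumption that dplyr and tidyr are installed: the gather(stat) attempt fails because summarise_all() then also summarises the character key column, which is where the "argument is not numeric or logical" and "NAs introduced by coercion" warnings come from. Grouping by the key column and summarising only the numeric value column gives one row per variable and one column per statistic. The column name "variable" is just an illustrative choice.

```r
library(dplyr)
library(tidyr)

# Long format: "variable" holds the original column names,
# "value" holds the numbers.
long <- gather(mtcars, variable, value)

# Summarise only the numeric value column, separately for each variable,
# instead of applying the functions to every column (including the key).
desc_table <- long %>%
  group_by(variable) %>%
  summarise(
    mean   = mean(value),
    sd     = sd(value),
    median = median(value),
    min    = min(value),
    max    = max(value)
  )

print(desc_table)
```

Because of group_by(), the rows come out sorted alphabetically by variable name rather than in the original column order.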
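Here I've added a real example of my data frame. I removed the dots and underscores from the column names, which I think makes the solution below easier: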
ex = data.frame(title_one = c(11,22,34,22,43,454),title.two = c(22,44,343,3434,424,676),title3 = c(6,1,0,1 ,1,1))
names(ex) = gsub(pattern = "_", replacement = "", x = names(ex), fixed = TRUE)
names(ex) = gsub(pattern = ".", replacement = "", x = names(ex), fixed = TRUE)
table = ex %>%
summarise_all(funs( min, max,mean, sd))
gather(table) %>%
separate(key, into = c("key1", 'key2')) %>%
spread(key2, value)
  key1 max mean min sd
1 title3 6 1.666667 0 2.160247
2 titleone 454 97.666667 11 174.915599
3 titletwo 3434 823.833333 22 1302.072873
Answer:
We can gather into "long" format, separate the "key" column into two columns, and then spread back into "wide" format:
library(tidyverse)
gather(table) %>%
separate(key, into = c("key1", 'key2')) %>%
spread(key2, value)
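In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(), and funs() has been deprecated in dplyr since 0.8.0. A sketch of the same reshape with the newer functions, assuming dplyr 1.0.0 or later for across(); the names summary_wide, summary_long, "variable" and "stat" are illustrative only:

```r
library(dplyr)
library(tidyr)

# One-row summary; across() with a named list produces names of the form
# <column>_<function>, e.g. mpg_mean, mpg_sd.
summary_wide <- mtcars %>%
  summarise(across(everything(),
                   list(mean = mean, sd = sd, median = median,
                        min = min, max = max)))

# Split each name at the underscore into variable and statistic,
# then spread the statistics back out as columns.
summary_long <- summary_wide %>%
  pivot_longer(everything(),
               names_to = c("variable", "stat"),
               names_sep = "_") %>%
  pivot_wider(names_from = stat, values_from = value)

print(summary_long)
```

names_sep = "_" works here only because none of the mtcars column names contain an underscore themselves.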
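For the updated dataset, whose names contain more than one kind of delimiter, we can use extract with a regular expression to capture the two parts of each name: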
gather(table) %>%
  # split each name at its LAST underscore: variable name, then statistic
  extract(key, into = c("key1", "key2"), "^(.*)_([^_]+)$") %>%
  spread(key2, value)
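The same multiple-delimiter case can also be handled inside pivot_longer() through its names_pattern argument, with no renaming step. This sketch assumes, as summarise_all() and across() do by default, that the statistic name is appended after a final underscore, so the pattern splits only at the last underscore and keeps dots and earlier underscores in the variable name. It assumes dplyr 1.0.0+ and tidyr 1.0.0+; ex_wide, ex_long, "variable" and "stat" are illustrative names.

```r
library(dplyr)
library(tidyr)

# Example data from the question, with the original (un-cleaned) names.
ex <- data.frame(title_one = c(11, 22, 34, 22, 43, 454),
                 title.two = c(22, 44, 343, 3434, 424, 676),
                 title3    = c(6, 1, 0, 1, 1, 1))

# Names come out as e.g. title_one_min, title.two_mean, title3_sd.
ex_wide <- ex %>%
  summarise(across(everything(),
                   list(min = min, max = max, mean = mean, sd = sd)))

# The greedy (.*) runs up to the last underscore, so only the trailing
# statistic name lands in the second capture group.
ex_long <- ex_wide %>%
  pivot_longer(everything(),
               names_to = c("variable", "stat"),
               names_pattern = "^(.*)_([^_]+)$") %>%
  pivot_wider(names_from = stat, values_from = value)

print(ex_long)
```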