我有一个数据集,其中每个变量的均值和sd为列,但我想将其转换为“长”格式:
library(tidyverse)
iris %>%
group_by(Species) %>%
summarize_all(list(mean = mean, sd = sd))
#> # A tibble: 3 x 9
#> Species Sepal.Length_me~ Sepal.Width_mean Petal.Length_me~
#> <fct> <dbl> <dbl> <dbl>
#> 1 setosa 5.01 3.43 1.46
#> 2 versic~ 5.94 2.77 4.26
#> 3 virgin~ 6.59 2.97 5.55
#> # ... with 5 more variables: Petal.Width_mean <dbl>,
#> # Sepal.Length_sd <dbl>, Sepal.Width_sd <dbl>, Petal.Length_sd <dbl>,
#> # Petal.Width_sd <dbl>
# Desired output:
#
# tribble(~Species, ~Variable, ~Mean, ~SD
# #-------------------------------
# ... )
我觉得tidyr::gather
可以在这里很好地使用,但是,我不确定每个键具有两个值的语法如何工作。还是我需要使用两个收集器并对其进行列绑定?
答案 0 :(得分:3)
要转换后summarise_all
数据,您可以执行以下操作
df %>%
gather(key, val, -Species) %>%
separate(key, into = c("Variable", "metric"), sep = "_") %>%
spread(metric, val)
## A tibble: 12 x 4
# Species Variable mean sd
# <fct> <chr> <dbl> <dbl>
# 1 setosa Petal.Length 1.46 0.174
# 2 setosa Petal.Width 0.246 0.105
# 3 setosa Sepal.Length 5.01 0.352
# 4 setosa Sepal.Width 3.43 0.379
# 5 versicolor Petal.Length 4.26 0.470
# 6 versicolor Petal.Width 1.33 0.198
# 7 versicolor Sepal.Length 5.94 0.516
# 8 versicolor Sepal.Width 2.77 0.314
# 9 virginica Petal.Length 5.55 0.552
#10 virginica Petal.Width 2.03 0.275
#11 virginica Sepal.Length 6.59 0.636
#12 virginica Sepal.Width 2.97 0.322
但是从一开始就将数据从宽转换为长实际上更快,更短
iris %>%
gather(Variable, val, -Species) %>%
group_by(Species, Variable) %>%
summarise(Mean = mean(val), SD = sd(val))
## A tibble: 12 x 4
## Groups: Species [?]
# Species Variable Mean SD
# <fct> <chr> <dbl> <dbl>
# 1 setosa Petal.Length 1.46 0.174
# 2 setosa Petal.Width 0.246 0.105
# 3 setosa Sepal.Length 5.01 0.352
# 4 setosa Sepal.Width 3.43 0.379
# 5 versicolor Petal.Length 4.26 0.470
# 6 versicolor Petal.Width 1.33 0.198
# 7 versicolor Sepal.Length 5.94 0.516
# 8 versicolor Sepal.Width 2.77 0.314
# 9 virginica Petal.Length 5.55 0.552
#10 virginica Petal.Width 2.03 0.275
#11 virginica Sepal.Length 6.59 0.636
#12 virginica Sepal.Width 2.97 0.322
答案 1 :(得分:0)
这是pivot_longer
开发版本中tidyr
的一个选项。
library(dplyr)
library(tidyr) #tidyr_0.8.3.9000
df %>%
rename_at(-1, ~ str_replace(., "(.*)_(.*)", "\\2_\\1")) %>%
pivot_longer(-Species, names_to = c(".value", "Variable"), names_sep = "_")
# A tibble: 12 x 4
# Species Variable mean sd
# <fct> <chr> <dbl> <dbl>
# 1 setosa Sepal.Length 5.01 0.352
# 2 setosa Sepal.Width 3.43 0.379
# 3 setosa Petal.Length 1.46 0.174
# 4 setosa Petal.Width 0.246 0.105
# 5 versicolor Sepal.Length 5.94 0.516
# 6 versicolor Sepal.Width 2.77 0.314
# 7 versicolor Petal.Length 4.26 0.470
# 8 versicolor Petal.Width 1.33 0.198
# 9 virginica Sepal.Length 6.59 0.636
#10 virginica Sepal.Width 2.97 0.322
#11 virginica Petal.Length 5.55 0.552
#12 virginica Petal.Width 2.03 0.275
data(iris)
df <- iris %>%
group_by(Species) %>%
summarize_all(list(mean = mean, sd = sd))