How to use the function "gather" (or a similar function) to reshape data and reduce four variables to two

Time: 2019-07-12 14:35:14

Tags: r dataframe data.table reshape tidyr

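I have a data frame df1 that summarizes the mean number of animals for each 6-hour interval and each area (mean_A, mean_B). I also have the standard errors of these means (Se_A, Se_B). For example: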

df1<-data.frame(Hour=c(0,6,12,18,24),
                mean_A= c(7.3,6.8,8.9,3.4,12.1),
                mean_B=c(6.3,8.2,3.1,4.8,13.2),
                Se_A=c(1.3,2.1,0.9,3.2,0.8),
                Se_B=c(0.9,0.3,1.8,1.1,1.3))

> df1

  Hour mean_A mean_B Se_A Se_B
1    0    7.3    6.3  1.3  0.9
2    6    6.8    8.2  2.1  0.3
3   12    8.9    3.1  0.9  1.8
4   18    3.4    4.8  3.2  1.1
5   24   12.1   13.2  0.8  1.3

For plotting purposes, I need to reorganize the data frame. What I need is this (or something similar):

> df1
   Hour meanType meanValue  Se
1     0   mean_A       7.3 1.3
2     6   mean_A       6.8 2.1
3    12   mean_A       8.9 0.9
4    18   mean_A       3.4 3.2
5    24   mean_A      12.1 0.8
6     0   mean_B       6.3 0.9
7     6   mean_B       8.2 0.3
8    12   mean_B       3.1 1.8
9    18   mean_B       4.8 1.1
10   24   mean_B      13.2 1.3

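Does anyone know how to do this?

2 answers:

Answer 0: (score: 3)

We can use melt from data.table, as it is built to take multiple measure patterns and create separate columns when reshaping from 'wide' to 'long'.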

library(data.table)
melt(setDT(df1), measure = patterns("^mean", "^Se"), 
      variable.name = "meanType", value.name = c("meanValue", "Se"))[,
        meanType := names(df1)[2:3][meanType]][]
#    Hour meanType meanValue  Se
# 1:    0   mean_A       7.3 1.3
# 2:    6   mean_A       6.8 2.1
# 3:   12   mean_A       8.9 0.9
# 4:   18   mean_A       3.4 3.2
# 5:   24   mean_A      12.1 0.8
# 6:    0   mean_B       6.3 0.9
# 7:    6   mean_B       8.2 0.3
# 8:   12   mean_B       3.1 1.8
# 9:   18   mean_B       4.8 1.1
#10:   24   mean_B      13.2 1.3
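The relabelling step at the end of the melt call above is easier to follow when split out. The following is a minimal sketch, not the answerer's exact code: it assumes the same df1 as in the question, uses as.data.table() instead of setDT() so that df1 is not modified by reference, and introduces the helper name type_labels purely for illustration.

```r
library(data.table)

df1 <- data.frame(Hour = c(0, 6, 12, 18, 24),
                  mean_A = c(7.3, 6.8, 8.9, 3.4, 12.1),
                  mean_B = c(6.3, 8.2, 3.1, 4.8, 13.2),
                  Se_A = c(1.3, 2.1, 0.9, 3.2, 0.8),
                  Se_B = c(0.9, 0.3, 1.8, 1.1, 1.3))

# as.data.table() makes a copy, so df1 itself stays a plain data.frame
dt <- as.data.table(df1)

# One value column per pattern: "^mean" feeds meanValue, "^Se" feeds Se
long <- melt(dt, measure.vars = patterns("^mean", "^Se"),
             variable.name = "meanType", value.name = c("meanValue", "Se"))

# At this point meanType only records the position of the matched column
# within each pattern (first match, second match), not a column name
str(long$meanType)

# Map those positions back to the original "mean_*" column names
# (type_labels is a hypothetical helper name)
type_labels <- grep("^mean", names(df1), value = TRUE)
long[, meanType := type_labels[as.integer(meanType)]]
print(long)
```

Selecting the labels with grep() rather than by position (2:3) is a design choice of this sketch; it avoids depending on the column order of df1.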

If we need a tidyverse approach:

library(tidyverse)
gather(df1, meanType, val, -Hour) %>% 
   separate(meanType, into = c("meanType1", "meanType")) %>%  
   spread(meanType1, val) %>%
   mutate(meanType = str_c("mean_", meanType)) %>%
   arrange(meanType)
#   Hour meanType mean  Se
#1     0   mean_A  7.3 1.3
#2     6   mean_A  6.8 2.1
#3    12   mean_A  8.9 0.9
#4    18   mean_A  3.4 3.2
#5    24   mean_A 12.1 0.8
#6     0   mean_B  6.3 0.9
#7     6   mean_B  8.2 0.3
#8    12   mean_B  3.1 1.8
#9    18   mean_B  4.8 1.1
#10   24   mean_B 13.2 1.3

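Note: gather also works here, but make sure to check the type of the columns before doing the gather. As both sets of columns are numeric here, it is not an issue. When we have multiple types and we gather them into a single column, then after the spread step we may need type_convert (from readr).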
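The type caveat in the note can be illustrated with a small sketch. The data frame df_mixed and its note_A column are hypothetical and not part of the question; they exist only to force a numeric and a character column into the same gathered column.

```r
library(tidyr)
library(readr)

# Hypothetical data: one numeric and one character measure column
df_mixed <- data.frame(Hour = c(0, 6),
                       mean_A = c(7.3, 6.8),
                       note_A = c("ok", "low"),
                       stringsAsFactors = FALSE)

# Gathering mixed types into one column coerces everything to character
long <- gather(df_mixed, key, val, -Hour)
class(long$val)

# spread() restores the columns but keeps the character type
wide <- spread(long, key, val)
class(wide$mean_A)

# type_convert() re-parses character columns and recovers the numbers
fixed <- type_convert(wide)
class(fixed$mean_A)
```

As an alternative, spread() has a convert argument that applies a similar conversion in the same step, which may be enough for simple cases.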

Answer 1: (score: 3)

Using reshape:

reshape(df1, idvar = "Hour", varying = 2:5, direction = "long", sep = "_", timevar = "type")
#     Hour type mean  Se
#0.A     0    A  7.3 1.3
#6.A     6    A  6.8 2.1
#12.A   12    A  8.9 0.9
#18.A   18    A  3.4 3.2
#24.A   24    A 12.1 0.8
#0.B     0    B  6.3 0.9
#6.B     6    B  8.2 0.3
#12.B   12    B  3.1 1.8
#18.B   18    B  4.8 1.1
#24.B   24    B 13.2 1.3

We can also use pivot_longer from tidyr (version 0.8.3.9000):

library(tidyr)
pivot_longer(df1, cols = -Hour, names_to = c(".value", "Type"), names_sep = "_")
# A tibble: 10 x 4
#    Hour Type   mean    Se
#   <dbl> <chr> <dbl> <dbl>
# 1     0 A       7.3   1.3
# 2     0 B       6.3   0.9
# 3     6 A       6.8   2.1
# 4     6 B       8.2   0.3
# 5    12 A       8.9   0.9
# 6    12 B       3.1   1.8
# 7    18 A       3.4   3.2
# 8    18 B       4.8   1.1
# 9    24 A      12.1   0.8
#10    24 B      13.2   1.3

From the vignette:

  Note the special variable name .value: this tells pivot_longer that that part of the variable name defines the names of the output value columns.
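To get from the pivot_longer result to the exact layout requested in the question (columns Hour, meanType, meanValue, Se, with rows grouped by type), one possible follow-up is sketched below. It assumes the df1 from the question, uses names_pattern with a regular expression as an alternative to names_sep, and the final relabelling with base R is only one of several ways to do it.

```r
library(tidyr)

df1 <- data.frame(Hour = c(0, 6, 12, 18, 24),
                  mean_A = c(7.3, 6.8, 8.9, 3.4, 12.1),
                  mean_B = c(6.3, 8.2, 3.1, 4.8, 13.2),
                  Se_A = c(1.3, 2.1, 0.9, 3.2, 0.8),
                  Se_B = c(0.9, 0.3, 1.8, 1.1, 1.3))

# The first capture group becomes the output column name (.value),
# the second capture group becomes the value of the Type column
long <- pivot_longer(df1, cols = -Hour,
                     names_to = c(".value", "Type"),
                     names_pattern = "(.+)_(.+)")

# Rebuild the labels and column names used in the question
out <- data.frame(Hour = long$Hour,
                  meanType = paste0("mean_", long$Type),
                  meanValue = long$mean,
                  Se = long$Se)

# Group rows by type, then by hour, and reset the row names
out <- out[order(out$meanType, out$Hour), ]
rownames(out) <- NULL
print(out)
```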