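I have a data frame df1 that summarizes the mean number of animals per 6-hour interval and per area (mean_A and mean_B). I also have the standard errors of these means (Se_A and Se_B). For example: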
df1<-data.frame(Hour=c(0,6,12,18,24),
mean_A= c(7.3,6.8,8.9,3.4,12.1),
mean_B=c(6.3,8.2,3.1,4.8,13.2),
Se_A=c(1.3,2.1,0.9,3.2,0.8),
Se_B=c(0.9,0.3,1.8,1.1,1.3))
> df1
Hour mean_A mean_B Se_A Se_B
1 0 7.3 6.3 1.3 0.9
2 6 6.8 8.2 2.1 0.3
3 12 8.9 3.1 0.9 1.8
4 18 3.4 4.8 3.2 1.1
5 24 12.1 13.2 0.8 1.3
For plotting reasons, I need to reorganize the data frame. What I need is this (or something similar):
> df1
Hour meanType meanValue Se
1 0 mean_A 7.3 1.3
2 6 mean_A 6.8 2.1
3 12 mean_A 8.9 0.9
4 18 mean_A 3.4 3.2
5 24 mean_A 12.1 0.8
6 0 mean_B 6.3 0.9
7 6 mean_B 8.2 0.3
8 12 mean_B 3.1 1.8
9 18 mean_B 4.8 1.1
10 24 mean_B 13.2 1.3
Does anyone know how to do this?
Answer 0 (score: 3)
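We can use melt from data.table, which has built-in support for multiple measure patterns and creates a separate value column for each pattern when reshaping from 'wide' to 'long':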
library(data.table)
melt(setDT(df1), measure = patterns("^mean", "^Se"),
variable.name = "meanType", value.name = c("meanValue", "Se"))[,
meanType := names(df1)[2:3][meanType]][]
# Hour meanType meanValue Se
# 1: 0 mean_A 7.3 1.3
# 2: 6 mean_A 6.8 2.1
# 3: 12 mean_A 8.9 0.9
# 4: 18 mean_A 3.4 3.2
# 5: 24 mean_A 12.1 0.8
# 6: 0 mean_B 6.3 0.9
# 7: 6 mean_B 8.2 0.3
# 8: 12 mean_B 3.1 1.8
# 9: 18 mean_B 4.8 1.1
#10: 24 mean_B 13.2 1.3
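To make the role of patterns() more concrete, here is a minimal sketch that spells out the same column groups by hand. It assumes measure.vars is given as a list (one character vector per output value column) and uses as.data.table() instead of setDT() so that df1 itself is not converted in place. The names dt and long are illustrative only.

```r
library(data.table)

df1 <- data.frame(Hour   = c(0, 6, 12, 18, 24),
                  mean_A = c(7.3, 6.8, 8.9, 3.4, 12.1),
                  mean_B = c(6.3, 8.2, 3.1, 4.8, 13.2),
                  Se_A   = c(1.3, 2.1, 0.9, 3.2, 0.8),
                  Se_B   = c(0.9, 0.3, 1.8, 1.1, 1.3))

# Work on a copy; setDT(df1) would turn df1 itself into a data.table
dt <- as.data.table(df1)

# Each list element names the wide columns that feed one long value column
long <- melt(dt,
             id.vars       = "Hour",
             measure.vars  = list(c("mean_A", "mean_B"), c("Se_A", "Se_B")),
             variable.name = "meanType",
             value.name    = c("meanValue", "Se"))

# With several value columns, meanType only holds the group index (1, 2),
# so map it back to the original column labels
long[, meanType := c("mean_A", "mean_B")[as.integer(meanType)]]
print(long)
```

The explicit list is more verbose than patterns(), but it may be easier to read when the column names do not share a clean prefix.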
If we need a tidyverse approach:
library(tidyverse)
gather(df1, meanType, val, -Hour) %>%
separate(meanType, into = c("meanType1", "meanType")) %>%
spread(meanType1, val) %>%
mutate(meanType = str_c("mean_", meanType)) %>%
arrange(meanType)
# Hour meanType mean Se
#1 0 mean_A 7.3 1.3
#2 6 mean_A 6.8 2.1
#3 12 mean_A 8.9 0.9
#4 18 mean_A 3.4 3.2
#5 24 mean_A 12.1 0.8
#6 0 mean_B 6.3 0.9
#7 6 mean_B 8.2 0.3
#8 12 mean_B 3.1 1.8
#9 18 mean_B 4.8 1.1
#10 24 mean_B 13.2 1.3
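Note: gather can be used here, but make sure to check the type of the columns before doing the gather. Since both sets of columns are numeric, this is not a problem. When the columns have different types and we gather them into a single column, we may need type_convert (from readr) after the spread step.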
Answer 1 (score: 3)
Using reshape:
reshape(df1, idvar = "Hour", varying = 2:5, direction = "long", sep = "_", timevar = "type")
# Hour type mean Se
#0.A 0 A 7.3 1.3
#6.A 6 A 6.8 2.1
#12.A 12 A 8.9 0.9
#18.A 18 A 3.4 3.2
#24.A 24 A 12.1 0.8
#0.B 0 B 6.3 0.9
#6.B 6 B 8.2 0.3
#12.B 12 B 3.1 1.8
#18.B 18 B 4.8 1.1
#24.B 24 B 13.2 1.3
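The reshape output uses A/B labels and compound row names rather than the exact layout requested in the question. As a rough sketch, assuming the goal is the Hour/meanType/meanValue/Se layout, the result could be post-processed in base R like this (the name long is illustrative):

```r
df1 <- data.frame(Hour   = c(0, 6, 12, 18, 24),
                  mean_A = c(7.3, 6.8, 8.9, 3.4, 12.1),
                  mean_B = c(6.3, 8.2, 3.1, 4.8, 13.2),
                  Se_A   = c(1.3, 2.1, 0.9, 3.2, 0.8),
                  Se_B   = c(0.9, 0.3, 1.8, 1.1, 1.3))

# sep = "_" lets reshape split "mean_A" into value name "mean" and time "A"
long <- reshape(df1, idvar = "Hour", varying = 2:5, direction = "long",
                sep = "_", timevar = "type")

# Rebuild the labels used in the question and drop the "0.A"-style row names
long$meanType  <- paste0("mean_", long$type)
rownames(long) <- NULL

# Reorder and rename columns to match the requested layout
long <- long[, c("Hour", "meanType", "mean", "Se")]
names(long)[names(long) == "mean"] <- "meanValue"
print(long)
```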
We can also use pivot_longer from tidyr (version 0.8.3.9000):
library(tidyr)
pivot_longer(df1, cols = -Hour, names_to = c(".value", "Type"), names_sep = "_")
# A tibble: 10 x 4
#    Hour Type   mean    Se
#   <dbl> <chr> <dbl> <dbl>
# 1     0 A       7.3   1.3
# 2     0 B       6.3   0.9
# 3     6 A       6.8   2.1
# 4     6 B       8.2   0.3
# 5    12 A       8.9   0.9
# 6    12 B       3.1   1.8
# 7    18 A       3.4   3.2
# 8    18 B       4.8   1.1
# 9    24 A      12.1   0.8
#10    24 B      13.2   1.3
From the vignette:
Note the special variable name .value: this tells pivot_longer() that that part of the variable name defines the name of the output value column.
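To see what .value changes, the following sketch compares the call with and without it. The names stat, val, with_value and without_value are illustrative and not from the answer; the sketch assumes a tidyr release that provides pivot_longer (1.0.0 or later).

```r
library(tidyr)

df1 <- data.frame(Hour   = c(0, 6, 12, 18, 24),
                  mean_A = c(7.3, 6.8, 8.9, 3.4, 12.1),
                  mean_B = c(6.3, 8.2, 3.1, 4.8, 13.2),
                  Se_A   = c(1.3, 2.1, 0.9, 3.2, 0.8),
                  Se_B   = c(0.9, 0.3, 1.8, 1.1, 1.3))

# With ".value": the part before "_" (mean, Se) becomes the output column names
with_value <- pivot_longer(df1, cols = -Hour,
                           names_to = c(".value", "Type"), names_sep = "_")

# Without ".value": both name parts become ordinary key columns and all
# numbers land in a single value column
without_value <- pivot_longer(df1, cols = -Hour,
                              names_to = c("stat", "Type"), names_sep = "_",
                              values_to = "val")

dim(with_value)
dim(without_value)
```

The first result has one row per Hour/Type pair with separate mean and Se columns, which is the shape the question asks for; the second would still need a further spreading step.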