What is the canonical dplyr or tidyverse way to merge two sets of key-value pair data?
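The first key-value pair is parameter - coeft.
The second key-value pair is param - value. The wrinkle is that the values in this second pair are duplicated.
I would like to merge the two into a single key-value pair.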
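library(dplyr)  # provides %>% and mutate()

dat <- tidyr::crossing(sim = 1:5,
                       parameter = c('mu', 'sigma'),
                       param = c('sd', 'sd')  # the repeated 'sd' collapses to one level, giving 10 rows
                       ) %>%
  dplyr::mutate(coeft = rnorm(n = 10)) %>%
  dplyr::mutate(value = sort(rep(rnorm(n = 5), 2)))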
> dat
# A tibble: 10 x 5
     sim parameter param  coeft   value
   <int> <chr>     <chr>  <dbl>   <dbl>
 1     1 mu        sd    -1.91  -0.601
 2     1 sigma     sd    -0.967 -0.601
 3     2 mu        sd    -1.95   0.0645
 4     2 sigma     sd     0.676  0.0645
 5     3 mu        sd    -0.891  0.673
 6     3 sigma     sd    -0.328  0.673
 7     4 mu        sd    -2.30   1.08
 8     4 sigma     sd     0.679  1.08
 9     5 mu        sd    -0.598  1.99
10     5 sigma     sd    -0.339  1.99
Desired structure:
# A tibble: 15 x 3
    sim parameter   coeft
  <int> <chr>       <dbl>
1     1 mu        -1.91
2     1 sigma     -0.967
3     1 sd        -0.601
4     2 mu        -1.95
5     2 sigma      0.676
6     2 sd         0.0645
...
Answer 0 (score: 3)
Here is an approach with dplyr, using spread() and gather() from tidyr for the reshaping (run with dplyr v0.7.4, Windows 7, 64-bit R):
library(dplyr)
library(tidyr)  # spread() and gather() come from tidyr

dat %>%
  spread(parameter, coeft) %>% # convert to wide format
  rename(sd = value) %>% # change the name of a column
  gather(parameter, coeft, c(4,5,3)) %>% # convert three columns, picked by position, to long format; note the order of columns
  # gather(parameter, coeft, sd:sigma) %>% # alternative: pick the three contiguous columns as a range
  arrange(sim) %>% # order rows by sim
  select(-param)
This gives a warning with some versions of dplyr (0.7.4) but not with others (I will post a version that runs without it tomorrow, once I have checked).
warning:
Warning message:
In if (!is.finite(x)) return(FALSE) :
the condition has length > 1 and only the first element will be used
In that case, the following runs without the warning:
dat %>%
spread(parameter, coeft) %>%
dplyr::rename(sd = value) %>%
gather(parameter, coeft, "mu", "sigma", "sd") %>%
arrange(sim) %>% #order of rows
select(-param)
Also note that if you prefer to use column-exclusion notation, you need to drop the param column first.
dat %>%
spread(parameter, coeft) %>% #convert to wide format
rename(sd = value) %>% #change the name of a column
select(-param) %>%
gather(parameter, coeft, -sim) %>% #convert three contiguously located columns to long format
arrange(sim) #order of rows
#output
     sim parameter  coeft
   <int> <chr>      <dbl>
 1     1 mu        -0.626
 2     1 sigma      0.184
 3     1 sd        -2.21
 4     2 mu        -0.836
 5     2 sigma      1.60
 6     2 sd        -0.621
 7     3 mu         0.330
 8     3 sigma     -0.820
 9     3 sd         0.390
10     4 mu         0.487
11     4 sigma      0.738
12     4 sd         1.12
13     5 mu         0.576
14     5 sigma     -0.305
15     5 sd         1.51
Data:
set.seed(1)
dat <- tidyr::crossing(sim=c(1:5),
parameter=c('mu','sigma'),
param=c('sd','sd')
) %>%
dplyr::mutate(coeft=rnorm(n=10)) %>%
dplyr::mutate(value=sort(rep(rnorm(n=5),2)))
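For readers on a newer tidyr than the one used in this answer, the same wide-then-long round trip can probably be written with pivot_wider() and pivot_longer(). This is only a sketch under the assumption that tidyr >= 1.0.0 is installed; it is not part of the original answer, and the data frame is rebuilt from the values printed in the question so that the block runs on its own.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr >= 1.0.0, which provides pivot_wider() / pivot_longer()

# Same shape as the question's data, typed in from the printed tibble
dat <- tibble(
  sim       = rep(1:5, each = 2),
  parameter = rep(c("mu", "sigma"), times = 5),
  param     = "sd",
  coeft     = c(-1.91, -0.967, -1.95, 0.676, -0.891,
                -0.328, -2.30, 0.679, -0.598, -0.339),
  value     = rep(c(-0.601, 0.0645, 0.673, 1.08, 1.99), each = 2)
)

res <- dat %>%
  pivot_wider(names_from = parameter, values_from = coeft) %>% # wide: one row per sim
  rename(sd = value) %>%                                       # value holds the sd
  select(-param) %>%                                           # the key column is no longer needed
  pivot_longer(c(mu, sigma, sd),
               names_to = "parameter", values_to = "coeft") %>% # back to long format
  arrange(sim)

res
```

Listing the columns explicitly in pivot_longer() plays the same role as c(4,5,3) in the gather() call, without relying on column positions.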
Answer 1 (score: 0)
If we need to reshape multiple sets of columns to 'long' format, then melt from data.table is an option:
library(data.table)
dt <- unique(melt(setDT(dat), measure = list(2:3, 4:5),
value.name = c('parameter', 'coeft')))[, variable := NULL][order(sim)]
dt
# sim parameter coeft
# 1: 1 mu -1.9100
# 2: 1 sigma -0.9670
# 3: 1 sd -0.6010
# 4: 2 mu -1.9500
# 5: 2 sigma 0.6760
# 6: 2 sd 0.0645
# 7: 3 mu -0.8910
# 8: 3 sigma -0.3280
# 9: 3 sd 0.6730
#10: 4 mu -2.3000
#11: 4 sigma 0.6790
#12: 4 sd 1.0800
#13: 5 mu -0.5980
#14: 5 sigma -0.3390
#15: 5 sd 1.9900
dat <- structure(list(sim = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L
), parameter = c("mu", "sigma", "mu", "sigma", "mu", "sigma",
"mu", "sigma", "mu", "sigma"), param = c("sd", "sd", "sd", "sd",
"sd", "sd", "sd", "sd", "sd", "sd"), coeft = c(-1.91, -0.967,
-1.95, 0.676, -0.891, -0.328, -2.3, 0.679, -0.598, -0.339), value = c(-0.601,
-0.601, 0.0645, 0.0645, 0.673, 0.673, 1.08, 1.08, 1.99, 1.99)),
.Names = c("sim",
"parameter", "param", "coeft", "value"),
class = "data.frame", row.names = c(NA,
-10L))
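To make the idea behind the melt() call more concrete, here is a rough base R sketch of the same stacking: the (parameter, coeft) pair and the (param, value) pair are put on top of each other under common names, and the duplicated sd rows are dropped. It illustrates the concept only and says nothing about how melt() is implemented; the names first, second and long are made up for this example.

```r
# Same data as above, repeated so the block runs on its own
dat <- data.frame(
  sim       = rep(1:5, each = 2),
  parameter = rep(c("mu", "sigma"), times = 5),
  param     = "sd",
  coeft     = c(-1.91, -0.967, -1.95, 0.676, -0.891,
                -0.328, -2.3, 0.679, -0.598, -0.339),
  value     = rep(c(-0.601, 0.0645, 0.673, 1.08, 1.99), each = 2),
  stringsAsFactors = FALSE
)

# First key-value pair, already under the target names
first <- dat[, c("sim", "parameter", "coeft")]

# Second key-value pair, renamed to match the first
second <- setNames(dat[, c("sim", "param", "value")],
                   c("sim", "parameter", "coeft"))

# Stack both pairs, drop the repeated sd rows, and order by sim
long <- unique(rbind(first, second))
long <- long[order(long$sim), ]
rownames(long) <- NULL

long
```

The unique() call does the same job as in the data.table version: each sd value appears twice in the input because it is repeated for mu and sigma.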