分散在R-dplyr tidyr解决方案中的多列上

时间:2018-08-14 17:11:26

标签: r dplyr tidyr

我有一个长格式的逐年逐月数据,我试图用两列来spread。我看到的唯一示例包括一个key

> dput(df)
structure(list(ID = c("a", "a", "a", "a", "a", "a", "a", "a", 
"a", "b", "b", "b", "b", "b", "b", "b", "b", "b"), Year = c(2015L, 
2015L, 2015L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L, 2015L, 
2015L, 2015L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L), Month = c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L), Value = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 13L, 14L)), .Names = c("ID", "Year", "Month", 
"Value"), class = "data.frame", row.names = c(NA, -18L))

我正在尝试将其转换成一种数据格式,其中年份为2:5列,每monthID一行

ID  Month   2015    2016    2017
a   1         1     2       3
a   2         1     2       3
a   3         1     2       3
a   1         6     9       12
a   2         7     10      13
a   3         8     11      14

我尝试了以下错误,并且出现以下错误:

by_month_over_years = spread(df,key = c(Year,Month), Value)
Error: `var` must evaluate to a single number or a column name, not an integer vector

3 个答案:

答案 0 :(得分:3)

library(tidyr)
library(dplyr)
df %>% group_by(ID) %>% spread(Year, Value)

# A tibble: 6 x 5
# Groups:   ID [2]
  ID    Month `2015` `2016` `2017`
  <chr> <int>  <int>  <int>  <int>
 1 a         1      1      2      3
 2 a         2      1      2      3
 3 a         3      1      2      3
 4 b         1      6      9     12
 5 b         2      7     10     13
 6 b         3      8     11     14

答案 1 :(得分:3)

library(reshape2) # or data.table, for dcast

dcast(df, ID + Month ~ Year)

#   ID Month 2015 2016 2017
# 1  a     1    1    2    3
# 2  a     2    1    2    3
# 3  a     3    1    2    3
# 4  b     1    6    9   12
# 5  b     2    7   10   13
# 6  b     3    8   11   14

答案 2 :(得分:2)

这里是base R的{​​{1}}选项

reshape