R: Reshape elements of data frames, keeping the variables in sequential order

Time: 2017-12-12 23:24:41

Tags: r list dataframe reshape

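I am struggling to reshape 12 individual list elements [each in data frame format] so that the values stay in the correct order. The measurements are daily: value 1 corresponds to the first day of the month, and so on up to value 31, the maximum possible number of days in a month. The layout is horizontal (wide). Where a measurement is missing, or a month has fewer than 31 days, -9999 appears. The -9999 values are not the problem.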

> myplist[[1]]
     COOPID YEAR MONTH ELEMENT value 1 value 2 value 3 value 4 value 5 value 6 
3    170100 1982     9    PRCP       0      70      15       0       0       0       
8    170100 1982    10    PRCP       0      10       0       0       0       0       
13   170100 1982    11    PRCP       2      13       0     170       0       5       
18   170100 1982    12    PRCP       0       0       0       0       2       5       
23   170100 1983     1    PRCP       2       0       0       0       0      10       
28   170100 1983     2    PRCP   -9999       0       0      52       6       0  

My goal is to orient the list elements vertically, so that each day has its own row, like this:

> myplist[[1]]
    YEAR MONTH DAY PRCP
    1982     9   1    0
    1982     9   2   70
    1982     9   3   15

I tried this code:

melt(myplist[[1]], id.vars = c("COOPID", "YEAR", "MONTH", "ELEMENT"))

But it lists value 1 for every month first, rather than the desired sequence value 1, value 2, ..., value 31 within each month.

      COOPID YEAR MONTH ELEMENT variable value
1     170100 1982     9    PRCP  value 1     0
2     170100 1982    10    PRCP  value 1     0
3     170100 1982    11    PRCP  value 1     2
4     170100 1982    12    PRCP  value 1     0
5     170100 1983     1    PRCP  value 1     2
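
A note on this output: the melted result already appears to hold one row per month and day; it is just sorted by variable first. A minimal sketch of reordering it, assuming melt comes from the reshape2 package and using a cut-down toy frame (the names wide and long are illustrative, not from the question):

```r
library(reshape2)

# Toy stand-in for myplist[[1]] with only two day columns (assumption).
wide <- data.frame(
  YEAR  = c(1982, 1982),
  MONTH = c(9, 10),
  `value 1` = c(0, 0),
  `value 2` = c(70, 10),
  check.names = FALSE   # keep the space in "value 1"
)

long <- melt(wide, id.vars = c("YEAR", "MONTH"),
             variable.name = "DAY", value.name = "PRCP")

# Turn "value 1" into the integer 1, then sort by month and day.
long$DAY <- as.integer(sub("value ", "", as.character(long$DAY), fixed = TRUE))
long <- long[order(long$YEAR, long$MONTH, long$DAY), ]
rownames(long) <- NULL
```

With the rows sorted this way, each month's days should appear consecutively.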

This code returned the following unwanted output and error:

> reshape(myplist[[1]], idvar = c("YEAR","MONTH"),varying =print(paste0("value",1:31)),sep = "",
+         timevar = c("YEAR","MONTH"),direction = "long")
 [1] "value1"  "value2"  "value3"  "value4"  "value5"  "value6"  "value7"  "value8"  "value9"  "value10"
[11] "value11" "value12" "value13" "value14" "value15" "value16" "value17" "value18" "value19" "value20"
[21] "value21" "value22" "value23" "value24" "value25" "value26" "value27" "value28" "value29" "value30"
[31] "value31"
Error in `[.data.frame`(data, , varying.i) : undefined columns selected
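
Judging from the printed data, two things look likely to contribute to this error: the columns seem to be named "value 1" with a space, whereas paste0("value", 1:31) builds "value1"; and timevar is meant to name the new column that holds the day index, not the ID columns. A sketch of a base reshape call under those assumptions, on toy data with three day columns (wide, long, and day_cols are illustrative names):

```r
# Toy data mirroring the question's layout (assumption: names contain a space).
wide <- data.frame(
  COOPID  = 170100,
  YEAR    = c(1982, 1982),
  MONTH   = c(9, 10),
  ELEMENT = "PRCP",
  `value 1` = c(0, 0),
  `value 2` = c(70, 10),
  `value 3` = c(15, 0),
  check.names = FALSE
)

day_cols <- paste("value", 1:3)   # "value 1", "value 2", "value 3"

long <- reshape(
  wide,
  direction = "long",
  varying   = day_cols,                     # the wide day columns
  v.names   = "PRCP",                       # name of the stacked value column
  timevar   = "DAY",                        # new column holding the day index
  times     = 1:3,
  idvar     = c("COOPID", "YEAR", "MONTH")  # columns identifying each wide row
)

# Sort chronologically and keep the columns from the desired layout.
long <- long[order(long$YEAR, long$MONTH, long$DAY),
             c("YEAR", "MONTH", "DAY", "PRCP")]
rownames(long) <- NULL
```

For the full data, day_cols and times would presumably run over 1:31 instead of 1:3.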

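I also tried dcast to no avail, and other questions on this and other sites did not seem to address the problem I am having. I think the root of the issue is that the months in my range of years have a variable number of days (up to the actual last day of a given month), but every month in my measurement data has 31 day values.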

1 Answer:

Answer 0: (score: 1)

Using a tidy approach, I would gather the value columns into long format and then gsub the column names down to the day of the month.

library(tidyr)
library(dplyr)

df <- read.table(text = "
     COOPID YEAR MONTH ELEMENT 'value 1' 'value 2' 'value 3' 'value 4' 'value 5' 'value 6'
     170100 1982     9    PRCP         0        70        15         0         0         0
     170100 1982    10    PRCP         0        10         0         0         0         0
     170100 1982    11    PRCP         2        13         0       170         0         5
     170100 1982    12    PRCP         0         0         0         0         2         5
     170100 1983     1    PRCP         2         0         0         0         0        10
     170100 1983     2    PRCP     -9999         0         0        52         6         0
                 ", header = TRUE, stringsAsFactors = FALSE) %>% as_tibble

df %>%
  select(-ELEMENT) %>%
  gather(DAY, PRCP, -c(COOPID, YEAR, MONTH)) %>%
  mutate(DAY = as.integer(gsub("value\\.", "", DAY))) %>%
  arrange(COOPID, YEAR, MONTH, DAY)

# # A tibble: 36 x 5
#    COOPID  YEAR MONTH   DAY  PRCP
#     <int> <int> <int> <int> <int>
#  1 170100  1982     9     1     0
#  2 170100  1982     9     2    70
#  3 170100  1982     9     3    15
#  4 170100  1982     9     4     0
#  5 170100  1982     9     5     0
#  6 170100  1982     9     6     0
#  7 170100  1982    10     1     0
#  8 170100  1982    10     2    10
#  9 170100  1982    10     3     0
# 10 170100  1982    10     4     0
# # ... with 26 more rows
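
As a possible follow-up to the pipeline above, here is a sketch that uses pivot_longer (available in newer versions of tidyr) in place of gather, turns the -9999 placeholder into NA, and drops day slots that do not exist in the calendar (for example February 30). The toy tibble and the names wide, long, and DATE are assumptions for illustration; it also assumes the columns are named "value 28" and so on, with a space.

```r
library(tidyr)
library(dplyr)

# Toy data: the last four day columns for January and February 1983.
wide <- tibble(
  YEAR  = c(1983L, 1983L),
  MONTH = c(1L, 2L),
  `value 28` = c(0L, 6L),
  `value 29` = c(5L, -9999L),
  `value 30` = c(-9999L, -9999L),   # January 30 is a genuinely missing reading
  `value 31` = c(2L, -9999L)
)

long <- wide %>%
  pivot_longer(
    cols            = starts_with("value"),
    names_to        = "DAY",
    names_prefix    = "value ",               # strip the prefix from the names
    names_transform = list(DAY = as.integer),
    values_to       = "PRCP"
  ) %>%
  mutate(
    PRCP = na_if(PRCP, -9999L),               # placeholder becomes NA
    DATE = as.Date(ISOdate(YEAR, MONTH, DAY)) # NA for impossible dates
  ) %>%
  filter(!is.na(DATE)) %>%                    # drop Feb 29-31 in 1983
  arrange(DATE)
```

Real days with a missing measurement stay in the result as NA, while impossible dates are removed. Wrapping the pipeline in a function and calling it through lapply(myplist, ...) would be one way to cover all 12 list elements.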