Split, transpose, and gather a data frame

Time: 2017-04-24 22:12:41

Tags: r dataframe dplyr transpose

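I need to gather and transpose the data in a data frame so that the hourly values end up in a single column. The first column should hold the date with the hour, and the second column should hold the transposed hourly values.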

Sample data:

    structure(list(Year = c(2016L, 2016L), JDay = 1:2, Hour_1 = c(2.59, 
5.95), Hour_2 = c(2.19, 5.84), Hour_3 = c(1.84, 5.75), Hour_4 = c(1.51, 
5.66), Hour_5 = c(1.21, 5.58), Hour_6 = c(0.94, 5.5), Hour_7 = c(0.69, 
5.43), Hour_8 = c(0.45, 5.37), Hour_9 = c(0.23, 5.31), Hour_10 = c(2.18, 
6.19), Hour_11 = c(4.39, 7.16), Hour_12 = c(6.29, 8), Hour_13 = c(7.76, 
8.65), Hour_14 = c(8.68, 9.06), Hour_15 = c(9, 9.2), Hour_16 = c(8.68, 
9.06), Hour_17 = c(7.76, 8.65), Hour_18 = c(7.8, 8.52), Hour_19 = c(7.21, 
7.57), Hour_20 = c(6.85, 6.99), Hour_21 = c(6.59, 6.57), Hour_22 = c(6.39, 
6.25), Hour_23 = c(6.22, 5.98), Hour_24 = c(6.08, 5.75)), .Names = c("Year", 
"JDay", "Hour_1", "Hour_2", "Hour_3", "Hour_4", "Hour_5", "Hour_6", 
"Hour_7", "Hour_8", "Hour_9", "Hour_10", "Hour_11", "Hour_12", 
"Hour_13", "Hour_14", "Hour_15", "Hour_16", "Hour_17", "Hour_18", 
"Hour_19", "Hour_20", "Hour_21", "Hour_22", "Hour_23", "Hour_24"
), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))

Using gather just gives me all the Hour_1 values in sequence...

gather(OP_daily[, c(5:28)], time,temp, Hour_1:Hour_24)

Example output:

date           temp    
2016-1-1 1:00  2.59
2016-1-1 2:00  2.19

1 answer:

Answer 0 (score: 2)

It sounds like gather is what you are looking for:

library(dplyr)
library(tidyr)

df %>%
  gather(-c(Year, JDay), key = "Hour", value = "temp") %>%  # wide to long
  unite(date, Year, JDay, Hour) %>%                         # e.g. "2016_1_Hour_1"
  mutate(date = as.POSIXct(date, format = "%Y_%j_Hour_%H")) %>%
  arrange(date)

                  date     temp
                <time>    <dbl>
1  2016-01-01 01:00:00 2.592221
2  2016-01-01 02:00:00 2.193009
3  2016-01-01 03:00:00 1.835225
4  2016-01-01 04:00:00 1.511071
5  2016-01-01 05:00:00 1.214767
6  2016-01-01 06:00:00 0.941902
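
For anyone unsure why that format string works: unite() joins the columns with "_" by default, so every row gets a key such as 2016_1_Hour_1, and the format reads it back as year, day of year, the literal text Hour_, and the hour. Here is a minimal base R sketch of just that step. The key is built by hand with paste(), which is only a stand-in for unite(), and tz = "GMT" is an assumption added so the result does not depend on the local time zone.

```r
# Hand-built stand-in for the string unite(date, Year, JDay, Hour) produces
key <- paste(2016, 1, "Hour_1", sep = "_")
key

# %Y = year, %j = day of year, "Hour_" is matched literally, %H = hour
parsed <- as.POSIXct(key, format = "%Y_%j_Hour_%H", tz = "GMT")
format(parsed, "%Y-%m-%d %H:%M:%S")
```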

Edit

To see the number of observations per day:

res <- df %>%
  gather(-c(Year, JDay), key = "Hour", value = "temp") %>%
  unite(date, Year, JDay, Hour) %>%
  mutate(date = as.POSIXct(date, format = "%Y_%j_Hour_%H", tz = "GMT")) %>%
  arrange(date)

# Count the rows that fall on each calendar date
res %>%
  mutate(date_only = as.Date(date)) %>%
  group_by(date_only) %>%
  summarise(count = n())

   date_only count
      <date> <int>
1 2016-01-01    23
2 2016-01-02    24
3 2016-01-03     1
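
The 23 / 24 / 1 split looks like a consequence of Hour_24: R's strptime documentation notes that an hour of 24 is accepted on input, and it lands on midnight of the following day. Below is a small base R sketch of that rollover, using hand-written keys in the shape unite() would produce. The three keys are illustrative, not rows copied from the output above.

```r
# Illustrative keys: the last two hours of day 1 and the last hour of day 2
keys <- c("2016_1_Hour_23", "2016_1_Hour_24", "2016_2_Hour_24")
parsed <- as.POSIXct(keys, format = "%Y_%j_Hour_%H", tz = "GMT")

# Hour_24 of day N shows up as 00:00 on day N + 1
format(parsed, "%Y-%m-%d %H:%M")

# The calendar date each key is counted under
as.character(as.Date(parsed))
```

If each day should keep all 24 of its values in the count, one option is to group by the original JDay column (before it is consumed by unite()) rather than by the parsed date.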