Difference between the first and last instance within each hour

Time: 2017-03-01 13:31:38

Tags: r datetime dplyr

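How can I get the difference between the last and the first recorded price within each hour? A dplyr solution would be nice.

See the dput of the data set below: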

structure(list(DATETIME = structure(1:20, .Label = c("2007-05-30 09:41:00", 
"2007-05-30 09:45:00", "2007-05-30 10:22:00", "2007-05-30 10:37:00", 
"2007-05-30 10:39:00", "2007-05-30 11:25:00", "2007-05-30 13:21:00", 
"2007-05-30 14:01:00", "2007-05-31 09:38:00", "2007-05-31 09:56:00", 
"2007-05-31 11:02:00", "2007-05-31 11:09:00", "2007-05-31 11:56:00", 
"2007-05-31 11:57:00", "2007-05-31 13:42:00", "2007-05-31 14:12:00", 
"2007-05-31 14:25:00", "2007-05-31 15:39:00", "2007-05-31 15:48:00", 
"2007-05-31 15:55:00"), class = "factor"), MINUTE = c(41L, 45L, 
22L, 37L, 39L, 25L, 21L, 1L, 38L, 56L, 2L, 9L, 56L, 57L, 42L, 
12L, 25L, 39L, 48L, 55L), HOUR = c(9L, 9L, 10L, 10L, 10L, 11L, 
13L, 14L, 9L, 9L, 11L, 11L, 11L, 11L, 13L, 14L, 14L, 15L, 15L, 
15L), DAY = c(30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 31L, 31L, 
31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L), MONTH = c(5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L), YEAR = c(2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 
2007L, 2007L, 2007L, 2007L, 2007L), AV.PRICE.BIL = c(45.79, 45.75, 
45.79, 45.79, 45.79, 45.79, 45.8, 45.8, 45.79, 45.8, 45.8, 45.8, 
45.8, 45.8, 45.8, 45.8, 45.8, 45.8, 45.8, 45.8)), class = "data.frame", row.names = c(NA, 
-20L), .Names = c("DATETIME", "MINUTE", "HOUR", "DAY", "MONTH", 
"YEAR", "AV.PRICE.BIL"))

Desired sample output:

DATETIME             MINUTE     HOUR     DAY     MONTH     YEAR     AV.PRICE.BIL   HOURLY.DIFF 
2007-05-30 09:41:00  41         9        30      5         2007     45.79           0          
2007-05-30 10:22:00  22         10       30      5         2007     45.79           0
2007-05-30 11:25:00  25         11       30      5         2007     45.79           0
2007-05-30 13:21:00  21         13       30      5         2007     45.79           0   

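So if any times are missing, the difference is simply taken between the last and the first observation recorded within the current hour.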

1 Answer:

Answer 0: (score: 2)

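The first() and last() functions from dplyr make this fairly simple.

Use mutate and slice rather than summarise, since you seem to want to keep the first instance of DATETIME, MINUTE, etc.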

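library(dplyr)

df %>% 
  group_by(YEAR, MONTH, DAY, HOUR) %>%   # one group per calendar hour
  arrange(MINUTE) %>%                    # minute order, so first()/last() are chronological within each hour
  mutate(HOURLY.DIFF = last(AV.PRICE.BIL) - first(AV.PRICE.BIL)) %>% 
  slice(1)                               # keep only the first row of each hour

Source: local data frame [10 x 8]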
Groups: YEAR, MONTH, DAY, HOUR [10]

              DATETIME MINUTE  HOUR   DAY MONTH  YEAR AV.PRICE.BIL HOURLY.DIFF
                <fctr>  <int> <int> <int> <int> <int>        <dbl>       <dbl>
1  2007-05-30 09:41:00     41     9    30     5  2007        45.79       -0.04
2  2007-05-30 10:22:00     22    10    30     5  2007        45.79        0.00
3  2007-05-30 11:25:00     25    11    30     5  2007        45.79        0.00
4  2007-05-30 13:21:00     21    13    30     5  2007        45.80        0.00
5  2007-05-30 14:01:00      1    14    30     5  2007        45.80        0.00
6  2007-05-31 09:38:00     38     9    31     5  2007        45.79        0.01
7  2007-05-31 11:02:00      2    11    31     5  2007        45.80        0.00
8  2007-05-31 13:42:00     42    13    31     5  2007        45.80        0.00
9  2007-05-31 14:12:00     12    14    31     5  2007        45.80        0.00
10 2007-05-31 15:39:00     39    15    31     5  2007        45.80        0.00
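
If only the hourly difference is needed and the other columns of the first row can be dropped, a summarise-based variant might look like the sketch below. The toy data frame and its PRICE column are made up for illustration and are not the data set from the question; the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data (not the asker's data set); rows are deliberately out of order
toy <- data.frame(
  DAY    = c(30L, 30L, 30L, 31L),
  HOUR   = c(9L, 9L, 10L, 9L),
  MINUTE = c(45L, 41L, 22L, 38L),
  PRICE  = c(45.75, 45.79, 45.79, 45.80)
)

hourly <- toy %>%
  arrange(DAY, HOUR, MINUTE) %>%   # sort chronologically before grouping
  group_by(DAY, HOUR) %>%          # one group per day and hour
  summarise(HOURLY.DIFF = last(PRICE) - first(PRICE), .groups = "drop")

print(hourly)
```

This returns one row per day and hour with only the grouping columns and HOURLY.DIFF, so it is shorter but loses DATETIME and MINUTE, which is why the mutate plus slice version above is used when those columns should be kept.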