Here is the data frame:
time value
0 01-01-2015 00:00 72
1 01-01-2015 01:00 74
2 01-01-2015 02:00 75
3 01-01-2015 03:00 77
4 01-01-2015 06:00 72
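If I pass this data frame to resample in pandas, it gives me 24 entries, and the output value for the missing hours is zero (which is exactly what I want).
Syntax: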
resample_factor="H"
data_frame = data_frame.resample(resample_factor).mean()
First of all, I looked at some links, but they were not helpful.
Can we do this in R?
If it is possible, please suggest how I should do it!
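Answer 0: (score: 1)
Perhaps you are looking for tidyr::complete to fill in the missing times. This creates an hourly sequence covering 24 hours, starting from the first time value.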
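library(dplyr)

df %>%
  # parse the day-month-year strings into date-times
  mutate(V2 = as.POSIXct(V2, format = "%d-%m-%Y %H:%M")) %>%
  arrange(V2) %>%
  # first value + 23 hours (86400 - 3600 seconds) is the last of the 24 hourly slots
  tidyr::complete(V2 = seq(first(V2), first(V2) + 86400 - (60 * 60), by = "1 hour"),
                  fill = list(V1 = 0, V3 = 0))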
# V2 V1 V3
# <dttm> <dbl> <dbl>
# 1 2015-01-01 00:00:00 0 72
# 2 2015-01-01 01:00:00 1 74
# 3 2015-01-01 02:00:00 2 75
# 4 2015-01-01 03:00:00 3 77
# 5 2015-01-01 04:00:00 0 0
# 6 2015-01-01 05:00:00 0 0
# 7 2015-01-01 06:00:00 4 72
# 8 2015-01-01 07:00:00 0 0
# 9 2015-01-01 08:00:00 0 0
#10 2015-01-01 09:00:00 0 0
# … with 14 more rows
If the time does not start at 00:00, we can extract the date from the date-time and create a 24-hour sequence from it.
df %>%
  mutate(V2 = as.POSIXct(V2, format = "%d-%m-%Y %H:%M", tz = "GMT")) %>%
  # start at midnight of the first value's date and take 24 hourly steps
  tidyr::complete(V2 = seq(as.POSIXct(as.Date(first(V2))), by = "1 hour",
                           length.out = 24),
                  fill = list(V1 = 0, V3 = 0))
Data
df <- structure(list(V1 = 0:4, V2 = structure(1:5, .Label = c("01-01-2015 00:00",
"01-01-2015 01:00", "01-01-2015 02:00", "01-01-2015 03:00", "01-01-2015 06:00"
), class = "factor"), V3 = c(72L, 74L, 75L, 77L, 72L)), class = "data.frame",
row.names = c(NA, -5L))
Answer 1: (score: 1)
Here is a base R idea,
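# parse only the date part, so the sequence starts at midnight of the first day
dates1 <- seq(as.POSIXct(dd$V2[1], format = '%d-%m-%Y'),
              as.POSIXct(dd$V2[1], format = '%d-%m-%Y') + 82800,  # + 23 hours
              by = '1 hour')

# full outer merge keeps every hour; hours without data get NA
merge(transform(dd, V2 = as.POSIXct(V2, format = '%d-%m-%Y %H:%M')),
      data.frame(V2 = dates1),
      by = 'V2', all = TRUE)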
which gives,
                    V2 V1 V3
1  2015-01-01 00:00:00  0 72
2  2015-01-01 01:00:00  1 74
3  2015-01-01 02:00:00  2 75
4  2015-01-01 03:00:00  3 77
5  2015-01-01 04:00:00 NA NA
6  2015-01-01 05:00:00 NA NA
7  2015-01-01 06:00:00  4 72
8  2015-01-01 07:00:00 NA NA
9  2015-01-01 08:00:00 NA NA
10 2015-01-01 09:00:00 NA NA
11 2015-01-01 10:00:00 NA NA
12 2015-01-01 11:00:00 NA NA
13 2015-01-01 12:00:00 NA NA
14 2015-01-01 13:00:00 NA NA
15 2015-01-01 14:00:00 NA NA
16 2015-01-01 15:00:00 NA NA
17 2015-01-01 16:00:00 NA NA
18 2015-01-01 17:00:00 NA NA
19 2015-01-01 18:00:00 NA NA
20 2015-01-01 19:00:00 NA NA
21 2015-01-01 20:00:00 NA NA
22 2015-01-01 21:00:00 NA NA
23 2015-01-01 22:00:00 NA NA
24 2015-01-01 23:00:00 NA NA
Note: you can replace the NAs as usual.
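One possible way to do that replacement is sketched below in base R. It rebuilds the hourly merge on the sample data and then sets the NA values in V1 and V3 to 0, which matches the zero-filled output asked for in the question. The names `day_start`, `hours` and `res`, and the fixed `tz = "UTC"`, are assumptions made for this illustration and are not part of the original answer.

```r
# Sample data, copied from this answer's data section
dd <- data.frame(
  V1 = 0:4,
  V2 = c("01-01-2015 00:00", "01-01-2015 01:00", "01-01-2015 02:00",
         "01-01-2015 03:00", "01-01-2015 06:00"),
  V3 = c(72L, 74L, 75L, 77L, 72L),
  stringsAsFactors = FALSE
)

# Parse in a fixed time zone (an assumption) so the day has exactly 24 hours
dd$V2 <- as.POSIXct(dd$V2, format = "%d-%m-%Y %H:%M", tz = "UTC")

# Hourly grid for the day of the first observation, starting at midnight
day_start <- as.POSIXct(format(dd$V2[1], "%Y-%m-%d"), tz = "UTC")
hours <- data.frame(V2 = seq(day_start, by = "1 hour", length.out = 24))

# Keep every hour of the grid; hours without data get NA in V1 and V3
res <- merge(hours, dd, by = "V2", all.x = TRUE)

# Replace the NAs introduced by the merge with 0, column by column
res$V1[is.na(res$V1)] <- 0
res$V3[is.na(res$V3)] <- 0

print(res)
```

Replacing column by column keeps the date-time column untouched. If a row index of 0 in V1 would be confused with the genuine first row, only V3 needs to be filled.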
Data
dput(dd)
structure(list(V1 = 0:4, V2 = c("01-01-2015 00:00", "01-01-2015 01:00",
"01-01-2015 02:00", "01-01-2015 03:00", "01-01-2015 06:00"),
V3 = c(72L, 74L, 75L, 77L, 72L)), row.names = c(NA, -5L), class = "data.frame")