I'm working in R. My current table looks like this:
C1 C2 C3
1 2011-02-01 04:30:00 4
2 2011-02-01 04:45:00 3
3 2011-02-01 05:00:00 5
4 2011-02-01 05:15:00 6
I would like it to look like this:
C1 C2 C3 C4
1 2011-02-01 04:30:00 4 2011-02-01 04:30:00
2 2011-02-01 04:30:00 4 2011-02-01 04:35:00
3 2011-02-01 04:30:00 4 2011-02-01 04:40:00
4 2011-02-01 04:45:00 3 2011-02-01 04:45:00
5 2011-02-01 04:45:00 3 2011-02-01 04:50:00
6 2011-02-01 04:45:00 3 2011-02-01 04:55:00
7 2011-02-01 05:00:00 5 2011-02-01 05:00:00
8 2011-02-01 05:00:00 5 2011-02-01 05:05:00
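And so on. Basically, I just want to create another column that goes up in five-minute steps while staying aligned with the intervals in C2. I was thinking of something like rep(), but that would assume the intervals in C2 are always consistent, and they may not be. I'm really looking for something that generates the five-minute steps based on the intervals in C2.
Any help or feedback on this would be greatly appreciated. Thanks.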
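Answer 0: (score: 1)
We can use map2 to create a list column by taking the seq of 'C2' (converted to Datetime), with the length given by the corresponding element of 'C3' (by 5-minute intervals), and then unnest the list column.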
library(tidyverse)
# Build one 5-minute sequence per row (start = C2, length = C3), then expand it into rows
df1 %>%
  mutate(C4 = map2(lubridate::ymd_hms(C2), C3, ~ seq(.x, length.out = .y, by = '5 min'))) %>%
  unnest(C4)
# C1 C2 C3 C4
#1 1 2011-02-01 04:30:00 4 2011-02-01 04:30:00
#2 1 2011-02-01 04:30:00 4 2011-02-01 04:35:00
#3 1 2011-02-01 04:30:00 4 2011-02-01 04:40:00
#4 1 2011-02-01 04:30:00 4 2011-02-01 04:45:00
#5 2 2011-02-01 04:45:00 3 2011-02-01 04:45:00
#6 2 2011-02-01 04:45:00 3 2011-02-01 04:50:00
#7 2 2011-02-01 04:45:00 3 2011-02-01 04:55:00
#8 3 2011-02-01 05:00:00 5 2011-02-01 05:00:00
#9 3 2011-02-01 05:00:00 5 2011-02-01 05:05:00
#10 3 2011-02-01 05:00:00 5 2011-02-01 05:10:00
#11 3 2011-02-01 05:00:00 5 2011-02-01 05:15:00
#12 3 2011-02-01 05:00:00 5 2011-02-01 05:20:00
#13 4 2011-02-01 05:15:00 6 2011-02-01 05:15:00
#14 4 2011-02-01 05:15:00 6 2011-02-01 05:20:00
#15 4 2011-02-01 05:15:00 6 2011-02-01 05:25:00
#16 4 2011-02-01 05:15:00 6 2011-02-01 05:30:00
#17 4 2011-02-01 05:15:00 6 2011-02-01 05:35:00
#18 4 2011-02-01 05:15:00 6 2011-02-01 05:40:00
Or use Map from base R to get the list of DateTime sequences with the same logic as above. Expand the original dataset by replicating the row indices (rep) according to the lengths of the list, and create the new column 'C4':
lst1 <- Map(function(x, y) seq(x, length.out = y, by = '5 min'),
            as.POSIXct(df1$C2), df1$C3)
df2 <- df1[rep(seq_len(nrow(df1)), lengths(lst1)),]
df2$C4 <- do.call(c, lst1)
row.names(df2) <- NULL
If the condition is based on the next 'C2' value:
df1 %>%
  mutate(C4 = map2(lubridate::ymd_hms(C2), lubridate::ymd_hms(lead(C2, default = last(C2))),
                   seq, by = '5 min')) %>%
  unnest(C4) %>%
  group_by(C1) %>%
  slice(-1)
# A tibble: 9 x 4
# Groups: C1 [3]
# C1 C2 C3 C4
# <int> <chr> <int> <dttm>
#1 1 2011-02-01 04:30:00 4 2011-02-01 04:35:00
#2 1 2011-02-01 04:30:00 4 2011-02-01 04:40:00
#3 1 2011-02-01 04:30:00 4 2011-02-01 04:45:00
#4 2 2011-02-01 04:45:00 3 2011-02-01 04:50:00
#5 2 2011-02-01 04:45:00 3 2011-02-01 04:55:00
#6 2 2011-02-01 04:45:00 3 2011-02-01 05:00:00
#7 3 2011-02-01 05:00:00 5 2011-02-01 05:05:00
#8 3 2011-02-01 05:00:00 5 2011-02-01 05:10:00
#9 3 2011-02-01 05:00:00 5 2011-02-01 05:15:00
Or with data.table (unnest here still comes from tidyr):
library(data.table)
setDT(df1)[, C2 := as.POSIXct(C2)][, C4 := list(Map(seq,
MoreArgs = list(by = '5 min'), C2, shift(C2, type = 'lead',
fill = last(C2))))][, unnest(.SD)][, .SD[-1], by = C1]
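The "next C2" condition can also be sketched in base R alone. This is only an illustration, not part of the answer above: it keeps each row's own start time and stops five minutes before the next start, which is the layout shown in the question. It assumes C2 is strictly increasing with gaps of at least five minutes; the sample df1 and the names start, end, and lst are made up for the example.

```r
# Sample data mirroring the question (column types are an assumption)
df1 <- data.frame(
  C1 = 1:4,
  C2 = c("2011-02-01 04:30:00", "2011-02-01 04:45:00",
         "2011-02-01 05:00:00", "2011-02-01 05:15:00"),
  C3 = c(4L, 3L, 5L, 6L),
  stringsAsFactors = FALSE
)

start <- as.POSIXct(df1$C2, tz = "UTC")

# Each window ends 5 minutes (300 s) before the next start;
# the last row has no successor, so it only covers itself.
end <- c(start[-1] - 5 * 60, start[length(start)])

# One 5-minute sequence per row, from its start to its window end
lst <- Map(function(from, to) seq(from, to, by = "5 min"), start, end)

# Repeat each original row once per generated timestamp, then attach C4
df2 <- df1[rep(seq_len(nrow(df1)), lengths(lst)), ]
df2$C4 <- do.call(c, lst)
row.names(df2) <- NULL
```

Because the window length comes from the gap to the next C2 value, irregular gaps (say 10 or 20 minutes) would simply yield fewer or more rows for that group.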
Answer 1: (score: 1)
Another tidyverse option, using complete:
library(tidyverse)
df %>%
mutate(C2 = as.POSIXct(C2, format = '%Y-%m-%d %H:%M:%S'), C4 = C2) %>%
complete(C4 = seq(min(C2), max(C2), by = '5 min')) %>%
fill(C1, C2, C3)
which gives:
# A tibble: 10 x 4
   C4                  C1    C2                     C3
   <dttm>              <chr> <dttm>              <int>
 1 2011-02-01 04:30:00 1     2011-02-01 04:30:00     4
 2 2011-02-01 04:35:00 1     2011-02-01 04:30:00     4
 3 2011-02-01 04:40:00 1     2011-02-01 04:30:00     4
 4 2011-02-01 04:45:00 2     2011-02-01 04:45:00     3
 5 2011-02-01 04:50:00 2     2011-02-01 04:45:00     3
 6 2011-02-01 04:55:00 2     2011-02-01 04:45:00     3
 7 2011-02-01 05:00:00 3     2011-02-01 05:00:00     5
 8 2011-02-01 05:05:00 3     2011-02-01 05:00:00     5
 9 2011-02-01 05:10:00 3     2011-02-01 05:00:00     5
10 2011-02-01 05:15:00 4     2011-02-01 05:15:00     6
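Answer 2: (score: 0)
We can create a sequence of 5-minute intervals between the min and max values of C2, then left_join it with df, and fill the missing values with the previous values using na.locf from zoo.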
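The answer's own code did not survive in this copy, so the following is only a minimal sketch of the steps it describes, assuming dplyr and zoo are installed. The sample df, the intermediate names grid and out, and the choice to join on a copy of C2 named C4 are assumptions made for illustration.

```r
library(dplyr)
library(zoo)

# Sample data mirroring the question (column types are an assumption)
df <- data.frame(
  C1 = 1:4,
  C2 = as.POSIXct(c("2011-02-01 04:30:00", "2011-02-01 04:45:00",
                    "2011-02-01 05:00:00", "2011-02-01 05:15:00"), tz = "UTC"),
  C3 = c(4L, 3L, 5L, 6L)
)

# Full 5-minute grid between the first and last C2 values
grid <- data.frame(C4 = seq(min(df$C2), max(df$C2), by = "5 min"))

# Join the original rows onto the grid; grid rows without a match get NA
out <- left_join(grid, mutate(df, C4 = C2), by = "C4")

# Carry the last observed values forward into the gaps
fill_cols <- c("C1", "C2", "C3")
out[fill_cols] <- lapply(out[fill_cols], na.locf, na.rm = FALSE)

# Restore the column order used in the question
out <- out[c("C1", "C2", "C3", "C4")]
```

Since the grid runs only from min(C2) to max(C2), the last original row contributes a single timestamp, as in the complete()-based answer.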
Answer 3: (score: 0)
library(lubridate)
You can use this package. Convert df$C2 to a date-time, either with apply() or directly.
After the conversion, use
df$C4 <- ymd_hms(df$C2) + minutes(5)
or
df$C4 <- ymd_hms(df$C2) + seconds(300)
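As a small illustration of that conversion-then-offset step, assuming lubridate is installed (the two-row df is made up for the example):

```r
library(lubridate)

# Hypothetical two-row sample with C2 stored as character
df <- data.frame(
  C2 = c("2011-02-01 04:30:00", "2011-02-01 04:45:00"),
  stringsAsFactors = FALSE
)

# Parse the strings into POSIXct date-times (UTC by default)
df$C2 <- ymd_hms(df$C2)

# Add a five-minute period to every element
df$C4 <- df$C2 + minutes(5)
```

Note that this shifts each existing row by five minutes; it does not create the intermediate rows the question asks for, so it would need to be combined with one of the row-expansion approaches from the other answers.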