Duplicate rows so that all columns stay the same except one, which increases sequentially

Time: 2019-01-07 06:55:08

Tags: r

Working in R, my current table looks like this:

C1    C2                          C3
1     2011-02-01 04:30:00         4
2     2011-02-01 04:45:00         3
3     2011-02-01 05:00:00         5
4     2011-02-01 05:15:00         6

I would like it to look like this:

C1    C2                          C3       C4
1     2011-02-01 04:30:00         4        2011-02-01 04:30:00
2     2011-02-01 04:30:00         4        2011-02-01 04:35:00
3     2011-02-01 04:30:00         4        2011-02-01 04:40:00
4     2011-02-01 04:45:00         3        2011-02-01 04:45:00
5     2011-02-01 04:45:00         3        2011-02-01 04:50:00 
6     2011-02-01 04:45:00         3        2011-02-01 04:55:00
7     2011-02-01 05:00:00         5        2011-02-01 05:00:00
8     2011-02-01 05:00:00         5        2011-02-01 05:05:00

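And so on. Basically I just want to create another column that goes up in five-minute steps but stays within the intervals in C2. I was thinking of something like the rep() function, but that would assume the intervals in C2 are always consistent, which they may not be. What I'm really after is something that generates the five-minute steps based on the intervals in C2.

Any help or feedback on this problem would be greatly appreciated. Thanks!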

4 answers:

Answer 0 (score: 1)

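We can use map2 to create a list column: for each row, take the seq of the 'C2' Datetime value with the length given by the corresponding element of 'C3' (by 5-minute intervals), and then unnest the list: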

library(tidyverse)
df1 %>% 
  mutate(C4 = map2(lubridate::ymd_hms(C2), C3, ~ seq(.x, length.out = .y, by = '5 min'))) %>% 
  unnest(cols = C4)
#  C1                  C2 C3                  C4
#1   1 2011-02-01 04:30:00  4 2011-02-01 04:30:00
#2   1 2011-02-01 04:30:00  4 2011-02-01 04:35:00
#3   1 2011-02-01 04:30:00  4 2011-02-01 04:40:00
#4   1 2011-02-01 04:30:00  4 2011-02-01 04:45:00
#5   2 2011-02-01 04:45:00  3 2011-02-01 04:45:00
#6   2 2011-02-01 04:45:00  3 2011-02-01 04:50:00
#7   2 2011-02-01 04:45:00  3 2011-02-01 04:55:00
#8   3 2011-02-01 05:00:00  5 2011-02-01 05:00:00
#9   3 2011-02-01 05:00:00  5 2011-02-01 05:05:00
#10  3 2011-02-01 05:00:00  5 2011-02-01 05:10:00
#11  3 2011-02-01 05:00:00  5 2011-02-01 05:15:00
#12  3 2011-02-01 05:00:00  5 2011-02-01 05:20:00
#13  4 2011-02-01 05:15:00  6 2011-02-01 05:15:00
#14  4 2011-02-01 05:15:00  6 2011-02-01 05:20:00
#15  4 2011-02-01 05:15:00  6 2011-02-01 05:25:00
#16  4 2011-02-01 05:15:00  6 2011-02-01 05:30:00
#17  4 2011-02-01 05:15:00  6 2011-02-01 05:35:00
#18  4 2011-02-01 05:15:00  6 2011-02-01 05:40:00
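Each element of the list column comes from a single seq() call on one POSIXct value. A quick check of what the first row (C2 = 04:30, C3 = 4) produces, parsing the time as UTC here (an assumption, so the printed result does not depend on the local time zone):

```r
# One call of the building block: a POSIXct start, a count (C3) and a 5-minute step
# tz = "UTC" is an assumption made only so the output is reproducible
x <- as.POSIXct("2011-02-01 04:30:00", tz = "UTC")
s <- seq(x, length.out = 4, by = "5 min")
format(s, "%H:%M")
# [1] "04:30" "04:35" "04:40" "04:45"
```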

Or, with Map from base R, use the same logic as above to get a list of DateTime sequences. Then expand the original dataset by replicating the row sequence with rep according to the lengths of that list, and create the new column 'C4':

# one 5-minute sequence per row, C3 elements long
lst1 <- Map(function(x, y) seq(x, length.out = y, by = '5 min'),
    as.POSIXct(df1$C2), df1$C3)
# repeat each row index once per element of its sequence
df2 <- df1[rep(seq_len(nrow(df1)), lengths(lst1)),]
df2$C4 <- do.call(c, lst1)
row.names(df2) <- NULL

If the condition is based on the next 'C2' value:

df1 %>% 
  mutate(C4 = map2(lubridate::ymd_hms(C2),
                   lubridate::ymd_hms(lead(C2, default = last(C2))),
                   seq, by = '5 min')) %>% 
  unnest(cols = C4) %>% 
  group_by(C1) %>% 
  slice(-1)
# A tibble: 9 x 4
# Groups:   C1 [3]
#      C1 C2                     C3 C4
#   <int> <chr>               <int> <dttm>
#1      1 2011-02-01 04:30:00     4 2011-02-01 04:35:00
#2      1 2011-02-01 04:30:00     4 2011-02-01 04:40:00
#3      1 2011-02-01 04:30:00     4 2011-02-01 04:45:00
#4      2 2011-02-01 04:45:00     3 2011-02-01 04:50:00
#5      2 2011-02-01 04:45:00     3 2011-02-01 04:55:00
#6      2 2011-02-01 04:45:00     3 2011-02-01 05:00:00
#7      3 2011-02-01 05:00:00     5 2011-02-01 05:05:00
#8      3 2011-02-01 05:00:00     5 2011-02-01 05:10:00
#9      3 2011-02-01 05:00:00     5 2011-02-01 05:15:00
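The next-'C2' version above starts each block one step after C2 and ends on the next C2. If the blocks should instead follow the layout in the question, starting at each C2 and stopping just before the next one, a base R sketch of the same idea might look like this. The rebuilt data frame, the UTC time zone, and keeping a lone timestamp for the last row (which has no successor) are all assumptions:

```r
# The question's table, with C2 parsed as POSIXct (UTC is an assumption)
df1 <- data.frame(
  C1 = 1:4,
  C2 = as.POSIXct(c("2011-02-01 04:30:00", "2011-02-01 04:45:00",
                    "2011-02-01 05:00:00", "2011-02-01 05:15:00"), tz = "UTC"),
  C3 = c(4L, 3L, 5L, 6L)
)

n <- nrow(df1)
# Next C2 for each row; the last row has no successor, so it reuses its own C2
nxt <- df1$C2[pmin(seq_len(n) + 1L, n)]

# Steps from C2 up to, but excluding, the next C2 (a lone timestamp is kept as is)
steps <- Map(function(from, to) {
  s <- seq(from, to, by = "5 min")
  if (from < to) s[s < to] else s
}, df1$C2, nxt)

# Repeat each row once per step, then attach the flattened steps as C4
out <- df1[rep(seq_len(n), lengths(steps)), ]
out$C4 <- .POSIXct(unlist(steps), tz = "UTC")
row.names(out) <- NULL
```

Filtering with s < to rather than dropping the last element keeps the sketch correct even when the next C2 does not fall exactly on a 5-minute step.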

Or a similar option to the next-'C2' approach, using data.table:

library(data.table)
library(tidyr)  # unnest() comes from tidyr
setDT(df1)[, C2 := as.POSIXct(C2)][, C4 := list(Map(seq, 
   MoreArgs = list(by = '5 min'), C2, shift(C2, type = 'lead',
      fill = last(C2))))][, unnest(.SD, cols = C4)][, .SD[-1], by = C1]
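Data (reconstructed from the question's table; C2 is assumed to be stored as character strings, which matches the <chr> column type in the tibble output above). Run it before the code above:

```r
# Reconstruction of df1 from the question; C2 kept as character (an assumption)
df1 <- data.frame(
  C1 = 1:4,
  C2 = c("2011-02-01 04:30:00", "2011-02-01 04:45:00",
         "2011-02-01 05:00:00", "2011-02-01 05:15:00"),
  C3 = c(4L, 3L, 5L, 6L),
  stringsAsFactors = FALSE  # needed before R 4.0, where strings defaulted to factors
)
```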

答案 1 :(得分:1)

Another option with complete from the tidyverse:

library(tidyverse)

df %>% 
 mutate(C2 = as.POSIXct(C2, format = '%Y-%m-%d %H:%M:%S'), C4 = C2) %>% 
 complete(C4 = seq(min(C2), max(C2), by = '5 min')) %>% 
 fill(C1, C2, C3)

which gives:

# A tibble: 10 x 4
   C4                  C1    C2                     C3
   <dttm>              <chr> <dttm>              <int>
 1 2011-02-01 04:30:00 1     2011-02-01 04:30:00     4
 2 2011-02-01 04:35:00 1     2011-02-01 04:30:00     4
 3 2011-02-01 04:40:00 1     2011-02-01 04:30:00     4
 4 2011-02-01 04:45:00 2     2011-02-01 04:45:00     3
 5 2011-02-01 04:50:00 2     2011-02-01 04:45:00     3
 6 2011-02-01 04:55:00 2     2011-02-01 04:45:00     3
 7 2011-02-01 05:00:00 3     2011-02-01 05:00:00     5
 8 2011-02-01 05:05:00 3     2011-02-01 05:00:00     5
 9 2011-02-01 05:10:00 3     2011-02-01 05:00:00     5
10 2011-02-01 05:15:00 4     2011-02-01 05:15:00     6

Answer 2 (score: 0)

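We can create a sequence of 5-minute intervals between the min and max values of C2, then do a left_join with df and fill the missing values with the previous values using na.locf from zoo.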
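A minimal sketch of that approach, assuming dplyr (1.0 or later, for across()) and zoo are installed, and rebuilding df from the question with C2 parsed as UTC (both assumptions). Joining on a copy of C2 keeps the original C2 column in the result:

```r
library(dplyr)
library(zoo)

# The question's table, with C2 parsed as POSIXct (UTC is an assumption)
df <- data.frame(
  C1 = 1:4,
  C2 = as.POSIXct(c("2011-02-01 04:30:00", "2011-02-01 04:45:00",
                    "2011-02-01 05:00:00", "2011-02-01 05:15:00"), tz = "UTC"),
  C3 = c(4L, 3L, 5L, 6L)
)

# Regular 5-minute grid between the smallest and largest C2
grid <- data.frame(C4 = seq(min(df$C2), max(df$C2), by = "5 min"))

# Join on a copy of C2, then carry the last observed values forward
out <- grid %>%
  left_join(mutate(df, C4 = C2), by = "C4") %>%
  mutate(across(c(C1, C2, C3), ~ na.locf(.x, na.rm = FALSE)))
```

Like the complete() answer, this stops at max(C2), so the last original row contributes a single timestamp.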

Answer 3 (score: 0)

library(lubridate)

You can use this package. Convert df$C2 to a date-time, either with apply() or directly. After the conversion, use

df$C4 <- ymd_hms(df$C2) + minutes(5)
# or, equivalently
df$C4 <- ymd_hms(df$C2) + seconds(300)
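Adding a single period like this shifts each timestamp but does not create the extra rows the question asks for. A sketch combining the same lubridate arithmetic with row repetition, assuming (as in answer 0) that C3 holds the number of 5-minute steps for each row; the rebuilt df is also an assumption:

```r
library(lubridate)

# The question's table, with C2 as character strings (an assumption)
df <- data.frame(
  C1 = 1:4,
  C2 = c("2011-02-01 04:30:00", "2011-02-01 04:45:00",
         "2011-02-01 05:00:00", "2011-02-01 05:15:00"),
  C3 = c(4L, 3L, 5L, 6L),
  stringsAsFactors = FALSE
)

# Repeat each row C3 times
out <- df[rep(seq_len(nrow(df)), df$C3), ]
# Step number within each original row: 0, 1, 2, ... (sequence() restarts per row)
step <- sequence(df$C3) - 1L
# ymd_hms() parses as UTC by default; add 5 minutes per step
out$C4 <- ymd_hms(out$C2) + minutes(5 * step)
row.names(out) <- NULL
```

This reproduces the 18 rows of answer 0's map2 output without a list column.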