Duplicate rows, incrementing the time by 1 minute in each copy

Time: 2019-07-25 11:51:29

Tags: r dplyr copy match mutate

I have a large data frame (16819 observations) containing the date and time, duration, and ratings of TV programs. It looks like this:

# Network    Date_Time          Dur_sec      Rating1   Rating2
1 Channel1   2013-01-01 18:02   300          0.0873    0.0184
2 Channel1   2013-01-01 18:10   2700         0.0621    0.0489
3 Channel1   2013-01-01 19:00   1500         0.0391    0.0558
5 Channel1   2013-01-01 19:29   1500         0.0128    0.0891
6 Channel1   2013-01-01 20:00   1260         0.0811    0.0182
7 Channel1   2013-01-01 20:30   4500         0.0481    0.0974

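Now I want to duplicate each row, incrementing the time by 1 minute for as long as the program runs. Program 1 runs for 300 seconds (5 minutes) and program 2 runs for 2700 seconds (45 minutes). The gap between 18:07 and 18:10 is a commercial break and should be ignored. The result should look like this: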

# Network    Date_Time          Dur_sec      Rating1   Rating2
1 Channel1   2013-01-01 18:02   300          0.0873    0.0184
2 Channel1   2013-01-01 18:03   300          0.0873    0.0184
3 Channel1   2013-01-01 18:04   300          0.0873    0.0184
5 Channel1   2013-01-01 18:05   300          0.0873    0.0184
6 Channel1   2013-01-01 18:06   300          0.0873    0.0184
7 Channel1   2013-01-01 18:07   300          0.0873    0.0184
8 Channel1   2013-01-01 18:10   2700         0.0621    0.0489
9 Channel1   2013-01-01 18:11   2700         0.0621    0.0489
10 Channel1  2013-01-01 18:12   2700         0.0621    0.0489
.
.
.
55 Channel1   2013-01-01 18:55   2700         0.0621    0.0489
56 Channel1   2013-01-01 19:00   1500         0.0391    0.0558

And so on...

How can I do this? The end goal is to match this data against another dataset that also contains Date and Time variables.

3 answers:

Answer 0: (score: 3)

You can expand the data frame with uncount(), then use the id variable it creates to increment the time in each copied row:

library(dplyr)
library(tidyr)

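df %>%
  mutate(Date_Time = as.POSIXct(Date_Time)) %>%
  # one row per minute of runtime, start and end minute included; cnt numbers the copies
  uncount(weights = (Dur_sec %/% 60) + 1, .id = "cnt") %>%
  # shift each copy by (cnt - 1) minutes; POSIXct arithmetic works in seconds
  mutate(Date_Time = Date_Time + 60*(cnt-1))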

     Network           Date_Time Dur_sec Rating1 Rating2 cnt
1   Channel1 2013-01-01 18:02:00     300  0.0873  0.0184   1
2   Channel1 2013-01-01 18:03:00     300  0.0873  0.0184   2
3   Channel1 2013-01-01 18:04:00     300  0.0873  0.0184   3
4   Channel1 2013-01-01 18:05:00     300  0.0873  0.0184   4
5   Channel1 2013-01-01 18:06:00     300  0.0873  0.0184   5
6   Channel1 2013-01-01 18:07:00     300  0.0873  0.0184   6
7   Channel1 2013-01-01 18:10:00    2700  0.0621  0.0489   1
8   Channel1 2013-01-01 18:11:00    2700  0.0621  0.0489   2
...
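
Since the stated goal is to match the expanded data against a second dataset, here is a minimal, self-contained sketch of that step. It rebuilds only the first two rows of the question's data by hand. The second data frame `other` and its `Viewers` column are made up for illustration, and the UTC time zone is an assumption chosen so that both tables parse timestamps the same way.

```r
library(dplyr)
library(tidyr)

# First two rows of the question's data, rebuilt by hand
df <- data.frame(
  Network   = "Channel1",
  Date_Time = c("2013-01-01 18:02", "2013-01-01 18:10"),
  Dur_sec   = c(300, 2700),
  Rating1   = c(0.0873, 0.0621),
  Rating2   = c(0.0184, 0.0489),
  stringsAsFactors = FALSE
)

# Same expansion as above, with the helper column dropped afterwards
expanded <- df %>%
  mutate(Date_Time = as.POSIXct(Date_Time, format = "%Y-%m-%d %H:%M", tz = "UTC")) %>%
  uncount(weights = (Dur_sec %/% 60) + 1, .id = "cnt") %>%
  mutate(Date_Time = Date_Time + 60 * (cnt - 1)) %>%
  select(-cnt)

# Hypothetical second dataset keyed by minute (not from the question)
other <- data.frame(
  Date_Time = as.POSIXct(c("2013-01-01 18:04", "2013-01-01 18:08", "2013-01-01 18:30"),
                         format = "%Y-%m-%d %H:%M", tz = "UTC"),
  Viewers   = c(120, 95, 210)
)

# Minutes that fall inside the commercial break (18:08 here) find no match
# and come back with NA in the program columns
matched <- left_join(other, expanded, by = "Date_Time")
```

A left join keeps every row of `other`, so unmatched minutes stay visible as NA rows. If you only want minutes that belong to a program, inner_join() would drop them instead.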

Answer 1: (score: 1)

Could you tell me whether this works?

df$Date_Time <- as.POSIXct(df$Date_Time, format = "%Y-%m-%d %H:%M", tz = "CET")

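I first make sure the time column is parsed correctly, then create a minutes variable, compute the new time, and subtract 3 minutes if there is a gap between two time periods.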

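library(dplyr)
library(lubridate)  # minutes(); expects whole-minute values

df <- df %>%
  mutate(Dur_min = Dur_sec/60) %>%
  mutate(new_date_time = Date_Time + minutes(Dur_min)) %>%
  # if_else() keeps the POSIXct class that base ifelse() would drop
  mutate(new_date_time = if_else(Date_Time <= as.POSIXct("2013-01-01 18:07", tz = "CET") & new_date_time >= as.POSIXct("2013-01-01 18:10", tz = "CET"), new_date_time - minutes(3), new_date_time))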

Answer 2: (score: 0)

Here is another approach using complete():

library(dplyr)
library(tidyr)
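df %>%
  mutate(Date_Time = as.POSIXct(Date_Time, format = "%Y-%m-%d %H:%M")) %>%
  # treat every original row as its own group
  group_by(row = row_number()) %>%
  # add one row per minute; length.out = Dur_sec/60 gives 5 rows for a 300-second program
  complete(Date_Time = seq(Date_Time, by = "1 min", length.out = Dur_sec/60)) %>%
  ungroup() %>%
  select(-row) %>%
  # copy the remaining columns down into the newly created rows
  fill(everything())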

#   Date_Time           Network  Dur_sec Rating1 Rating2
#   <dttm>              <fct>      <int>   <dbl>   <dbl>
# 1 2013-01-01 18:02:00 Channel1     300  0.0873  0.0184
# 2 2013-01-01 18:03:00 Channel1     300  0.0873  0.0184
# 3 2013-01-01 18:04:00 Channel1     300  0.0873  0.0184
# 4 2013-01-01 18:05:00 Channel1     300  0.0873  0.0184
# 5 2013-01-01 18:06:00 Channel1     300  0.0873  0.0184
# 6 2013-01-01 18:10:00 Channel1    2700  0.0621  0.0489
# 7 2013-01-01 18:11:00 Channel1    2700  0.0621  0.0489
# 8 2013-01-01 18:12:00 Channel1    2700  0.0621  0.0489
# 9 2013-01-01 18:13:00 Channel1    2700  0.0621  0.0489
#10 2013-01-01 18:14:00 Channel1    2700  0.0621  0.0489
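
Note that this output stops at 18:06 for the first program, whereas the result requested in the question runs through 18:07. A small base R sketch of the difference, using only the first program's start time and duration (the variable names and the UTC time zone are illustrative assumptions):

```r
start   <- as.POSIXct("2013-01-01 18:02", format = "%Y-%m-%d %H:%M", tz = "UTC")
dur_sec <- 300

# length.out = Dur_sec/60 counts only the minutes that start inside the runtime
end_excluded <- seq(start, by = "1 min", length.out = dur_sec / 60)

# adding 1 also includes the minute in which the program ends
end_included <- seq(start, by = "1 min", length.out = dur_sec %/% 60 + 1)

format(end_excluded, "%H:%M")  # "18:02" "18:03" "18:04" "18:05" "18:06"
format(end_included, "%H:%M")  # "18:02" "18:03" "18:04" "18:05" "18:06" "18:07"
```

Which variant is right depends on whether the final minute should count as part of the program. The question's expected output includes it, so `length.out = Dur_sec %/% 60 + 1` would be the matching choice there.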