I have a large data frame (16,819 observations) containing the date and time, duration, and ratings of TV programs. It looks like this:
# Network Date_Time Dur_sec Rating1 Rating2
1 Channel1 2013-01-01 18:02 300 0.0873 0.0184
2 Channel1 2013-01-01 18:10 2700 0.0621 0.0489
3 Channel1 2013-01-01 19:00 1500 0.0391 0.0558
5 Channel1 2013-01-01 19:29 1500 0.0128 0.0891
6 Channel1 2013-01-01 20:00 1260 0.0811 0.0182
7 Channel1 2013-01-01 20:30 4500 0.0481 0.0974
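Now I want to replicate each row, with the time increasing by 1 minute for as long as the program runs. Program 1 runs for 300 seconds (5 minutes) and program 2 runs for 2700 seconds (45 minutes). The gap between 18:07 and 18:10 is a commercial break and should be ignored. The result should look like this: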
# Network Date_Time Dur_sec Rating1 Rating2
1 Channel1 2013-01-01 18:02 300 0.0873 0.0184
2 Channel1 2013-01-01 18:03 300 0.0873 0.0184
3 Channel1 2013-01-01 18:04 300 0.0873 0.0184
5 Channel1 2013-01-01 18:05 300 0.0873 0.0184
6 Channel1 2013-01-01 18:06 300 0.0873 0.0184
7 Channel1 2013-01-01 18:07 300 0.0873 0.0184
8 Channel1 2013-01-01 18:10 2700 0.0621 0.0489
9 Channel1 2013-01-01 18:11 2700 0.0621 0.0489
10 Channel1 2013-01-01 18:12 2700 0.0621 0.0489
.
.
.
55 Channel1 2013-01-01 18:55 2700 0.0621 0.0489
56 Channel1 2013-01-01 19:00 1500 0.0391 0.0558
and so on...
How can I do this? The end goal is to match this data against another data set that also contains Date and Time variables.
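For context, the matching step mentioned above might look roughly like the following once the ratings are at one row per minute. This is only a sketch in base R: the `ratings` and `other` data frames, the `Value` column, and the UTC time zone are all made up for illustration, and the second data set is assumed to store `Date` and `Time` as separate character columns.

```r
# Hypothetical minute-level ratings (what the expanded data frame would look like)
ratings <- data.frame(
  Date_Time = as.POSIXct(c("2013-01-01 18:02", "2013-01-01 18:03", "2013-01-01 18:04"),
                         format = "%Y-%m-%d %H:%M", tz = "UTC"),
  Rating1   = c(0.0873, 0.0873, 0.0873)
)

# Hypothetical second data set with separate Date and Time columns
other <- data.frame(
  Date  = c("2013-01-01", "2013-01-01"),
  Time  = c("18:03", "18:09"),
  Value = c(10, 20)
)

# Build a POSIXct key in the same time zone as the ratings
other$Date_Time <- as.POSIXct(paste(other$Date, other$Time),
                              format = "%Y-%m-%d %H:%M", tz = "UTC")

# Keep every row of `other`; ratings are NA where no program was on air
matched <- merge(other, ratings, by = "Date_Time", all.x = TRUE)
matched
```

The 18:09 row has no rating here, which is what one would expect for a minute that falls inside a commercial break.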
Answer 0 (score: 3)
You can use uncount() to expand the data frame, then use the id variable to increment the time on each row:
library(dplyr)
library(tidyr)
df %>%
mutate(Date_Time = as.POSIXct(Date_Time)) %>%
uncount(weights = (Dur_sec %/% 60) + 1, .id = "cnt") %>%
mutate(Date_Time = Date_Time + 60*(cnt-1))
Network Date_Time Dur_sec Rating1 Rating2 cnt
1 Channel1 2013-01-01 18:02:00 300 0.0873 0.0184 1
2 Channel1 2013-01-01 18:03:00 300 0.0873 0.0184 2
3 Channel1 2013-01-01 18:04:00 300 0.0873 0.0184 3
4 Channel1 2013-01-01 18:05:00 300 0.0873 0.0184 4
5 Channel1 2013-01-01 18:06:00 300 0.0873 0.0184 5
6 Channel1 2013-01-01 18:07:00 300 0.0873 0.0184 6
7 Channel1 2013-01-01 18:10:00 2700 0.0621 0.0489 1
8 Channel1 2013-01-01 18:11:00 2700 0.0621 0.0489 2
...
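If tidyr is not available, the same idea can be sketched in base R with rep() and sequence(). This is an alternative, not the code from the answer above: the sample data holds only the first two rows from the question, and the UTC time zone is an assumption.

```r
# First two rows of the sample data from the question (time zone is an assumption)
df <- data.frame(
  Network   = c("Channel1", "Channel1"),
  Date_Time = as.POSIXct(c("2013-01-01 18:02", "2013-01-01 18:10"),
                         format = "%Y-%m-%d %H:%M", tz = "UTC"),
  Dur_sec   = c(300, 2700),
  Rating1   = c(0.0873, 0.0621),
  Rating2   = c(0.0184, 0.0489)
)

# Number of one-minute rows per program, start and end minute included
n_rows <- df$Dur_sec %/% 60 + 1

# Repeat each row n_rows times, then add 0, 1, 2, ... minutes within each program
expanded <- df[rep(seq_len(nrow(df)), times = n_rows), ]
expanded$Date_Time <- expanded$Date_Time + 60 * (sequence(n_rows) - 1)
rownames(expanded) <- NULL

head(expanded, 8)
```

Because each program is expanded from its own start time, the commercial break between 18:07 and 18:10 produces no rows and needs no special handling.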
Answer 1 (score: 1)
Could you tell me whether this works?
df$Date_Time <- as.POSIXct(df$Date_Time, format = "%Y-%m-%d %H:%M", tz = "CET")
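I first make sure the time column is set up correctly, then create a minutes variable, compute the new time, and subtract 3 minutes if there is a gap between the two time periods.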
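library(dplyr)
library(lubridate)
# mns() is not defined in dplyr or lubridate; lubridate::minutes() is assumed to be what was meant
df <- df %>%
mutate(Dur_min = Dur_sec/60) %>%
mutate(new_date_time = Date_Time + minutes(Dur_min)) %>%
# if_else() keeps the POSIXct class, which base ifelse() would strip
mutate(new_date_time = if_else(Date_Time <= "2013-01-01 18:07" & new_date_time >= "2013-01-01 18:10", new_date_time - minutes(3), new_date_time))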
Answer 2 (score: 0)
Here is an approach using complete():
library(dplyr)
library(tidyr)
df %>%
mutate(Date_Time = as.POSIXct(Date_Time, format = "%Y-%m-%d %H:%M")) %>%
group_by(row = row_number()) %>%
complete(Date_Time = seq(Date_Time, by = "1 min", length.out = Dur_sec/60)) %>%
ungroup() %>%
select(-row) %>%
fill(everything())
# Date_Time Network Dur_sec Rating1 Rating2
# <dttm> <fct> <int> <dbl> <dbl>
# 1 2013-01-01 18:02:00 Channel1 300 0.0873 0.0184
# 2 2013-01-01 18:03:00 Channel1 300 0.0873 0.0184
# 3 2013-01-01 18:04:00 Channel1 300 0.0873 0.0184
# 4 2013-01-01 18:05:00 Channel1 300 0.0873 0.0184
# 5 2013-01-01 18:06:00 Channel1 300 0.0873 0.0184
# 6 2013-01-01 18:10:00 Channel1 2700 0.0621 0.0489
# 7 2013-01-01 18:11:00 Channel1 2700 0.0621 0.0489
# 8 2013-01-01 18:12:00 Channel1 2700 0.0621 0.0489
# 9 2013-01-01 18:13:00 Channel1 2700 0.0621 0.0489
#10 2013-01-01 18:14:00 Channel1 2700 0.0621 0.0489
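Note that the output above stops at 18:06 for the first program, whereas the desired result in the question runs through 18:07. The difference comes from the length.out argument, as the small base R sketch below illustrates for a single 300-second program (the UTC time zone is an assumption).

```r
start   <- as.POSIXct("2013-01-01 18:02", format = "%Y-%m-%d %H:%M", tz = "UTC")
dur_sec <- 300

# length.out = dur_sec / 60 yields 5 timestamps, ending one minute before the program ends
without_end <- seq(start, by = "1 min", length.out = dur_sec / 60)

# Adding 1 yields 6 timestamps, so the end minute is included as in the question
with_end <- seq(start, by = "1 min", length.out = dur_sec / 60 + 1)

format(without_end, "%H:%M")
format(with_end, "%H:%M")
```

If the end minute should be included, using length.out = Dur_sec/60 + 1 inside complete() would presumably give the same row counts as the uncount() answer.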