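The dataset contains communication outage events for a whole year. Between an event's start and end, the line named in that row is down. The data look like this: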
comms.break<- data.frame(line = c("line1","line2","line3","line1"),eventstart = c("1/1/2017 7:24","1/2/2017 8:25","1/1/2017 7:24","1/2/2017 9:25"),eventend = c("1/1/2017 7:25","1/2/2017 8:26","1/1/2017 7:29","1/2/2017 10:25"))
> comms.break
line eventstart eventend
1 line1 1/1/2017 7:24 1/1/2017 7:25
2 line2 1/2/2017 8:25 1/2/2017 8:26
3 line3 1/1/2017 7:24 1/1/2017 7:29
4 line1 1/2/2017 9:25 1/2/2017 10:25
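I need a new data frame. The first column should be a 1-minute time series covering the whole year, and the other columns should be the lines. For each timestamp, a line's value is 1 if that minute is not inside one of the events above, and 0 otherwise. The final output would look like this: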
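The first piece of that output is the time axis itself. A minimal base-R sketch of the full-year 1-minute grid, assuming the timestamps can be treated as UTC (so no daylight-saving gaps or repeated hours appear) and starting every line at 1 ("up"):

```r
# Whole-year 1-minute time axis for 2017.
# Assumption: UTC, so the sequence has no DST gaps or duplicated hours.
year_start <- as.POSIXct("2017-01-01 00:00", tz = "UTC")
year_end   <- as.POSIXct("2017-12-31 23:59", tz = "UTC")

minutes <- seq(year_start, year_end, by = "min")

# Every line starts as 1 (up); outage minutes would then be set to 0.
grid <- data.frame(Time  = minutes,
                   line1 = 1L, line2 = 1L, line3 = 1L)

length(minutes)  # 365 * 24 * 60 = 525600
head(grid, 3)
```

At 525,600 rows per year, the grid is large but still manageable as an in-memory data frame.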
Time line1 line2 line3
1/1/2017 0:00 1 1 1
1/1/2017 0:01 1 1 1
...............
1/1/2017 7:24 0 1 0
1/1/2017 7:25 0 1 0
1/1/2017 7:26 1 1 0
...............
I don't have much of an idea how to get the result above. Thanks in advance for your help!
With some help, the following code does the job:
library(dplyr)
library(tidyr)
library(lubridate)
comms.break<- data.frame(line = c("line1","line2","line3","line1"),eventstart = c("1/1/2017 7:24","1/2/2017 8:25","1/1/2017 7:24","1/2/2017 9:25"),eventend = c("1/1/2017 7:25","1/2/2017 8:26","1/1/2017 7:29","1/2/2017 10:25"))
# put the data into date objects
events <- comms.break %>%
mutate_at(vars(eventstart, eventend),
~ as.POSIXct(strptime(., format = "%m/%d/%Y %H:%M"))) %>%
# now expand it
rowwise %>%
mutate(Time = seq(eventstart, eventend, by = "min") %>% list) %>%
unnest(Time) %>%
select(line, Time)
# make a whole year time series
year.start <- as.POSIXct(strptime("1/1/2017 0:00", "%m/%d/%Y %H:%M"))
year.end <- as.POSIXct(strptime("12/31/2017 23:59", "%m/%d/%Y %H:%M"))
# make the minute vector and join in the events
time_series <- seq(year.start, year.end, by = "min") %>%
tibble(Time = .) %>% # tibble() replaces the deprecated data_frame()
left_join(events, by = "Time") %>%
mutate(counter = 1) %>%
spread(line, counter, fill = 0) %>%
select(-`<NA>`)
The result is:
Time line1 line2 line3
* <dttm> <dbl> <dbl> <dbl>
1 2017-01-01 00:00:00 0 0 0
2 2017-01-01 00:01:00 0 0 0
3 2017-01-01 00:02:00 0 0 0
4 2017-01-01 00:03:00 0 0 0
.............
445 2017-01-01 07:24:00 1 0 1
446 2017-01-01 07:25:00 1 0 1
447 2017-01-01 07:26:00 0 0 1
448 2017-01-01 07:27:00 0 0 1
449 2017-01-01 07:28:00 0 0 1
450 2017-01-01 07:29:00 0 0 1
451 2017-01-01 07:30:00 0 0 0
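The result above marks outage minutes with 1 and all other minutes with 0, which is the reverse of the encoding asked for in the question (1 = line up, 0 = in an outage). Assuming the indicator columns contain only 0 and 1, subtracting them from 1 flips them. A base-R sketch on a small hand-made frame whose values are copied from rows 445 to 447 above:

```r
# Small stand-in for the joined result: 1 = minute falls inside an outage.
result <- data.frame(
  Time  = as.POSIXct(c("2017-01-01 07:24", "2017-01-01 07:25",
                       "2017-01-01 07:26"), tz = "UTC"),
  line1 = c(1, 1, 0),
  line2 = c(0, 0, 0),
  line3 = c(1, 1, 1)
)

# Flip every indicator column (all columns except Time) so that
# 1 = line available and 0 = line in an outage, as the question asks.
line_cols <- setdiff(names(result), "Time")
result[line_cols] <- 1 - result[line_cols]

result
```

Inside the dplyr pipeline, the same flip could be appended as mutate_at(vars(-Time), ~ 1 - .).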
Answer 0 (score: 0)
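Below is what I think you are looking for. This is a tidyverse approach that uses the %within% operator and the interval object type from lubridate to check whether each minute falls inside an event, and a combination of gather, complete and spread to build a data frame covering the required range of minutes. Note that here we only have the minutes from 2017-01-01 07:24:00 to 2017-01-02 10:25:00, because that is the full range of the example datetimes. A full year of minutes would be far larger, and unless there are correspondingly many event periods, it probably would not make sense for whatever your goal is with this approach.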
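How many minutes that range contains depends on how the example timestamps are parsed. The question's own code reads them as month/day/year (%m/%d/%Y), so 1/2/2017 is 2 January; read as day/month/year, the same string is 1 February. A base-R check of the inclusive minute count between the earliest start and the latest end under each reading (UTC assumed, to sidestep DST):

```r
# Earliest start and latest end from the example data.
first_start <- "1/1/2017 7:24"
last_end    <- "1/2/2017 10:25"

# Number of 1-minute timestamps from a to b, both ends included.
n_minutes <- function(a, b, fmt) {
  from <- as.POSIXct(strptime(a, fmt, tz = "UTC"))
  to   <- as.POSIXct(strptime(b, fmt, tz = "UTC"))
  length(seq(from, to, by = "min"))
}

n_mdy <- n_minutes(first_start, last_end, "%m/%d/%Y %H:%M")  # ends 2 January
n_dmy <- n_minutes(first_start, last_end, "%d/%m/%Y %H:%M")  # ends 1 February
c(mdy = n_mdy, dmy = n_dmy)  # mdy = 1622, dmy = 44822
```

In lubridate terms, the first reading corresponds to mdy_hm() and the second to dmy_hm().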
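This code also does not currently scale easily to a large number of events. I think clever use of mutate_at with a suitable helper function should be able to create the line columns directly and avoid the separate lineN_intv objects, but I couldn't get that to work, so I used bind_cols and map_lgl instead to generate each whole column and then attach it to the tibble. Improvements appreciated!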
library(tidyverse)
library(lubridate)
library(magrittr)
comms.break<- tibble(line = c("line1","line2","line3","line1"),eventstart = c("1/1/2017 7:24","1/2/2017 8:25","1/1/2017 7:24","1/2/2017 9:25"),eventend = c("1/1/2017 7:25","1/2/2017 8:26","1/1/2017 7:29","1/2/2017 10:25"))
# The timestamps are month/day/year, so parse them with mdy_hm
# (dmy_hm would read "1/2/2017" as 1 February)
line1_intv <- comms.break %>%
filter(line == "line1") %>%
mutate(interval = interval(mdy_hm(eventstart), mdy_hm(eventend))) %>%
extract2("interval")
line2_intv <- comms.break %>%
filter(line == "line2") %>%
mutate(interval = interval(mdy_hm(eventstart), mdy_hm(eventend))) %>%
extract2("interval")
line3_intv <- comms.break %>%
filter(line == "line3") %>%
mutate(interval = interval(mdy_hm(eventstart), mdy_hm(eventend))) %>%
extract2("interval")
# one row per minute from the earliest start to the latest end
minutes <- comms.break %>%
gather(key = "event", value = "datetime", eventstart, eventend) %>%
mutate(datetime = mdy_hm(datetime)) %>%
arrange(datetime) %>%
complete(datetime = full_seq(datetime, 60)) %>%
spread(key = "line", value = "event") %>%
select(datetime)
# each line's column is tested against that line's own intervals
output <- minutes %>%
bind_cols(line1 = map_lgl(minutes$datetime, ~ any(. %within% line1_intv))) %>%
bind_cols(line2 = map_lgl(minutes$datetime, ~ any(. %within% line2_intv))) %>%
bind_cols(line3 = map_lgl(minutes$datetime, ~ any(. %within% line3_intv))) %>%
mutate_at(vars(line1:line3), as.integer)
print(output)
# A tibble: 1,622 x 4
datetime line1 line2 line3
<dttm> <int> <int> <int>
1 2017-01-01 07:24:00 1 0 1
2 2017-01-01 07:25:00 1 0 1
3 2017-01-01 07:26:00 0 0 1
4 2017-01-01 07:27:00 0 0 1
5 2017-01-01 07:28:00 0 0 1
6 2017-01-01 07:29:00 0 0 1
7 2017-01-01 07:30:00 0 0 0
8 2017-01-01 07:31:00 0 0 0
9 2017-01-01 07:32:00 0 0 0
10 2017-01-01 07:33:00 0 0 0
# ... with 1,612 more rows
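One way to avoid a separate interval object per line is sketched here in base R rather than with mutate_at, so it is an alternative, not the mutate_at solution the answer was looking for. It keeps the events in a single table and, for each distinct line, tests every minute against all of that line's [start, end] windows. The helper in_outage is hypothetical, month/day/year timestamps in UTC are assumed, and the columns use the question's encoding (1 = up, 0 = outage). With the example data, the rows for 07:24 to 07:26 match the question's desired output.

```r
comms.break <- data.frame(
  line       = c("line1", "line2", "line3", "line1"),
  eventstart = c("1/1/2017 7:24", "1/2/2017 8:25", "1/1/2017 7:24", "1/2/2017 9:25"),
  eventend   = c("1/1/2017 7:25", "1/2/2017 8:26", "1/1/2017 7:29", "1/2/2017 10:25"),
  stringsAsFactors = FALSE
)
fmt    <- "%m/%d/%Y %H:%M"   # assumption: month/day/year, as in the question
starts <- as.POSIXct(strptime(comms.break$eventstart, fmt, tz = "UTC"))
ends   <- as.POSIXct(strptime(comms.break$eventend,   fmt, tz = "UTC"))

# Minute grid over the event range; swap in a full-year seq() if needed.
minutes <- seq(min(starts), max(ends), by = "min")

# Hypothetical helper: TRUE where a minute lies inside any [start, end]
# window, both ends inclusive (like %within% on a lubridate interval).
in_outage <- function(t, s, e) {
  hit <- logical(length(t))
  for (i in seq_along(s)) hit <- hit | (t >= s[i] & t <= e[i])
  hit
}

# One column per line, generated from the data rather than hard-coded;
# 1 = line up, 0 = line in an outage.
line_ids <- sort(unique(comms.break$line))
status <- sapply(line_ids, function(l) {
  rows <- comms.break$line == l
  as.integer(!in_outage(minutes, starts[rows], ends[rows]))
})
output <- data.frame(Time = minutes, status)
head(output, 7)
```

Adding a line only requires new rows in comms.break. The loop inside in_outage runs over a line's events, not over minutes, so each comparison stays vectorised across the whole grid.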
Answer 1 (score: 0)
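This is a bit tricky, but this approach works. Note that I used several tidyverse libraries: lubridate, which is the standard for dates, and dplyr and tidyr, which are great for data manipulation. If you run into package trouble, use library(tidyverse) (it attaches lubridate only from tidyverse 2.0.0 onward; with older versions, load lubridate separately).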
library(dplyr)
library(tidyr)
library(lubridate)
Sys.setenv(TZ='GMT') # set your timezone
# put the data into date objects
events <- comms.break %>%
mutate_at(vars(eventstart, eventend),
~ as.POSIXct(strptime(., format = "%m/%d/%Y %H:%M"))) %>%
# now expand it
rowwise %>%
mutate(Time = seq(eventstart, eventend, by = "min") %>% list) %>%
unnest(Time) %>%
select(line, Time)
# get the time range to make the minute table from
time_range <- events %>%
pull(Time) %>%
range
# make the minute vector and join in the events
time_series <- seq(time_range[1], time_range[2], by = "min") %>%
tibble(Time = .) %>% # tibble() replaces the deprecated data_frame()
left_join(events, by = "Time") %>%
mutate(counter = 1) %>%
spread(line, counter, fill = 0) %>%
select(-`<NA>`)
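The core of this answer, like the question's own solution, has two steps. First, expand every event into one row per minute (rowwise + seq + unnest). Second, match those rows against a minute grid. A base-R sketch of the same two steps follows, assuming month/day/year timestamps and using UTC in place of Sys.setenv(TZ = 'GMT'). As in this answer, 1 marks a minute inside an outage.

```r
comms.break <- data.frame(
  line       = c("line1", "line2", "line3", "line1"),
  eventstart = c("1/1/2017 7:24", "1/2/2017 8:25", "1/1/2017 7:24", "1/2/2017 9:25"),
  eventend   = c("1/1/2017 7:25", "1/2/2017 8:26", "1/1/2017 7:29", "1/2/2017 10:25"),
  stringsAsFactors = FALSE
)
fmt    <- "%m/%d/%Y %H:%M"
starts <- as.POSIXct(strptime(comms.break$eventstart, fmt, tz = "UTC"))
ends   <- as.POSIXct(strptime(comms.break$eventend,   fmt, tz = "UTC"))

# Step 1: one row per (line, minute) for every event, start and end included.
events <- do.call(rbind, lapply(seq_len(nrow(comms.break)), function(i) {
  data.frame(line = comms.break$line[i],
             Time = seq(starts[i], ends[i], by = "min"),
             stringsAsFactors = FALSE)
}))

# Step 2: a minute grid over the event range, with one 0/1 column per line.
minutes <- seq(min(events$Time), max(events$Time), by = "min")
out <- data.frame(Time = minutes)
for (l in sort(unique(events$line))) {
  # Compare as numeric seconds so the match is on exact instants.
  hit <- as.numeric(minutes) %in% as.numeric(events$Time[events$line == l])
  out[[l]] <- as.integer(hit)
}
head(out)
```

Building the grid from the range of the expanded events, as time_range does above, keeps the output small. A full-year seq() could be used instead if every minute of 2017 is needed.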