Time series matching in R

Time: 2018-02-06 23:25:21

Tags: r

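The dataset contains communication outage events for a whole year. Between eventstart and eventend, the line named in that row has an outage. The data looks like this: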

comms.break<- data.frame(line = c("line1","line2","line3","line1"),eventstart = c("1/1/2017 7:24","1/2/2017 8:25","1/1/2017 7:24","1/2/2017 9:25"),eventend = c("1/1/2017 7:25","1/2/2017 8:26","1/1/2017 7:29","1/2/2017 10:25"))

> comms.break
   line    eventstart       eventend
1 line1 1/1/2017 7:24  1/1/2017 7:25
2 line2 1/2/2017 8:25  1/2/2017 8:26
3 line3 1/1/2017 7:24  1/1/2017 7:29
4 line1 1/2/2017 9:25 1/2/2017 10:25

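I need a new data frame. The first column should be a 1-minute time series covering the whole year, and the other columns should be the line names. For each timestamp, a line's value should be 1 if it is not inside one of the events above and 0 otherwise. The final output would look like this: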

Time              line1  line2  line3
1/1/2017 0:00     1      1      1
1/1/2017 0:01     1      1      1
...............
1/1/2017 7:24     0      1      0
1/1/2017 7:25     0      1      0
1/1/2017 7:26     1      1      0
...............

I don't have much of an idea how to get the result above. Thanks in advance for your help!

With the help received, the following code does the job!

library(dplyr)
library(tidyr)
library(lubridate)

comms.break<- data.frame(line = c("line1","line2","line3","line1"),eventstart = c("1/1/2017 7:24","1/2/2017 8:25","1/1/2017 7:24","1/2/2017 9:25"),eventend = c("1/1/2017 7:25","1/2/2017 8:26","1/1/2017 7:29","1/2/2017 10:25"))

# put the data into date objects
events <- comms.break %>% 
  mutate_at(vars(eventstart, eventend), 
            ~ as.POSIXct(strptime(., format = "%m/%d/%Y %H:%M"))) %>%
  # now expand it
  rowwise %>% 
  mutate(Time = seq(eventstart, eventend, by = "min") %>% list) %>% 
  unnest(Time) %>% 
  select(line, Time)

# make a whole year time series
year.start<- as.POSIXct(strptime(as.character("1/1/2017 0:00"), "%m/%d/%Y %H:%M"))
year.end<- as.POSIXct(strptime(as.character("12/31/2017 23:59"), "%m/%d/%Y %H:%M"))


# make the minute vector and join in the events
time_series <- seq(year.start, year.end, by = "min") %>% 
  tibble(Time = .) %>%   # tibble() replaces the deprecated data_frame()
  left_join(events, by = "Time") %>%
  mutate(counter = 1) %>% 
  spread(line, counter, fill = 0) %>% 
  select(-`<NA>`)        # drop the column created by minutes with no event

The result is:

                   Time line1 line2 line3
  *              <dttm> <dbl> <dbl> <dbl>
  1 2017-01-01 00:00:00     0     0     0
  2 2017-01-01 00:01:00     0     0     0
  3 2017-01-01 00:02:00     0     0     0
  4 2017-01-01 00:03:00     0     0     0
.............
445 2017-01-01 07:24:00     1     0     1
446 2017-01-01 07:25:00     1     0     1
447 2017-01-01 07:26:00     0     0     1
448 2017-01-01 07:27:00     0     0     1
449 2017-01-01 07:28:00     0     0     1
450 2017-01-01 07:29:00     0     0     1
451 2017-01-01 07:30:00     0     0     0
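
Note that the result above marks outage minutes with 1, while the question asked for 1 when a line is up and 0 during an outage. A minimal sketch of flipping the flags, assuming a data frame shaped like time_series; the small demo data frame and the names flipped and line_cols are made up for illustration:

```r
# Hypothetical stand-in for a few rows of time_series (here 1 = outage).
demo <- data.frame(
  Time  = as.POSIXct(c("2017-01-01 07:23:00", "2017-01-01 07:24:00",
                       "2017-01-01 07:26:00"), tz = "UTC"),
  line1 = c(0, 1, 0),
  line2 = c(0, 0, 0),
  line3 = c(0, 1, 1)
)

# Flip every column except Time so that 1 = up and 0 = outage.
flipped <- demo
line_cols <- setdiff(names(flipped), "Time")
flipped[line_cols] <- 1 - flipped[line_cols]
print(flipped)
```

The same two lines should work on the full time_series table, since they only touch the non-Time columns.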

2 answers:

Answer 0: (score: 0)

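Here is what I think you are looking for. This is a tidyverse approach that uses the %within% operator and the interval object type from lubridate to check whether each minute falls inside an event, and a combination of gather, complete and spread to build a data frame covering the required range of minutes. Note that here we only have the minutes from 2017-01-01 07:24:00 to 2017-01-02 10:25:00, because that is the full range of the example datetimes. A whole year of minutes would be much larger and, unless there are correspondingly many event periods, may not be a sensible thing to build with this method, whatever your goal is.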
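
As a small illustration of the check described above, here is a sketch of %within% on a single interval. The names outage, stamps and inside are made up for this example, and the timestamps are parsed month-first with mdy_hm to match the %m/%d/%Y format used in the question:

```r
library(lubridate)

# One outage interval, parsed month-first to match the sample data.
outage <- interval(mdy_hm("1/1/2017 7:24"), mdy_hm("1/1/2017 7:29"))

# Three minutes to test: just before, at the start of, and just after the outage.
stamps <- mdy_hm(c("1/1/2017 7:23", "1/1/2017 7:24", "1/1/2017 7:30"))

# %within% is TRUE when an instant falls inside the interval (endpoints included).
inside <- stamps %within% outage
print(inside)
```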

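This code also does not currently scale easily to a large number of events. I think a clever use of mutate_at with a suitable helper function should be able to create the line columns correctly and avoid the separate line_intv objects, but I could not get that to work and used bind_cols and map_lgl instead, generating each whole column and then sticking it onto the tibble. Improvements appreciated!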
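
One possible way to avoid the per-line objects is sketched below. It is not the mutate_at solution the answer hoped for: it splits the events by line and flags minutes with plain comparisons instead of %within%. The helper flag_line and the names events, minutes and flags are hypothetical, and the timestamps are assumed to be month-first as in the question:

```r
library(dplyr)
library(purrr)
library(lubridate)

comms.break <- tibble(
  line = c("line1", "line2", "line3", "line1"),
  eventstart = c("1/1/2017 7:24", "1/2/2017 8:25", "1/1/2017 7:24", "1/2/2017 9:25"),
  eventend = c("1/1/2017 7:25", "1/2/2017 8:26", "1/1/2017 7:29", "1/2/2017 10:25")
)

# Parse the month-first timestamps once.
events <- comms.break %>%
  mutate(start = mdy_hm(eventstart), end = mdy_hm(eventend))

# One row per minute across the full span of the events.
minutes <- tibble(datetime = seq(min(events$start), max(events$end), by = "min"))

# Hypothetical helper: flag the minutes that fall inside any event of one line.
flag_line <- function(ev) {
  hit <- rep(FALSE, nrow(minutes))
  for (i in seq_len(nrow(ev))) {
    hit <- hit | (minutes$datetime >= ev$start[i] & minutes$datetime <= ev$end[i])
  }
  as.integer(hit)
}

# Split the events by line and build one flag column per line.
flags <- events %>%
  split(.$line) %>%
  map(flag_line)

output <- bind_cols(minutes, as_tibble(flags))
print(output)
```

Because the columns come from split(), adding a fourth line to comms.break should add a fourth column without any further code changes.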

library(tidyverse)
library(lubridate)
library(magrittr)

comms.break<- tibble(line = c("line1","line2","line3","line1"),eventstart = c("1/1/2017 7:24","1/2/2017 8:25","1/1/2017 7:24","1/2/2017 9:25"),eventend = c("1/1/2017 7:25","1/2/2017 8:26","1/1/2017 7:29","1/2/2017 10:25"))

# The timestamps are month/day/year, so parse them with mdy_hm().
line1_intv <- comms.break %>%
  filter(line == "line1") %>%
  mutate(interval = interval(mdy_hm(eventstart), mdy_hm(eventend))) %>%
  extract2("interval")

line2_intv <- comms.break %>%
  filter(line == "line2") %>%
  mutate(interval = interval(mdy_hm(eventstart), mdy_hm(eventend))) %>%
  extract2("interval")

line3_intv <- comms.break %>%
  filter(line == "line3") %>%
  mutate(interval = interval(mdy_hm(eventstart), mdy_hm(eventend))) %>%
  extract2("interval")

# Build one row per minute between the earliest and latest event time.
minutes <- comms.break %>%
  gather(key = "event", value = "datetime", eventstart, eventend) %>%
  mutate(datetime = mdy_hm(datetime)) %>%
  arrange(datetime) %>%
  complete(datetime = full_seq(datetime, 60)) %>%
  spread(key = "line", value = "event") %>%
  select(datetime)

# Flag each minute that falls inside any interval of the matching line.
output <- minutes %>%
  bind_cols(line1 = map_lgl(minutes$datetime, ~ any(. %within% line1_intv))) %>%
  bind_cols(line2 = map_lgl(minutes$datetime, ~ any(. %within% line2_intv))) %>%
  bind_cols(line3 = map_lgl(minutes$datetime, ~ any(. %within% line3_intv))) %>%
  mutate_at(vars(line1:line3), as.integer)
print(output)

# A tibble: 1,622 x 4
   datetime            line1 line2 line3
   <dttm>              <int> <int> <int>
 1 2017-01-01 07:24:00     1     0     1
 2 2017-01-01 07:25:00     1     0     1
 3 2017-01-01 07:26:00     0     0     1
 4 2017-01-01 07:27:00     0     0     1
 5 2017-01-01 07:28:00     0     0     1
 6 2017-01-01 07:29:00     0     0     1
 7 2017-01-01 07:30:00     0     0     0
 8 2017-01-01 07:31:00     0     0     0
 9 2017-01-01 07:32:00     0     0     0
10 2017-01-01 07:33:00     0     0     0
# ... with 1,612 more rows

Answer 1: (score: 0)

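This is a bit tricky, but this approach works. Note that I use several tidyverse libraries: lubridate, which is the standard for dates, and dplyr and tidyr, which are great for data manipulation. If you run into problems with the packages, use library(tidyverse).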

library(dplyr)
library(tidyr)
library(lubridate)

Sys.setenv(TZ='GMT') # set your timezone

# put the data into date objects
events <- comms.break %>% 
  mutate_at(vars(eventstart, eventend), 
            ~ as.POSIXct(strptime(., format = "%m/%d/%Y %H:%M"))) %>%
  # now expand it
  rowwise %>% 
  mutate(Time = seq(eventstart, eventend, by = "min") %>% list) %>% 
  unnest(Time) %>% 
  select(line, Time)

# get the time range to make the minute table from
time_range <- events %>% 
  pull(Time) %>% 
  range

# make the minute vector and join in the events
time_series <- seq(time_range[1], time_range[2], by = "min") %>% 
  tibble(Time = .) %>%   # tibble() replaces the deprecated data_frame()
  left_join(events, by = "Time") %>%
  mutate(counter = 1) %>% 
  spread(line, counter, fill = 0) %>% 
  select(-`<NA>`)        # drop the column created by minutes with no event
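
spread has since been superseded in tidyr by pivot_wider. A minimal sketch of the same reshape, assuming tidyr 1.0 or later and dplyr 1.0 or later; the tiny events table and the grid and wide names below are made up for illustration. Here the events are pivoted before the join, so no <NA> column has to be dropped afterwards:

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format events table: one row per line and outage minute.
events <- tibble(
  line = c("line1", "line1", "line3"),
  Time = as.POSIXct(c("2017-01-01 07:24:00", "2017-01-01 07:25:00",
                      "2017-01-01 07:24:00"), tz = "UTC")
)

# Hypothetical minute grid covering a few minutes around the events.
grid <- tibble(
  Time = seq(as.POSIXct("2017-01-01 07:23:00", tz = "UTC"),
             as.POSIXct("2017-01-01 07:26:00", tz = "UTC"), by = "min")
)

# Pivot the events to one column per line, filling missing combinations with 0.
wide <- events %>%
  mutate(counter = 1) %>%
  pivot_wider(names_from = line, values_from = counter,
              values_fill = list(counter = 0))

# Join onto the grid; minutes with no event at all come back as NA, so set them to 0.
time_series <- grid %>%
  left_join(wide, by = "Time") %>%
  mutate(across(-Time, ~ coalesce(.x, 0)))
print(time_series)
```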