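Classifying data rows by specified time periods in R

Time: 2018-06-26 04:04:03

Tags: r

I ran into a problem when classifying data by time. It looks like this:

I have a data list that specifies the start time and end time of each period: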

   Period    Start      END
1      A1 11:08:58 11:11:58
2      A2 12:08:58 12:11:58
3      A3 13:08:58 13:11:58
4      A4 14:08:58 14:11:58
5      A5 15:08:58 15:11:58
6      A6 16:08:58 16:11:58
7      A7 17:08:58 17:11:58
8      A8 18:08:58 18:11:58
9      A9 19:08:58 19:11:58
10    A10 20:08:58 20:11:58
11    A11 21:08:58 21:11:58
12    A12 22:08:58 22:11:58
13    A13 23:08:58 23:11:58
14    A14 00:08:58 00:11:58
15    A15 01:08:58 01:11:58
16    A16 02:08:58 02:11:58
17    A17 03:08:58 03:11:58
18    A18 04:08:58 04:11:58
19    A19 05:08:58 05:11:58
20    A20 06:08:58 06:11:58

I also have another list that specifies the time at which each transaction occurred:

   Transaction Transaction.Time
1        TR015         12:10:58
2        TR008         18:10:58
3        TR009         13:10:58
4        TR019         14:10:58
5        TR001         15:10:58
6        TR011         16:10:58
7        TR018         17:10:58
8        TR005         11:10:58
9        TR013         19:10:58
10       TR012         20:10:58
11       TR014         21:10:58
12       TR004         22:10:58
13       TR020         23:10:58
14       TR010         00:10:58
15       TR016         01:10:58
16       TR007         02:10:58
17       TR017         03:10:58
18       TR006         04:10:58
19       TR003         05:10:58
20       TR002         06:10:58

What I am trying to do is merge these two lists to find out in which period each transaction occurred, for example:

   Transaction Transaction.Time Period    Start      END
1        TR015         12:10:58 A2        12:08:58 12:11:58
2        TR008         18:10:58 A8        18:08:58 18:11:58
3        TR009         13:10:58 A3        13:08:58 13:11:58

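2 Answers:

Answer 0: (score: 0)

I assume the following names for your data frames:
df_period: the data frame for the periods
transaction_occurs: the data frame for the transactions (it must have two columns)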
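
The following is not the answerer's code but a minimal sketch of one way to do the match with the names assumed above: build every transaction/period pair with a cross join, then keep the pairs where the transaction time lies inside the period. It uses only base R, shortens the data to three rows, and assumes the times are zero-padded HH:MM:SS strings, so that comparing them as text gives chronological order.

```r
# Shortened copies of the question's data (three rows each)
df_period <- data.frame(
  Period = c("A1", "A2", "A3"),
  Start  = c("11:08:58", "12:08:58", "13:08:58"),
  END    = c("11:11:58", "12:11:58", "13:11:58"),
  stringsAsFactors = FALSE
)
transaction_occurs <- data.frame(
  Transaction      = c("TR015", "TR009", "TR005"),
  Transaction.Time = c("12:10:58", "13:10:58", "11:10:58"),
  stringsAsFactors = FALSE
)

# by = NULL makes merge() return the Cartesian product of the two data frames
all_pairs <- merge(transaction_occurs, df_period, by = NULL)

# Keep only the pairs where the transaction falls inside the period
inside <- all_pairs$Start <= all_pairs$Transaction.Time &
  all_pairs$Transaction.Time <= all_pairs$END
result <- all_pairs[inside, ]
rownames(result) <- NULL
print(result)
```

The cross join grows with the product of the two row counts, so this is only reasonable for small tables like the ones in the question.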


Answer 1: (score: 0)

I would create a reproducible example as follows:

library(tidyverse)  # provides read_delim() from readr

period = read_delim('No Period Start END
1 A1 11:08:58 11:11:58
2 A2 12:08:58 12:11:58
3 A3 13:08:58 13:11:58
4 A4 14:08:58 14:11:58
5 A5 15:08:58 15:11:58
6 A6 16:08:58 16:11:58
7 A7 17:08:58 17:11:58
8 A8 18:08:58 18:11:58
9 A9 19:08:58 19:11:58
10 A10 20:08:58 20:11:58
11 A11 21:08:58 21:11:58
12 A12 22:08:58 22:11:58
13 A13 23:08:58 23:11:58
14 A14 00:08:58 00:11:58
15 A15 01:08:58 01:11:58
16 A16 02:08:58 02:11:58
17 A17 03:08:58 03:11:58
18 A18 04:08:58 04:11:58
19 A19 05:08:58 05:11:58
20 A20 06:08:58 06:11:58', delim = ' ')

tnx = read_delim('No Transaction Time
1 TR015 12:10:58
2 TR008 18:10:58
3 TR009 13:10:58
4 TR019 14:10:58
5 TR001 15:10:58
6 TR011 16:10:58
7 TR018 17:10:58
8 TR005 11:10:58
9 TR013 19:10:58
10 TR012 20:10:58
11 TR014 21:10:58
12 TR004 22:10:58
13 TR020 23:10:58
14 TR010 00:10:58
15 TR016 01:10:58
16 TR007 02:10:58
17 TR017 03:10:58
18 TR006 04:10:58
19 TR003 05:10:58
20 TR002 06:10:58', delim = ' ')

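To find the period, you must be able to convert the times to some date-time format. The date itself is irrelevant here, so you can let the parse function attach any date and it will still work, as long as the same date is used for both tables. You also have to make sure the period data is complete, meaning that no timestamp falls outside every range.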
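
As a small illustration of both points, the sketch below parses the same time strings onto two different dates and checks which timestamps are covered by at least one period. It uses base R only; `to_time()` and `covered()` are hypothetical helper names, and the third timestamp is made up to show an uncovered case.

```r
# Attach an arbitrary date to "HH:MM:SS" strings and parse them as UTC date-times
to_time <- function(x, date = "1970-01-01") {
  as.POSIXct(paste(date, x), tz = "UTC", format = "%Y-%m-%d %H:%M:%S")
}

starts <- c("11:08:58", "12:08:58")
ends   <- c("11:11:58", "12:11:58")
times  <- c("12:10:58", "11:10:58", "12:30:00")  # the last one is in no period

# TRUE for every timestamp that lies inside at least one [start, end] range
covered <- function(times, starts, ends, date) {
  t <- to_time(times, date)
  s <- to_time(starts, date)
  e <- to_time(ends, date)
  vapply(seq_along(t), function(i) any(s <= t[i] & t[i] <= e), logical(1))
}

covered_1970 <- covered(times, starts, ends, "1970-01-01")
covered_2018 <- covered(times, starts, ends, "2018-06-26")

print(covered_1970)                           # TRUE TRUE FALSE
print(identical(covered_1970, covered_2018))  # TRUE: the chosen date does not matter
```

Any `FALSE` in the result marks a timestamp that the period table does not cover and that would need handling before the lookup.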

library(lubridate)  # as_datetime()
library(tidyverse)  # dplyr verbs and the %>% pipe
period = period %>% mutate(Start = as_datetime(Start),
                           END = as_datetime(END))

tnx = tnx %>% mutate(Time = as_datetime(Time))

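# Return the name of the first period whose [Start, END] range contains
# time_stamp, or NA when no period matches.
locate_period = function(time_stamp, period_data) {
  matches = period_data %>% filter(Start <= time_stamp, END >= time_stamp)
  if (nrow(matches) == 0) return(NA_character_)
  matches$Period[[1]]
}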

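# Start with NA so that unmatched transactions are easy to spot
tnx$Period = NA_character_

# Look up the period of each transaction, one row at a time
for (i in seq_len(nrow(tnx))) {
  tnx$Period[[i]] = locate_period(tnx$Time[[i]], period)
}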

# Attach the Start and END of the matched period to each transaction
tnx = left_join(tnx, period, by = 'Period')
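
The loop above filters the period table once per transaction. For larger tables, a vectorised lookup may be preferable. The following is a sketch in base R using `findInterval()`, under the assumptions that the periods do not overlap and that none of them crosses midnight. `period_demo`, `tnx_demo` and `to_secs()` are hypothetical names, and the `TR999` row is made up to show an unmatched transaction.

```r
# Seconds since 1970-01-01 00:00:00 UTC for "HH:MM:SS" strings
to_secs <- function(x) {
  as.numeric(as.POSIXct(paste("1970-01-01", x), tz = "UTC"))
}

period_demo <- data.frame(
  Period = c("A14", "A1", "A2"),
  Start  = c("00:08:58", "11:08:58", "12:08:58"),
  END    = c("00:11:58", "11:11:58", "12:11:58"),
  stringsAsFactors = FALSE
)
tnx_demo <- data.frame(
  Transaction = c("TR015", "TR005", "TR010", "TR999"),
  Time        = c("12:10:58", "11:10:58", "00:10:58", "12:30:00"),
  stringsAsFactors = FALSE
)

# findInterval() needs the break points in increasing order
period_demo <- period_demo[order(to_secs(period_demo$Start)), ]
start <- to_secs(period_demo$Start)
end   <- to_secs(period_demo$END)
t     <- to_secs(tnx_demo$Time)

# Index of the last period that starts at or before each transaction (0 if none)
pos <- findInterval(t, start)
pos[pos == 0] <- NA

# A candidate only counts if the transaction is also at or before its END
hit <- !is.na(pos) & t <= end[pos]
tnx_demo$Period <- ifelse(hit, period_demo$Period[pos], NA_character_)
print(tnx_demo)
```

Transactions that fall between two periods end up with `NA` in `Period` instead of raising an error.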