Using data.table

Time: 2017-05-30 19:03:34

Tags: r datetime data.table

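I have a data.table, dt_stadium_hours, with the stadium's opening and closing times (in decimal hours) for each day of the week: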

>dt_stadium_hours
   mon_from_time mon_to_time tue_from_time tue_to_time wed_from_time wed_to_time thu_from_time thu_to_time
1:      7.965174    21.39378      7.965174    21.39378      7.965174    21.39378      7.965174    21.39876
   fri_from_time fri_to_time sat_from_time sat_to_time sun_from_time sun_to_time
1:      7.965174    21.39876      7.942786    21.35149      9.766915    16.91617

I have another table, dt_stadium_closed, which lists all the days on which the stadium was closed:

> dt_stadium_closed
    close_date
1:    2017-04-16
2:    2017-04-21
3:    2017-04-22
4:    2017-04-28
5:    2017-05-02 

I have two more tables, dt_player_start and dt_player_stop, which record when a player first started playing and when he last stopped playing. They look like this:

> dt_player_start 
   played_date  start_time     day
1:   2017-04-14     1507       Friday

> dt_player_stop
   played_date  stop_time      day
2:   2017-05-05     1842       Friday
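
For reproducibility, the four tables above can be rebuilt with something like the following. This is only a sketch: the column classes (Date for the date columns, integer for the HHMM times) are assumptions, since the printed output does not show them.

```r
library(data.table)

# Opening and closing times in decimal hours, one column pair per weekday
dt_stadium_hours <- data.table(
  mon_from_time = 7.965174, mon_to_time = 21.39378,
  tue_from_time = 7.965174, tue_to_time = 21.39378,
  wed_from_time = 7.965174, wed_to_time = 21.39378,
  thu_from_time = 7.965174, thu_to_time = 21.39876,
  fri_from_time = 7.965174, fri_to_time = 21.39876,
  sat_from_time = 7.942786, sat_to_time = 21.35149,
  sun_from_time = 9.766915, sun_to_time = 16.91617
)

# Days on which the stadium was closed (assumed to be of class Date)
dt_stadium_closed <- data.table(
  close_date = as.Date(c("2017-04-16", "2017-04-21", "2017-04-22",
                         "2017-04-28", "2017-05-02"))
)

# First start and last stop of the player (times assumed to be HHMM integers)
dt_player_start <- data.table(played_date = as.Date("2017-04-14"),
                              start_time  = 1507L,
                              day         = "Friday")
dt_player_stop  <- data.table(played_date = as.Date("2017-05-05"),
                              stop_time   = 1842L,
                              day         = "Friday")
```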

I need to calculate the total number of hours this particular player played.

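According to dt_player_start, he started playing at 1507 hrs on 2017-04-14. Because that day is a Friday, the stadium closes at 21.39876, so he has to leave then. His last day of play is given in dt_player_stop: he stopped playing at 1842 hrs on 2017-05-05.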

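I need the total number of hours the player played between those two points. Days on which the stadium was closed (see dt_stadium_closed) should not be counted.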

How can I do this with data.table in R?

1 Answer:

Answer 0 (score: 1)

A possible approach:

# create data.table with open and close times by day of the week
dt_open <- dcast(melt(dt_stadium_hours,
                      measure.vars = 1:14)[, c('day','from.to') := tstrsplit(sub('_','-',variable,fixed=TRUE), split = '-')
                                           ][, variable := NULL],
                 day ~ from.to, value.var = 'value')

# create a data.table with all the play dates
DT <- data.table(dates = seq.Date(dt_player_start$played_date, 
                                  dt_player_stop$played_date,
                                  by = 'day'))[!dates %in% dt_stadium_closed$close_date]


# create a day variable with day abbreviations matching those in 'dt_open'
# (weekdays() is locale-dependent, so this assumes an English locale)
DT[, day := substr(tolower(weekdays(dates)),1,3)]

# join with 'dt_open' on 'day'
DT[dt_open, on = 'day', `:=` (from_time = from_time, to_time = to_time)]

# convert the decimal-hour values to date-time values
dcols <- c('from_time','to_time')
DT[, (dcols) := lapply(.SD, function(x) as.POSIXct(as.numeric(dates)*86400 + x*3600, origin = '1970-01-01', tz = 'GMT')), .SDcols = dcols]

# replace the opening time on the first day with the player's actual start time
DT[dates == dt_player_start$played_date, from_time := as.POSIXct(paste(dt_player_start$played_date,dt_player_start$start_time), format = '%Y-%m-%d %H%M', tz = 'GMT')]

# replace the closing time on the last day with the player's actual stop time
DT[dates == dt_player_stop$played_date, to_time := as.POSIXct(paste(dt_player_stop$played_date,dt_player_stop$stop_time), format = '%Y-%m-%d %H%M', tz = 'GMT')]

# calculate hours played by day (units fixed to hours so the result does not depend on auto-selection)
DT[, played := difftime(to_time, from_time, units = 'hours')]

This gives the following data.table:

> DT
         dates day           from_time             to_time          played
 1: 2017-04-14 fri 2017-04-14 15:07:00 2017-04-14 21:23:55  6.282093 hours
 2: 2017-04-15 sat 2017-04-15 07:56:34 2017-04-15 21:21:05 13.408704 hours
 3: 2017-04-17 mon 2017-04-17 07:57:54 2017-04-17 21:23:37 13.428606 hours
 4: 2017-04-18 tue 2017-04-18 07:57:54 2017-04-18 21:23:37 13.428606 hours
 5: 2017-04-19 wed 2017-04-19 07:57:54 2017-04-19 21:23:37 13.428606 hours
 6: 2017-04-20 thu 2017-04-20 07:57:54 2017-04-20 21:23:55 13.433586 hours
 7: 2017-04-23 sun 2017-04-23 09:46:00 2017-04-23 16:54:58  7.149255 hours
 8: 2017-04-24 mon 2017-04-24 07:57:54 2017-04-24 21:23:37 13.428606 hours
 9: 2017-04-25 tue 2017-04-25 07:57:54 2017-04-25 21:23:37 13.428606 hours
10: 2017-04-26 wed 2017-04-26 07:57:54 2017-04-26 21:23:37 13.428606 hours
11: 2017-04-27 thu 2017-04-27 07:57:54 2017-04-27 21:23:55 13.433586 hours
12: 2017-04-29 sat 2017-04-29 07:56:34 2017-04-29 21:21:05 13.408704 hours
13: 2017-04-30 sun 2017-04-30 09:46:00 2017-04-30 16:54:58  7.149255 hours
14: 2017-05-01 mon 2017-05-01 07:57:54 2017-05-01 21:23:37 13.428606 hours
15: 2017-05-03 wed 2017-05-03 07:57:54 2017-05-03 21:23:37 13.428606 hours
16: 2017-05-04 thu 2017-05-04 07:57:54 2017-05-04 21:23:55 13.433586 hours
17: 2017-05-05 fri 2017-05-05 07:57:54 2017-05-05 18:42:00 10.734826 hours

Now you can get the total number of hours played:

> DT[, sum(played)]
Time difference of 205.8624 hours
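
As a cross-check, the same total can be reached with plain decimal-hour arithmetic, without converting anything to POSIXct. The following is a minimal, self-contained sketch that assumes the sample values from the question; the names `hours`, `closed`, `days` and the helper `hhmm_to_hours` are illustrative and not part of the approach above. It uses `as.POSIXlt(...)$wday` instead of `weekdays()` so that the weekday lookup does not depend on the locale.

```r
library(data.table)

# Opening/closing times per weekday in decimal hours (values from dt_stadium_hours)
hours <- data.table(
  day       = c("mon", "tue", "wed", "thu", "fri", "sat", "sun"),
  from_time = c(7.965174, 7.965174, 7.965174, 7.965174, 7.965174, 7.942786, 9.766915),
  to_time   = c(21.39378, 21.39378, 21.39378, 21.39876, 21.39876, 21.35149, 16.91617)
)

# Closed days and the player's first start / last stop (values from the question)
closed     <- as.Date(c("2017-04-16", "2017-04-21", "2017-04-22",
                        "2017-04-28", "2017-05-02"))
start_date <- as.Date("2017-04-14"); start_hhmm <- 1507
stop_date  <- as.Date("2017-05-05"); stop_hhmm  <- 1842

# Hypothetical helper: convert an HHMM number such as 1507 to decimal hours
hhmm_to_hours <- function(x) x %/% 100 + (x %% 100) / 60

# All dates in the playing period, minus the closed days
days <- data.table(dates = seq(start_date, stop_date, by = "day"))
days <- days[!dates %in% closed]

# Locale-independent weekday abbreviation: $wday is 0 (Sunday) to 6 (Saturday)
wd <- c("sun", "mon", "tue", "wed", "thu", "fri", "sat")
days[, day := wd[as.POSIXlt(dates)$wday + 1]]

# Look up opening and closing hours by weekday
days[hours, on = "day", `:=`(from_time = i.from_time, to_time = i.to_time)]

# Clip the first and last day to the player's actual start and stop times
days[dates == start_date, from_time := hhmm_to_hours(start_hhmm)]
days[dates == stop_date,  to_time   := hhmm_to_hours(stop_hhmm)]

total_hours <- days[, sum(to_time - from_time)]
print(total_hours)  # about 205.8624, in line with DT[, sum(played)]
```

Working in decimal hours keeps the calculation free of time zones, at the cost of losing the readable timestamps shown in the DT output.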