I have a data.table, dt_stadium_hours, with the stadium's opening and closing times for each day of the week:
> dt_stadium_hours
   mon_from_time mon_to_time tue_from_time tue_to_time wed_from_time wed_to_time thu_from_time thu_to_time
1:      7.965174    21.39378      7.965174    21.39378      7.965174    21.39378      7.965174    21.39876
   fri_from_time fri_to_time sat_from_time sat_to_time sun_from_time sun_to_time
1:      7.965174    21.39876      7.942786    21.35149      9.766915    16.91617
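The values in this table appear to be decimal hours since midnight, so 21.39876 is roughly 21:23:55. A minimal base R sketch of that conversion follows; the helper name dec_hours_to_hms is hypothetical and is not part of the original question or answer.

```r
# Hypothetical helper: convert decimal hours (e.g. 21.39876) to "HH:MM:SS".
# Fractions of a second are truncated.
dec_hours_to_hms <- function(h) {
  secs <- floor(h * 3600)  # whole seconds since midnight
  sprintf("%02d:%02d:%02d",
          as.integer(secs %/% 3600),
          as.integer((secs %% 3600) %/% 60),
          as.integer(secs %% 60))
}

dec_hours_to_hms(c(7.965174, 21.39876))
# "07:57:54" "21:23:55"
```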
I have another table, dt_stadium_closed, listing all the dates on which the stadium is closed:
> dt_stadium_closed
close_date
1: 2017-04-16
2: 2017-04-21
3: 2017-04-22
4: 2017-04-28
5: 2017-05-02
I have two more tables, dt_player_start and dt_player_stop, which record when a player first started playing and when he last played. They look like this:
> dt_player_start
played_date start_time day
1: 2017-04-14 1507 Friday
> dt_player_stop
played_date stop_time day
2: 2017-05-05 1842 Friday
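The tables are shown only as printed output, so to experiment with them they have to be rebuilt by hand. The sketch below is one way to do that. The column classes are assumptions (dates as Date, HHMM times as integers), and the opening hours are the rounded values as printed, so results may differ from the original data in the last decimals.

```r
library(data.table)

# Opening hours in decimal hours: one row, a from/to column pair per weekday
dt_stadium_hours <- data.table(
  mon_from_time = 7.965174, mon_to_time = 21.39378,
  tue_from_time = 7.965174, tue_to_time = 21.39378,
  wed_from_time = 7.965174, wed_to_time = 21.39378,
  thu_from_time = 7.965174, thu_to_time = 21.39876,
  fri_from_time = 7.965174, fri_to_time = 21.39876,
  sat_from_time = 7.942786, sat_to_time = 21.35149,
  sun_from_time = 9.766915, sun_to_time = 16.91617
)

# Dates on which the stadium is closed (assumed class: Date)
dt_stadium_closed <- data.table(
  close_date = as.Date(c("2017-04-16", "2017-04-21", "2017-04-22",
                         "2017-04-28", "2017-05-02"))
)

# First start and last stop of the player (times assumed to be integer HHMM)
dt_player_start <- data.table(played_date = as.Date("2017-04-14"),
                              start_time = 1507L, day = "Friday")
dt_player_stop  <- data.table(played_date = as.Date("2017-05-05"),
                              stop_time = 1842L, day = "Friday")
```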
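I need to calculate the total number of hours this particular player played.
According to dt_player_start, he started playing at 1507 on 2017-04-14. Because that is a Friday, the stadium closes at 21.39876, so he has to leave then. The last day he played is given in dt_player_stop: he stopped playing at 1842 on 2017-05-05.
Days on which the stadium is closed (see dt_stadium_closed) should not be counted.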
How can I do this using data.table in R?
Answer 0 (score: 1)
A possible approach:
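# create a data.table with the open and close times by day of the week:
# melt to long format, split the column names into 'day' and 'from.to',
# then cast back so each day has a 'from_time' and a 'to_time' column
dt_open <- dcast(melt(dt_stadium_hours,
                      measure.vars = 1:14)[, c('day','from.to') := tstrsplit(sub('_','-',variable,fixed=TRUE), split = '-')
                                           ][, variable := NULL],
                 day ~ from.to, value.var = 'value')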
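# create a data.table with all dates in the playing period, dropping the days
# on which the stadium was closed (the date columns must be of class Date)
DT <- data.table(dates = seq.Date(dt_player_start$played_date,
                                  dt_player_stop$played_date,
                                  by = 'day'))[!dates %in% dt_stadium_closed$close_date]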
# create a day variable with day abbreviations matching those in 'dt_open'
# (weekdays() is locale-dependent; this assumes English day names)
DT[, day := substr(tolower(weekdays(dates)), 1, 3)]
# join with 'dt_open' on 'day'
DT[dt_open, on = 'day', `:=` (from_time = from_time, to_time = to_time)]
# convert the decimal-hour values to date-time values
dcols <- c('from_time','to_time')
DT[, (dcols) := lapply(.SD, function(x) as.POSIXct(as.numeric(dates)*86400 + x*3600,
                                                   origin = '1970-01-01', tz = 'GMT')),
   .SDcols = dcols]
# replace the opening time on the first day with the player's start time
DT[dates == dt_player_start$played_date,
   from_time := as.POSIXct(paste(dt_player_start$played_date, dt_player_start$start_time),
                           format = '%Y-%m-%d %H%M', tz = 'GMT')]
# replace the closing time on the last day with the player's stop time
DT[dates == dt_player_stop$played_date,
   to_time := as.POSIXct(paste(dt_player_stop$played_date, dt_player_stop$stop_time),
                         format = '%Y-%m-%d %H%M', tz = 'GMT')]
# calculate the hours played on each day (units fixed to hours)
DT[, played := difftime(to_time, from_time, units = 'hours')]
This gives the following data.table:
> DT
         dates day           from_time             to_time          played
 1: 2017-04-14 fri 2017-04-14 15:07:00 2017-04-14 21:23:55  6.282093 hours
 2: 2017-04-15 sat 2017-04-15 07:56:34 2017-04-15 21:21:05 13.408704 hours
 3: 2017-04-17 mon 2017-04-17 07:57:54 2017-04-17 21:23:37 13.428606 hours
 4: 2017-04-18 tue 2017-04-18 07:57:54 2017-04-18 21:23:37 13.428606 hours
 5: 2017-04-19 wed 2017-04-19 07:57:54 2017-04-19 21:23:37 13.428606 hours
 6: 2017-04-20 thu 2017-04-20 07:57:54 2017-04-20 21:23:55 13.433586 hours
 7: 2017-04-23 sun 2017-04-23 09:46:00 2017-04-23 16:54:58  7.149255 hours
 8: 2017-04-24 mon 2017-04-24 07:57:54 2017-04-24 21:23:37 13.428606 hours
 9: 2017-04-25 tue 2017-04-25 07:57:54 2017-04-25 21:23:37 13.428606 hours
10: 2017-04-26 wed 2017-04-26 07:57:54 2017-04-26 21:23:37 13.428606 hours
11: 2017-04-27 thu 2017-04-27 07:57:54 2017-04-27 21:23:55 13.433586 hours
12: 2017-04-29 sat 2017-04-29 07:56:34 2017-04-29 21:21:05 13.408704 hours
13: 2017-04-30 sun 2017-04-30 09:46:00 2017-04-30 16:54:58  7.149255 hours
14: 2017-05-01 mon 2017-05-01 07:57:54 2017-05-01 21:23:37 13.428606 hours
15: 2017-05-03 wed 2017-05-03 07:57:54 2017-05-03 21:23:37 13.428606 hours
16: 2017-05-04 thu 2017-05-04 07:57:54 2017-05-04 21:23:55 13.433586 hours
17: 2017-05-05 fri 2017-05-05 07:57:54 2017-05-05 18:42:00 10.734826 hours
Now you can get the total time played:
> DT[, sum(played)]
Time difference of 205.8624 hours
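As a sanity check on that total, the same arithmetic can be redone in base R without any joins. This is only a sketch: the opening hours are copied by hand from the printed dt_stadium_hours (so they are rounded), the helper hhmm_to_hours is hypothetical, and the weekday lookup uses format(x, "%u") instead of weekdays() so that it does not depend on the locale.

```r
# Opening hours (decimal) per weekday, copied from the printed dt_stadium_hours
open_from <- c(mon = 7.965174, tue = 7.965174, wed = 7.965174, thu = 7.965174,
               fri = 7.965174, sat = 7.942786, sun = 9.766915)
open_to   <- c(mon = 21.39378, tue = 21.39378, wed = 21.39378, thu = 21.39876,
               fri = 21.39876, sat = 21.35149, sun = 16.91617)

closed    <- as.Date(c("2017-04-16", "2017-04-21", "2017-04-22",
                       "2017-04-28", "2017-05-02"))
first_day <- as.Date("2017-04-14")
last_day  <- as.Date("2017-05-05")

# All dates in the playing period on which the stadium was open
dates <- seq(first_day, last_day, by = "day")
dates <- dates[!dates %in% closed]

# Locale-independent weekday key: "%u" gives 1 (Monday) to 7 (Sunday)
day  <- names(open_from)[as.integer(format(dates, "%u"))]
from <- unname(open_from[day])
to   <- unname(open_to[day])

# Hypothetical helper: convert an HHMM number such as 1507 to decimal hours
hhmm_to_hours <- function(t) t %/% 100 + (t %% 100) / 60

# Override the first start and the last stop with the player's own times
from[dates == first_day] <- hhmm_to_hours(1507)
to[dates == last_day]    <- hhmm_to_hours(1842)

total_hours <- sum(to - from)
round(total_hours, 4)
# 205.8624, in line with DT[, sum(played)]
```

Because the inputs are rounded, agreement with the data.table result should only be expected to about four decimal places.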