My data is as follows:
data female;
input cost_female :comma. @@;
datalines;
871 684 795 838 1,033 917 1,047 723 1,179 707 817 846 975 868 1,323 791 1,157 932 1,089 770
;
data male;
input cost_male :comma. @@;
datalines;
792 765 511 520 618 447 548 720 899 788 927 657 851 702 918 528 884 702 839 878
;
data repair_costs;
merge female male;
run;
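Now I need to get the total in-time and the total break time for each staff member. For example, for staff number 124:
Total in-time: (last exit time - first entry time) - total break time, i.e. (14:00 - 07:00) - (09:30 - 09:00)
Total break time: the time between each intermediate exit and the next entry, i.e. (09:30 - 09:00)
I am struggling with this. Can anyone help?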
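Answer 0: (score: 1)
If you parse the times as POSIXct, you can subtract them (or use difftime directly to control the units). Subtracting them returns difftime objects, which can be summed: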
library(tidyverse)
df <- structure(list(Staff = c(123L, 123L, 123L, 123L, 123L, 123L, 124L, 124L, 124L, 124L),
Event = c("Entry", "Exit", "Entry", "Exit", "Entry", "Exit", "Entry", "Exit", "Entry", "Exit"),
Time = c("07:00 Hrs", "08:15 Hrs", "08:30 Hrs", "11:15 Hrs", "11:30 Hrs", "15:00 Hrs", "07:00 Hrs", "09:00 Hrs", "09:30 Hrs", "14:00 Hrs")),
.Names = c("Staff", "Event", "Time"), class = "data.frame", row.names = c(NA, 10L
))
df2 <- df %>%
  group_by(Staff) %>%
  mutate(i = cumsum(Event == 'Entry'),                  # add index to allow reshaping
         Time = as.POSIXct(Time, format = '%H:%M')) %>% # parse to datetime
  spread(Event, Time) %>%                               # reshape to wide form
  mutate(work_time = Exit - Entry,
         break_time = lead(Entry) - Exit)
df2
#> # A tibble: 5 x 6
#> # Groups: Staff [2]
#> Staff i Entry Exit work_time break_time
#> <int> <int> <dttm> <dttm> <time> <time>
#> 1 123 1 2017-11-06 07:00:00 2017-11-06 08:15:00 1.25 hours 15 mins
#> 2 123 2 2017-11-06 08:30:00 2017-11-06 11:15:00 2.75 hours 15 mins
#> 3 123 3 2017-11-06 11:30:00 2017-11-06 15:00:00 3.50 hours NA mins
#> 4 124 1 2017-11-06 07:00:00 2017-11-06 09:00:00 2.00 hours 30 mins
#> 5 124 2 2017-11-06 09:30:00 2017-11-06 14:00:00 4.50 hours NA mins
# now just aggregate
df2 %>% summarise_at(vars(work_time, break_time), sum, na.rm = TRUE)
#> # A tibble: 2 x 3
#> Staff work_time break_time
#> <int> <time> <time>
#> 1 123 7.5 hours 30 mins
#> 2 124 6.5 hours 30 mins
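As a cross-check, the formula from the question (last exit minus first entry, minus the breaks) can be evaluated directly with difftime. The following is only a minimal base R sketch for staff 124. The vectors `entry` and `exit` are hypothetical re-creations of that staff member's rows, and `tz = "UTC"` is an assumption made to avoid daylight-saving surprises.

```r
# Hypothetical re-creation of staff 124's events from the question's data
entry <- as.POSIXct(c("07:00", "09:30"), format = "%H:%M", tz = "UTC")
exit  <- as.POSIXct(c("09:00", "14:00"), format = "%H:%M", tz = "UTC")

# difftime() lets the units be chosen explicitly instead of auto-selected
span <- difftime(max(exit), min(entry), units = "mins")  # last exit - first entry

# Breaks: each entry after the first, minus the exit that preceded it
breaks <- sum(difftime(entry[-1], exit[-length(exit)], units = "mins"))

# Total in-time as defined in the question
total_in <- span - breaks

as.numeric(breaks, units = "mins")     # 30
as.numeric(total_in, units = "hours")  # 6.5
```

Both values agree with the row for staff 124 in the aggregated output above: 6.5 hours of work and 30 minutes of breaks.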