Start Stop
00:03.1 00:40.9
00:55.0 01:38.2
01:40.0 02:01.1
Time.in.sec Yes.or.No
0.0 N
0.1 N
... ...
3.0 N
3.1 Y
3.2 Y
... ...
40.8 Y
40.9 N
... ...
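The per-tenth-of-a-second table above could be generated along the following lines. This is only a base R sketch: the vectors `start` and `stop` are assumed to hold the three example intervals already converted to seconds, and the half-open rule (Y from Start up to, but not including, Stop) is inferred from the rows shown above.

```r
# Example intervals in seconds (assumed to be already converted from mm:ss.s)
start <- c(3.1, 55.0, 100.0)
stop  <- c(40.9, 98.2, 121.1)

# Work in integer tenths of a second to avoid floating-point comparisons
start_t <- as.integer(round(start * 10))
stop_t  <- as.integer(round(stop * 10))
tenths  <- 0:max(stop_t)

# An instant is "Y" when Start <= t < Stop holds for at least one interval
in_any <- vapply(tenths,
                 function(t) any(t >= start_t & t < stop_t),
                 logical(1))

flags <- data.frame(Time.in.sec = tenths / 10,
                    Yes.or.No   = ifelse(in_any, "Y", "N"),
                    stringsAsFactors = FALSE)
head(flags)
```

Counting in integer tenths keeps the comparison at 3.1 or 40.9 exact, which a direct comparison of doubles does not guarantee.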
Option 2: Use some logical rule to split the intervals at the minute marks, then total the time within each minute (starting from time = 0:00.0).
Start Stop
00:03.10 00:40.90
00:55.00 00:59.99
01:00.00 01:38.20
01:40.00 01:59.99
02:00.00 02:01.10
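One way to produce a split like the table above is sketched below in base R. The helpers `to_seconds()` and `split_at_minutes()` are hypothetical names introduced only for this illustration. Unlike the table, the sketch lets adjacent pieces share the minute mark (a piece ends at 60 and the next starts at 60) instead of ending at xx:59.99, so no time is lost at the boundary.

```r
# Hypothetical helper: convert "MM:SS.s" strings to seconds (base R only)
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1))
}

# Hypothetical helper: split one interval at every whole-minute mark inside it
split_at_minutes <- function(start, stop) {
  first <- (floor(start / 60) + 1) * 60          # first minute mark after start
  marks <- if (first < stop) seq(first, stop, by = 60) else numeric(0)
  marks <- marks[marks < stop]                   # drop a mark equal to stop
  data.frame(Start = c(start, marks), Stop = c(marks, stop))
}

intervals <- data.frame(
  Start = to_seconds(c("00:03.1", "00:55.0", "01:40.0")),
  Stop  = to_seconds(c("00:40.9", "01:38.2", "02:01.1"))
)

# Split each row and stack the pieces into one data frame
pieces <- do.call(rbind, Map(split_at_minutes, intervals$Start, intervals$Stop))
pieces
```

Because every piece now lies within a single minute, summing `Stop - Start` grouped by `floor(Start / 60)` would give the per-minute totals.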
Other questions on Stack Overflow seem to address making bins from raw timestamps (e.g., "What is an efficient method for partitioning and aggregating intervals from timestamped rows in a data frame?" and "Aggregate data by equally spaced time intervals in R"), but I basically want to do the opposite.
Start, Stop
00:03.1, 00:40.9
00:55.0, 01:38.2
01:40.0, 02:01.1
03:03.1, 04:30.3
04:50.0, 05:01.5
05:08.7, 05:22.0
05:40.1, 05:47.9
Minute time.yes.sec perc.time.yes
1 42.8 71.33
2 58.2 96.98
3 1.1 1.83
# *NOTE: Here, Minute 1 = [0, 60), Minute 2 = [60, 120), etc.; I'm not opposed
# to the reverse definitions though (Minute 1 = (0, 60], etc.).
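The per-minute totals can also be computed directly as the overlap between each interval and each minute bin, without building a 0.1 s grid. The following is a minimal base R sketch under the [0, 60), [60, 120), ... definition from the note above; the vectors `start` and `stop` are assumed to hold the first three example intervals in seconds.

```r
# First three example intervals, in seconds
start <- c(3.1, 55.0, 100.0)
stop  <- c(40.9, 98.2, 121.1)

n_min <- ceiling(max(stop) / 60)
lower <- (seq_len(n_min) - 1) * 60   # Minute k covers [lower[k], upper[k])
upper <- lower + 60

# Overlap of interval i with minute k, floored at zero when they do not meet
overlap <- outer(seq_along(start), seq_len(n_min), function(i, k) {
  pmax(0, pmin(stop[i], upper[k]) - pmax(start[i], lower[k]))
})

time_yes <- colSums(overlap)         # seconds of "yes" time in each minute
result <- data.frame(Minute        = seq_len(n_min),
                     time.yes.sec  = round(time_yes, 1),
                     perc.time.yes = round(time_yes / 60 * 100, 2))
result
```

For these three intervals the `time.yes.sec` column comes out as 42.8, 58.2 and 1.1, matching the table above. Since a single instant has zero length, choosing (0, 60] instead of [0, 60) would not change the totals.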
Answer 0 (score: 4)
df %>% mutate_all(funs(period_to_seconds(ms(.)))) %>%    # convert each time to seconds
    rowwise() %>%    # evaluate the following row-by-row
    # make a sequence from Start to Stop by 0.1, wrapped in a list
    mutate(instant = list(seq(Start, Stop, by = 0.1))) %>%
    unnest() %>%    # expand list column
    # make a factor, cutting instants into 60 second bins
    mutate(minute = cut(instant, breaks = (0:6) * 60, labels = 1:6)) %>%
    group_by(minute) %>%    # evaluate the following grouped by new factor column
    # for each group, count the rows, subtracting 1 for starting instants, and
    # dividing by 10 to convert from tenths of seconds to seconds
    summarise(elapsed = (n() - n_distinct(Start)) / 10,
              pct_elapsed = elapsed / 60 * 100)    # convert to percent
## # A tibble: 6 × 3
## minute elapsed pct_elapsed
## <fctr> <dbl> <dbl>
## 1 1 42.8 71.333333
## 2 2 58.1 96.833333
## 3 3 1.0 1.666667
## 4 4 56.9 94.833333
## 5 5 40.2 67.000000
## 6 6 22.5 37.500000
df %>% mutate_all(funs(period_to_seconds(ms(.)))) %>%    # convert to seconds
    gather(var, val) %>%    # gather to long form
    # construct and rbind data.frame of breaks at minute changes
    bind_rows(expand.grid(var = c('Start', 'Stop'),
                          val = seq(60, by = 60, length.out = floor(max(.$val)/60)))) %>%
    arrange(val, desc(var)) %>%    # sort
    mutate(index = rep(seq(n()/2), each = 2)) %>%    # make indices for spreading
    spread(var, val) %>%    # spread back to wide form
    mutate(elapsed = Stop - Start) %>%    # calculate elapsed time for each row
    # add and group by factor of which minute each falls in
    group_by(minute = cut(Stop, seq(0, by = 60, length.out = ceiling(max(Stop) / 60 + 1)),
                          labels = 1:6)) %>%
    summarise(elapsed = sum(elapsed),    # calculate summaries
              pct_elapsed = elapsed / 60 * 100)
## # A tibble: 6 × 3
## minute elapsed pct_elapsed
## <fctr> <dbl> <dbl>
## 1 1 42.8 71.333333
## 2 2 58.2 97.000000
## 3 3 1.1 1.833333
## 4 4 56.9 94.833333
## 5 5 40.3 67.166667
## 6 6 22.6 37.666667
Answer 1 (score: 3)
Start Stop
00:03.1 00:40.9
00:55.0 01:38.2
01:40.0 02:01.1
agg <- read.table(con<-file("clipboard"), header=T)
agg$Start <- lubridate::ms(agg$Start)
agg$Stop <- lubridate::ms(agg$Stop)
option1 <- data.frame(time = lubridate::seconds(seq(.1, 122, .1)),
flag = as.character("N"), stringsAsFactors = F)
for(i in 1:nrow(agg)){
  # flag every instant that falls strictly inside the i-th interval
  option1$flag[option1$time > agg$Start[i] & option1$time < agg$Stop[i]] <- "Y"
}
table(option1$flag)
   N    Y
 201 1019
option1$minute <- ifelse(option1$time < lubridate::seconds(60), 0, 1)
option1$minute[option1$time > lubridate::seconds(120)] <- 2
table(option1$flag, option1$minute)
      0   1   2
  N 172  19  10
  Y 427 582  10
prop.table(table(option1$flag, option1$minute),2)
             0          1          2
  N 0.28714524 0.03161398 0.50000000
  Y 0.71285476 0.96838602 0.50000000