I have the following data:
DateTime | Var1 | Var2 | var3 | var4 | %Var1 | level
-------------------------------------------------------------------
11/15/2016 6:11 | 0 | 0.94 | 0.65 | 1.14 | 0 | (0,5]
11/15/2016 6:12 | 0.70 | 29.98 | 9.01 | 30.01 | 0.53 | (0,5]
11/15/2016 6:13 | 35.08 | 152.23| 141.71| 103.7 | 26.57 | (5,30]
11/15/2016 6:14 | 69.05 | 137.97| 130.81| 101.54| 52.31 | (30,60]
11/15/2016 6:15 | 69.38 | 138.7 | 131.3 | 101.67| 52.56 | (30,60]
11/15/2016 6:19 | 80.63 | 140 | 134 | 126.45| 61.09 | (60,100]
11/15/2016 6:20 | 82.86 | 141.33| 136.09| 129.7 | 62.77 | (60,100]
11/15/2016 6:44 | 132.33| 206.18| 205.61| 205.64| 100.25| (100,500]
11/15/2016 6:45 | 128.75| 202.51| 197.69| 198.92| 97.53 | (60,100]
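The DateTime and Var1 to Var4 columns are present in the raw data.
The %Var1 column is obtained by expressing Var1 as a percentage of a predefined value. The values in %Var1 are then binned into "levels" (shown in the last column). The levels do not always occur in order; for example, (100,500] may be followed by (5,30], and so on.
I need to calculate the time spent in each level. For example, the total time spent in level (60,100] is the interval from 6:19 to 6:44 plus the interval from 6:45 to the next data point (not shown in the table).
How can I calculate this?
I found this related post, "R Calculate time difference between events"; however, in that post each row already holds the data for a transition time point, whereas in my case I have to look at the following rows to determine whether the system stays in the same level or makes a transition.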
Edit:
I calculated the time difference between consecutive observations and added it to the data frame as a column.
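## name the columns explicitly so that df$DateTime and df$Var1 exist below
df <- data.frame(DateTime = s$dateTime, Var1 = s$Var1, Var2 = s$Var2, Var3 = s$Var3, Var4 = s$Var4)
df$Var5 <- df$Var1 * 100/NumericConstant ## Var1 as a percentage of the predefined value
df$fac <- cut(df$Var5, c(-10, 5, 30, 60, 100, 500)) ## bin the percentages into levels
c_time <- as.POSIXct(df$DateTime, format = "%m/%d/%Y %H:%M", tz = 'UTC')
## time from each row to the next one, in seconds
timedur <- as.numeric(difftime(c_time[-1], c_time[-length(c_time)], units = 'secs'))
timedur <- c(timedur, NA) ## pad with NA (not the string 'NA'), since length(timedur) is 1 short of the DF
df$timedur <- timedur ## add the time differences column to the dataframe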
Now my data looks like this:
DateTime | Var1 | Var2 | var3 | var4 | %Var1 | level | timedur
-------------------------------------------------------------------
11/15/2016 6:11 | 0 | 0.94 | 0.65 | 1.14 | 0 | (0,5] | 60
11/15/2016 6:12 | 0.70 | 29.98 | 9.01 | 30.01 | 0.53 | (0,5] | 60
......... and so on
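I want to find out how long the system stays in state (0,5] before changing to (5,30], then how long it stays in (5,30] before changing to (30,60], and so on.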
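Answer 0 (score: 0)
Here is a solution that uses a cross join and, for each row, selects the first later row with a different level.
Generate the data: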
str <- '
DateTime | Var1 | Var2 | var3 | var4 | %Var1 | level
11/15/2016 6:11 | 0 | 0.94 | 0.65 | 1.14 | 0 | (0,5]
11/15/2016 6:12 | 0.70 | 29.98 | 9.01 | 30.01 | 0.53 | (0,5]
11/15/2016 6:13 | 35.08 | 152.23| 141.71| 103.7 | 26.57 | (5,30]
11/15/2016 6:14 | 69.05 | 137.97| 130.81| 101.54| 52.31 | (30,60]
11/15/2016 6:15 | 69.38 | 138.7 | 131.3 | 101.67| 52.56 | (30,60]
11/15/2016 6:19 | 80.63 | 140 | 134 | 126.45| 61.09 | (60,100]
11/15/2016 6:20 | 82.86 | 141.33| 136.09| 129.7 | 62.77 | (60,100]
11/15/2016 6:44 | 132.33| 206.18| 205.61| 205.64| 100.25| (100,500]
11/15/2016 6:45 | 128.75| 202.51| 197.69| 198.92| 97.53 | (60,100]
'
file <- textConnection(str)
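# strip.white drops the padding around "|" so the level labels carry no stray spaces
df <- read.table(file, sep = "|", header = TRUE, strip.white = TRUE)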
df$DateTime <- as.POSIXct(df$DateTime , format="%m/%d/%Y %H:%M")
Join the data frame to itself to find, for each row, the next DateTime with a different level.
library(dplyr)
nxt <- df %>% mutate(dummy = 1) %>%
inner_join(df %>%
select(level, DateTime) %>%
rename(DateTimeNext = DateTime, levelNext = level) %>%
mutate(dummy=1), by='dummy') %>%
# remove previous rows and the same level
filter(DateTime < DateTimeNext, level != levelNext) %>%
# group data to use in row_number()
group_by(DateTime) %>%
# select first row with different level
filter(row_number(DateTimeNext) == 1) %>%
select(DateTime, DateTimeNext)
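# attach the next transition time to every row
df %>% left_join(nxt, by = "DateTime")
# filter out overlapping rows: keep the first row of each run, then compute its duration
df %>% left_join(nxt, by = "DateTime") %>% group_by(DateTimeNext) %>% filter(row_number(DateTime) == 1) %>%
mutate(timedur = DateTimeNext - DateTime)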
Result:
DateTime Var1 Var2 var3 var4 X.Var1 level DateTimeNext timedur
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <fctr> <dttm> <time>
1 2016-11-15 06:11:00 0.00 0.94 0.65 1.14 0.00 (0,5] 2016-11-15 06:13:00 2 mins
2 2016-11-15 06:13:00 35.08 152.23 141.71 103.70 26.57 (5,30] 2016-11-15 06:14:00 1 mins
3 2016-11-15 06:14:00 69.05 137.97 130.81 101.54 52.31 (30,60] 2016-11-15 06:19:00 5 mins
4 2016-11-15 06:19:00 80.63 140.00 134.00 126.45 61.09 (60,100] 2016-11-15 06:44:00 25 mins
5 2016-11-15 06:44:00 132.33 206.18 205.61 205.64 100.25 (100,500] 2016-11-15 06:45:00 1 mins
6 2016-11-15 06:45:00 128.75 202.51 197.69 198.92 97.53 (60,100] <NA> NA mins
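For comparison, the same per-run durations can probably be obtained without the cross join, which grows quadratically with the number of rows. The following is only a base R sketch and not part of the answer above. It assumes the rows are sorted by time and that each row's level holds until the next row's timestamp. The names run_id, runs and totals are illustrative. It also sums the runs per level, which is the total the question asks for.

```r
# Sample rows from the question: timestamp and level only.
df <- data.frame(
  DateTime = as.POSIXct(
    c("11/15/2016 6:11", "11/15/2016 6:12", "11/15/2016 6:13",
      "11/15/2016 6:14", "11/15/2016 6:15", "11/15/2016 6:19",
      "11/15/2016 6:20", "11/15/2016 6:44", "11/15/2016 6:45"),
    format = "%m/%d/%Y %H:%M", tz = "UTC"),
  level = c("(0,5]", "(0,5]", "(5,30]", "(30,60]", "(30,60]",
            "(60,100]", "(60,100]", "(100,500]", "(60,100]"),
  stringsAsFactors = FALSE
)

# Minutes from each row to the next one; the last row has no successor,
# so its duration is unknown (NA).
df$timedur <- c(
  as.numeric(difftime(df$DateTime[-1], df$DateTime[-nrow(df)], units = "mins")),
  NA
)

# A new run starts whenever the level differs from the previous row.
run_id <- cumsum(c(TRUE, df$level[-1] != df$level[-nrow(df)]))

# One row per run: its level, its start time, and the minutes spent in it.
first_of_run <- !duplicated(run_id)
runs <- data.frame(
  level = df$level[first_of_run],
  start = df$DateTime[first_of_run],
  mins  = as.numeric(tapply(df$timedur, run_id, sum))
)
print(runs)  # mins: 2 1 5 25 1 NA

# Total minutes per level; the still-open last run (NA) is left out.
totals <- tapply(runs$mins, runs$level, sum, na.rm = TRUE)
print(totals)
```

With the sample data the per-run minutes match the timedur column in the result above. The total for (60,100] comes out as 25 minutes only because the run starting at 6:45 has no following data point yet.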