How to calculate the duration of consecutive rows with the same value in R

Posted: 2014-09-24 18:32:52

Tags: r


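I want to calculate the duration of green, amber and red separately (column sg. 0 in my sample data) within each cycle of a traffic light, e.g. the length of time from the first green state to the last green state in each cycle. How can I do this? The data.frame looks like this: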

 time sg. 0
1   2014-09-01 00:00:12.0 green
2   2014-09-01 00:00:13.5 green
3   2014-09-01 00:00:30.0 amber
4   2014-09-01 00:00:30.0 amber
5   2014-09-01 00:00:31.5 amber
6   2014-09-01 00:00:32.0 amber
7   2014-09-01 00:00:32.2 amber
8   2014-09-01 00:00:33.5 amber
9   2014-09-01 00:00:33.0   red
10  2014-09-01 00:00:35.0   red
11  2014-09-01 00:00:35.2   red
12  2014-09-01 00:00:37.0   red
13  2014-09-01 00:00:41.0   red
14  2014-09-01 00:00:42.0   red
15  2014-09-01 00:00:42.2   red
16  2014-09-01 00:00:43.0   red
17  2014-09-01 00:00:44.7   red
18  2014-09-01 00:00:44.2   red
19  2014-09-01 00:00:45.5   red
20  2014-09-01 00:00:47.0   red
21  2014-09-01 00:00:48.7   red
22  2014-09-01 00:00:49.7   red
23  2014-09-01 00:00:49.7   red
24  2014-09-01 00:00:49.9   red
25  2014-09-01 00:00:50.9 green
26  2014-09-01 00:00:50.0 green
27  2014-09-01 00:00:52.0 green
28  2014-09-01 00:00:53.0 green
29  2014-09-01 00:00:54.0 green
30  2014-09-01 00:00:55.0 green
31  2014-09-01 00:00:55.0 green
32  2014-09-01 00:01:02.0 green
33  2014-09-01 00:01:03.7 green
34  2014-09-01 00:01:05.7 green
35  2014-09-01 00:01:07.0 green

Raw data (dput() output):

structure(list(time = structure(c(1409518812, 1409518813.6, 1409518830, 
1409518830.1, 1409518831.6, 1409518832, 1409518832.2, 1409518833.6, 
1409518833, 1409518835, 1409518835.3, 1409518837, 1409518841, 
1409518842, 1409518842.3, 1409518843, 1409518844.8, 1409518844.2, 
1409518845.6, 1409518847, 1409518848.7, 1409518849.7, 1409518849.8, 
1409518849.9, 1409518850.9, 1409518850, 1409518852, 1409518853, 
1409518854, 1409518855, 1409518855.1, 1409518862, 1409518863.8, 
1409518865.8, 1409518867, 1409518868, 1409518870.7, 1409518870.3, 
1409518884, 1409518884.2, 1409518884.3, 1409518884.5, 1409518890, 
1409518942, 1409518942.1, 1409518943.7, 1409518943.3, 1409518944.9, 
1409518944, 1409518945, 1409518947, 1409518949.5, 1409518949.6, 
1409518953, 1409518954, 1409518957.8, 1409518957.2, 1409518961, 
1409518961.1, 1409518961.2, 1409518962.2, 1409518962.3, 1409518964, 
1409518965, 1409518966, 1409518967, 1409518967.1, 1409518974, 
1409518975.8, 1409518977.8, 1409518979, 1409518980, 1409519068, 
1409519068.1, 1409519068.7, 1409519070, 1409519071, 1409519073, 
1409519073.8, 1409519081, 1409519082, 1409519083.3, 1409519083.8, 
1409519084.7, 1409519086, 1409519087.6, 1409519089.2, 1409519089.3, 
1409519091, 1409519091.1, 1409519091.6, 1409519092, 1409519092.1, 
1409519093, 1409519094, 1409519094.5, 1409519095, 1409519095.1, 
1409519103, 1409519104), class = c("POSIXct", "POSIXt")), `sg. 0` = structure(c(2L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 
2L, 2L, 2L), .Label = c("amber", "green", "red"), class = "factor")), .Names = c("time", 
"sg. 0"), row.names = c(NA, 100L), class = "data.frame")

2 answers:

Answer 0 (score: 2)

You probably want to first identify each color cycle uniquely; then you can collect statistics for each one. You can find the cycles with
cycle<-cumsum(c(FALSE, dd[-1,2] != dd[-nrow(dd),2]))

(assuming your data.frame is called dd). Then you can find the duration of each cycle, from its first to its last timestamp, with

tapply(dd[,1], interaction(dd[,2], cycle, drop=TRUE), function(x) diff(range(x)))

which gives

green.0 amber.1   red.2 green.3 amber.4   red.5 green.6 amber.7   red.8 green.9 
    1.6     3.6    16.9    40.0     2.9    16.2    17.8     2.0    23.5     9.0 
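To see what the cumsum line does, here is a minimal sketch on a small made-up vector (the states and times are hypothetical, not taken from the question's data). Each row is compared with the previous one, and cumsum() of those change flags gives a run index that goes up by one at every color change. The timestamps are converted with as.numeric() so the durations come out as plain seconds rather than difftime objects.

```r
# Hypothetical toy data: five observations of a signal state
state <- factor(c("green", "green", "amber", "red", "red"))
t0 <- as.POSIXct("2014-09-01 00:00:00", tz = "UTC")
obs_time <- t0 + c(0, 2, 5, 6, 10)  # seconds after t0

# TRUE wherever the state differs from the row before; the leading FALSE
# covers the first row, which has no predecessor
changed <- c(FALSE, state[-1] != state[-length(state)])

# cumsum() turns the change flags into a run index
run <- cumsum(changed)

# Duration of each run = last timestamp minus first timestamp, in seconds
dur <- tapply(as.numeric(obs_time), run, function(x) diff(range(x)))
print(run)
print(dur)
```

For these values run is 0 0 1 2 2 and the durations are 2, 0 and 4 seconds. A run with a single observation gets a duration of 0, which follows from measuring first-to-last timestamp within the run.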

Or, if by cycle you mean one full green/amber/red sequence, you can do

cycle<-cumsum(c(dd[1,2]!="green", dd[-1,2] == "green" & dd[-nrow(dd),2] !="green"))
tapply(dd[,1], cycle, function(x) as.double(diff(range(x)), units="mins"))

which gives (in minutes)

        0         1         2         3 
0.6316667 1.8533333 2.2050000 0.1500000
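The condition inside the second cumsum marks a row as the start of a new cycle when it is green and the previous row is not. A minimal sketch on a hypothetical character vector (states and times in seconds are made up for illustration) shows the cycle numbers it produces and the per-cycle durations in minutes:

```r
# Hypothetical toy data: states and observation times in seconds
state <- c("green", "amber", "red", "red", "green", "amber", "red", "green")
secs  <- c(0, 30, 33, 50, 51, 80, 83, 110)

# A cycle starts at a green row whose predecessor is not green.
# The first element depends only on whether row 1 is green (FALSE here).
starts <- c(state[1] != "green",
            state[-1] == "green" & state[-length(state)] != "green")
cycle <- cumsum(starts)

# First-to-last duration of each cycle, converted to minutes
mins <- tapply(secs, cycle, function(x) diff(range(x)) / 60)
print(cycle)
print(round(mins, 4))
```

Here cycle is 0 0 0 0 1 1 1 2, and the last cycle holds a single green observation, so its duration is 0. In the question's data the final cycle (0.15 minutes) is likewise incomplete, because the series ends during a green phase.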

Answer 1 (score: 1)

Similar to MrFlick's approach, you can use rle to first create an indicator for each colour period and then use it to calculate the durations.

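# The colour column in the posted data is named "sg. 0" (with a space);
# rename it so that dat$sg.0 and the formula below can refer to it
names(dat)[2] <- "sg.0"

# If you want to calculate the time within each colour
r <- rle(as.numeric(dat$sg.0))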
r$values <- seq_along(r$values)
dat$id <- inverse.rle(r)

(a <- aggregate(time ~ sg.0 + id, dat, function(i) diff(as.numeric(range(i)))))
#    sg.0 id time
#1  green  1  1.6
#2  amber  2  3.6
#3    red  3 16.9
# ...

# Use a similar approach if a cycle means one full green/amber/red sequence
r <- rle(as.numeric(dat$sg.0))
r$values <- rep(seq_along(r$values), each=3, length.out=length(r$values))
dat$cycle <- inverse.rle(r)

(b <- aggregate(time ~ cycle, dat, function(i) diff(as.numeric(range(i)))))
#  cycle  time
#1     1  37.9
#2     2 111.2
#3     3 132.3
#4     4   9.0
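rle() expects a plain atomic vector, which is why the factor is passed through as.numeric() (its integer level codes) first. The sketch below uses a small hypothetical factor to show how overwriting the run values turns inverse.rle() into a run-id generator. It also compares the each = 3 cycle assignment with a variant that starts a new cycle at every green run. That variant is my own suggestion, not part of the answer: it assumes only that each cycle begins with green, not that every cycle has exactly three runs.

```r
# Hypothetical toy data: one full green/amber/red cycle, then a lone green
state <- factor(c("green", "green", "amber", "red", "red", "green"))

r <- rle(as.numeric(state))  # runs of the integer level codes
print(r$lengths)             # 2 1 2 1

# Give every run its index, then expand back to one id per row
r_id <- r
r_id$values <- seq_along(r$values)
id <- inverse.rle(r_id)      # 1 1 2 3 3 4

# Cycle number per run, as in the answer: every three runs form one cycle
cycle_each3 <- rep(seq_along(r$values), each = 3, length.out = length(r$values))

# Alternative: a new cycle starts at each green run
run_colour <- levels(state)[r$values]
cycle_green <- cumsum(run_colour == "green")

print(cycle_each3)  # 1 1 1 2
print(cycle_green)  # 1 1 1 2
```

The two assignments agree whenever the signal strictly alternates green, amber, red, as it does in the question's data. If a phase were missing from some cycle, each = 3 would shift every later cycle boundary, while the green-based version would still split at each green.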

Edit: added as.numeric to the aggregate function calls so that durations are consistently reported in seconds.
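Why as.numeric matters here: diff(range(x)) on POSIXct times returns a difftime whose units are chosen automatically from the size of the gap, so a 40-second run comes back in seconds but a 111.2-second run comes back in minutes. (Every single-colour run in the question's data is shorter than 60 seconds, so the first tapply() in the other answer happens to return seconds throughout.) A minimal sketch with hypothetical times:

```r
t0 <- as.POSIXct("2014-09-01 00:00:00", tz = "UTC")
short_run <- t0 + c(0, 40)     # hypothetical 40-second run
long_run  <- t0 + c(0, 111.2)  # hypothetical 111.2-second run

d_short <- diff(range(short_run))  # difftime, units picked automatically
d_long  <- diff(range(long_run))   # gap >= 60 s, so units become "mins"
print(units(d_short))
print(units(d_long))

# Converting the timestamps first, as in the edited answer, always gives seconds
s_long <- diff(as.numeric(range(long_run)))

# Alternatively, ask the difftime for explicit units
s_long2 <- as.double(d_long, units = "secs")
print(c(s_long, s_long2))
```

Both conversions return about 111.2 seconds, whereas the bare difftime value for the long run is about 1.853 (minutes).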