Question

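This question is very similar to "Consecutive group number in R", but I don't think it is the same question; it is a harder one.

I am currently working with car data. We record the car's speed every 5 minutes, and the data contains many zeros. I want to add a new column in which every run of k or more consecutive zero speeds is numbered 0, while the remaining stretches are numbered consecutively, starting from 1. Take this sample data as an example: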

sample <- data.frame(
  id = 1:15, 
  speed = c(50, 0, 0, 0, 50, 40, 0, 0, 25, 30, 50, 0, 30, 50, 40))

For this sample data specifically, assuming k equals 2, the result I want should look like this:

    id speed number
1   1    50      1
2   2     0      0
3   3     0      0
4   4     0      0
5   5    50      2
6   6    40      2
7   7     0      0
8   8     0      0
9   9    25      3
10 10    30      3
11 11    50      3
12 12     0      3  <- here is the difference
13 13    30      3
14 14    50      3
15 15    40      3

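My data has more than 1 million rows, so I hope the solution runs at an acceptable speed.

The reason for the threshold k is that some drivers leave the GPS on even after they lock the car and go to sleep. In other cases, where the stop lasts fewer than k intervals, they have merely stopped at a traffic light at an intersection. I want to focus on the long stops and ignore the short ones.

I hope my question makes sense to you. Thanks.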

Answer 1

You can do it like this, inspired by user20650's comment on this question:

numbering <- function(v, k) {
  ## First, replace stretches of fewer than k consecutive 0s by 1s
  r <- rle(v)
  r$values[r$values == 0 & r$lengths < k] <- 1
  v2 <- inverse.rle(r)

  ## Then number the consecutive stretches of non-zero values
  r2 <- rle(v2 != 0)
  r2$values[r2$values] <- cumsum(r2$values[r2$values])
  inverse.rle(r2)
}

numbering(sample$speed,2)
[1] 1 0 0 0 2 2 0 0 3 3 3 3 3 3 3
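
To attach this result to the data frame as the `number` column the question asks for, something like the following should work. It is a minimal sketch: `numbering` and the sample data are repeated from above so the block runs on its own, and the `expected` vector is simply copied by hand from the table in the question.

```r
# Same function as in the answer, repeated so this block is self-contained.
numbering <- function(v, k) {
  r <- rle(v)
  r$values[r$values == 0 & r$lengths < k] <- 1  # short zero runs count as "moving"
  v2 <- inverse.rle(r)
  r2 <- rle(v2 != 0)
  r2$values[r2$values] <- cumsum(r2$values[r2$values])
  inverse.rle(r2)
}

# Sample data from the question.
sample <- data.frame(
  id = 1:15,
  speed = c(50, 0, 0, 0, 50, 40, 0, 0, 25, 30, 50, 0, 30, 50, 40))

# Add the group number as a new column, with threshold k = 2.
sample$number <- numbering(sample$speed, 2)

# Expected column, copied from the table in the question.
expected <- c(1, 0, 0, 0, 2, 2, 0, 0, 3, 3, 3, 3, 3, 3, 3)
all(sample$number == expected)
```

The single zero at row 12 is shorter than k, so it stays inside group 3 rather than starting a new stop.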

numbering(sample$speed,3)
[1] 1 0 0 0 2 2 2 2 2 2 2 2 2 2 2
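
Since the question mentions more than 1 million rows, one rough way to check whether the speed is acceptable is to time the function on simulated data of that size. The sketch below is only an assumption about what such data might look like: the seed, the speed values, and the 40% share of zeros are made up for illustration, and no timing figure is claimed here. `numbering` is repeated so the block runs on its own.

```r
# Same function as in the answer, repeated so this block is self-contained.
numbering <- function(v, k) {
  r <- rle(v)
  r$values[r$values == 0 & r$lengths < k] <- 1
  v2 <- inverse.rle(r)
  r2 <- rle(v2 != 0)
  r2$values[r2$values] <- cumsum(r2$values[r2$values])
  inverse.rle(r2)
}

set.seed(1)  # arbitrary seed, only for reproducibility
n <- 1e6

# Hypothetical speeds: about 40% zeros, the rest spread over a few values.
speed <- sample(c(0, 25, 30, 40, 50), n, replace = TRUE,
                prob = c(0.4, 0.15, 0.15, 0.15, 0.15))

# Time a single call with threshold k = 2.
timing <- system.time(res <- numbering(speed, 2))
print(timing)

# Sanity checks: one number per row, and every non-zero speed is in a numbered group.
length(res) == n
all(res[speed != 0] > 0)
```

The function relies only on `rle`, `inverse.rle`, and `cumsum`, all of which operate on whole vectors, so there is no explicit loop over rows.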
