Sort by group, subgroup and timestamp with small time offsets within a group

Time: 2017-10-25 15:37:41

Tags: r

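I have a data frame with 3 columns: 1) a timestamp, 2) a group, and 3) an index within the group (subgroup). A group has 6 rows/indices, which should always share the same timestamp, with a maximum allowed deviation of 2 seconds. Sometimes some elements from two different groups have the same timestamp while other elements do not. I need to arrange the data by timestamp in order to cluster the groups, but first I have to account for the fact that elements of a group may be offset by up to 2 seconds.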

df1 <- data.frame(
   timestamp1 = as.POSIXct(c(
      '2017-09-07 15:16:27',  '2017-09-07 15:16:27',  '2017-09-07 15:16:27',  '2017-09-07 15:16:27',  '2017-09-07 15:16:27',  '2017-09-07 15:16:27',
      '2017-09-07 15:17:19', '2017-09-07 15:17:19', '2017-09-07 15:17:19', '2017-09-07 15:17:19', 
      '2017-09-07 15:17:19', '2017-09-07 15:17:19', '2017-09-07 15:17:19', '2017-09-07 15:17:19', '2017-09-07 15:17:19', 
      '2017-09-07 15:17:20', '2017-09-07 15:17:20',
      '2017-09-07 15:17:20'
      )), 
   group = c(
      'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 
      'a', 'a', 'a', 'a',
      'aaa', 'aaa', 'aaa', 'aaa', 'aaa',
      'a', 'a',
      'aaa'
      ),
   index_inside_group = c(
      1, 2, 3, 4, 5, 6,
      1, 3, 4, 6,
      1, 2, 4, 5, 6,
      2, 5,
      3
      )
   )
> df1
            timestamp1 group index_inside_group
1  2017-09-07 15:16:27   aaa                  1
2  2017-09-07 15:16:27   aaa                  2
3  2017-09-07 15:16:27   aaa                  3
4  2017-09-07 15:16:27   aaa                  4
5  2017-09-07 15:16:27   aaa                  5
6  2017-09-07 15:16:27   aaa                  6
7  2017-09-07 15:17:19     a                  1
8  2017-09-07 15:17:19     a                  3
9  2017-09-07 15:17:19     a                  4
10 2017-09-07 15:17:19     a                  6
11 2017-09-07 15:17:19   aaa                  1
12 2017-09-07 15:17:19   aaa                  2
13 2017-09-07 15:17:19   aaa                  4
14 2017-09-07 15:17:19   aaa                  5
15 2017-09-07 15:17:19   aaa                  6
16 2017-09-07 15:17:20     a                  2
17 2017-09-07 15:17:20     a                  5
18 2017-09-07 15:17:20   aaa                  3

In short, what I need is to get from df1 to df2:

> df2
            timestamp1 group index_inside_group
1  2017-09-07 15:16:27   aaa                  1
2  2017-09-07 15:16:27   aaa                  2
3  2017-09-07 15:16:27   aaa                  3
4  2017-09-07 15:16:27   aaa                  4
5  2017-09-07 15:16:27   aaa                  5
6  2017-09-07 15:16:27   aaa                  6
7  2017-09-07 15:17:19     a                  1
8  2017-09-07 15:17:20     a                  2
9  2017-09-07 15:17:19     a                  3
10 2017-09-07 15:17:19     a                  4
11 2017-09-07 15:17:20     a                  5
12 2017-09-07 15:17:19     a                  6
13 2017-09-07 15:17:19   aaa                  1
14 2017-09-07 15:17:19   aaa                  2
15 2017-09-07 15:17:20   aaa                  3
16 2017-09-07 15:17:19   aaa                  4
17 2017-09-07 15:17:19   aaa                  5
18 2017-09-07 15:17:19   aaa                  6

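In df2, the data are arranged with priority on group, then on index_inside_group, restricted only by timestamp1: rows stay within their own timestamp cluster (timestamps at most 2 seconds apart).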
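One possible way to produce this ordering, sketched in R under an assumption that is not from the post: a new timestamp cluster starts whenever the gap to the previous (time-sorted) timestamp exceeds 2 seconds. The names `max_gap`, `secs` and `cluster` are hypothetical, and `tz = "UTC"` is added only to make the example deterministic.

```r
# Sketch: cluster timestamps with a 2-second gap rule, then sort within clusters.
# df1 is rebuilt compactly from the question; tz = "UTC" is an added assumption.
df1 <- data.frame(
  timestamp1 = as.POSIXct(c(rep("2017-09-07 15:16:27", 6),
                            rep("2017-09-07 15:17:19", 9),
                            rep("2017-09-07 15:17:20", 3)), tz = "UTC"),
  group = c(rep("aaa", 6), rep("a", 4), rep("aaa", 5), "a", "a", "aaa"),
  index_inside_group = c(1:6, 1, 3, 4, 6, 1, 2, 4, 5, 6, 2, 5, 3),
  stringsAsFactors = FALSE
)

max_gap <- 2  # seconds; the tolerance stated in the question

# Visit rows in time order and start a new cluster whenever the gap
# to the previous timestamp exceeds max_gap.
ord <- order(df1$timestamp1)
secs <- as.numeric(df1$timestamp1[ord])
cluster <- integer(nrow(df1))
cluster[ord] <- cumsum(c(TRUE, diff(secs) > max_gap))

# Clusters in time order; within each cluster, by group, then by index.
df2 <- df1[order(cluster, df1$group, df1$index_inside_group), ]
rownames(df2) <- NULL
print(df2)
```

Caveat on the design choice: gap-based clustering chains, so timestamps that drift by up to 2 s at a time (e.g. :19, :21, :23) would all land in one cluster even though they span more than 2 s, and two distinct events less than 2 s apart would be merged.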

1 answer:

Answer 0 (score: 0)

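I think I found a long-winded solution: loop over each unique timestamp, build a new timestamp window of +/- 2 seconds around it, and then grab all rows of each group that fall within that window. In each iteration, the chunk of data is sorted by group and index_inside_group, and then simply appended, so that new_df is recreated in the correct order.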
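A rough R sketch of the loop described above. The `taken` flag is an assumption not stated in the answer: it stops a row from being appended twice when two unique timestamps lie within 2 s of each other (here 15:17:19 and 15:17:20). The names `max_gap`, `secs`, `taken` and `chunks` are hypothetical, and `tz = "UTC"` is added only for reproducibility.

```r
# Sketch of the window-per-unique-timestamp approach; df1 rebuilt from the question.
df1 <- data.frame(
  timestamp1 = as.POSIXct(c(rep("2017-09-07 15:16:27", 6),
                            rep("2017-09-07 15:17:19", 9),
                            rep("2017-09-07 15:17:20", 3)), tz = "UTC"),
  group = c(rep("aaa", 6), rep("a", 4), rep("aaa", 5), "a", "a", "aaa"),
  index_inside_group = c(1:6, 1, 3, 4, 6, 1, 2, 4, 5, 6, 2, 5, 3),
  stringsAsFactors = FALSE
)

max_gap <- 2                        # seconds
secs <- as.numeric(df1$timestamp1)  # numeric, because for() drops POSIXct attributes
taken <- rep(FALSE, nrow(df1))      # assumption: each row is placed only once
chunks <- list()

for (ts in sort(unique(secs))) {
  # Rows within +/- max_gap seconds of this timestamp that are not placed yet
  in_window <- !taken & abs(secs - ts) <= max_gap
  if (!any(in_window)) next
  chunk <- df1[in_window, ]
  # Sort the chunk by group, then by index inside the group
  chunk <- chunk[order(chunk$group, chunk$index_inside_group), ]
  chunks[[length(chunks) + 1]] <- chunk
  taken <- taken | in_window
}

# Collect the chunks once at the end instead of growing a data frame in the loop
new_df <- do.call(rbind, chunks)
rownames(new_df) <- NULL
print(new_df)
```

Collecting the chunks in a list and binding them once with `do.call(rbind, ...)` avoids repeatedly copying a growing data frame inside the loop.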