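I am working with a GPS tracking dataset and have been experimenting with filtering it by speed and time. The species I work with becomes inactive at dusk, when it rests at the ocean surface, and resumes activity once night falls. For each animal in the dataset, I want to remove all data points recorded after it first becomes inactive at dusk (around 21:30). Because each animal becomes inactive at a different time, I cannot simply filter out every data point that occurs after 21:30.
My data looks like this: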
AnimalID Latitude Longitude Speed Date
99B 50.86190 -129.0875 5.6 2015-05-14 21:26:00
99B 50.86170 -129.0875 0.6 2015-05-14 21:32:00
99B 50.86150 -129.0810 0.5 2015-05-14 21:33:00
99B 50.86140 -129.0800 0.3 2015-05-14 21:40:00
99C.......
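Basically, I want to find a group of consecutive GPS positions after 21:30:00 (say, at least 5) that all have a speed < 0.8. I then want to remove every point from there on, including the identified cluster itself.
Does anyone know a way to identify clusters of points like this in R? Or is this kind of filtering too complicated?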
Answer 0: (score: 0)
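With data.table, you can use a forward/backward rolling window to find the maximum speed over the next five or previous five entries within each animal ID, and then filter out anything that does not meet the criterion. For example: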
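library(data.table)
set.seed(40)
# Simulated data: 1000 random speeds in [0, 1], 500 per animal
DT <- data.table(Speed = runif(1000), AnimalID = rep(c("A","B"), each = 500))
# Max speed over the current row plus the next 4 rows, per animal
DT[ , FSpeed := Reduce(pmax, shift(Speed, 0:4, type = "lead", fill = 1)), by = .(AnimalID)]
# Max speed over the current row plus the previous 4 rows, per animal
DT[ , BSpeed := Reduce(pmax, shift(Speed, 0:4, type = "lag", fill = 1)), by = .(AnimalID)]
# Keep rows where either 5-row window stays below the speed threshold
DT[FSpeed < 0.5 | BSpeed < 0.5]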
Speed AnimalID FSpeed BSpeed
1: 0.220509197 A 0.4926640 0.8897597
2: 0.225883211 A 0.4926640 0.8897597
3: 0.264809801 A 0.4926640 0.6648507
4: 0.184270587 A 0.4926640 0.6589303
5: 0.492664002 A 0.4926640 0.4926640
6: 0.472144689 A 0.4721447 0.4926640
7: 0.254635219 A 0.7409803 0.4926640
8: 0.281538568 A 0.7409803 0.4926640
9: 0.304875597 A 0.7409803 0.4926640
10: 0.059605991 A 0.7409803 0.4721447
11: 0.132069268 A 0.2569604 0.9224052
12: 0.256960449 A 0.2569604 0.9224052
13: 0.005059727 A 0.8543111 0.2569604
14: 0.191478376 A 0.8543111 0.2569604
15: 0.170969244 A 0.4398143 0.7927442
16: 0.059577719 A 0.4398143 0.7927442
17: 0.439814267 A 0.4398143 0.7927442
18: 0.307714603 A 0.9912536 0.4398143
19: 0.075750773 A 0.9912536 0.4398143
20: 0.100589403 A 0.9912536 0.4398143
21: 0.032957748 A 0.4068012 0.7019594
22: 0.080091554 A 0.4068012 0.7019594
23: 0.406801193 A 0.9761119 0.4068012
24: 0.057445020 A 0.9761119 0.4068012
25: 0.308382143 A 0.4516870 0.9435490
26: 0.451686996 A 0.4516870 0.9248595
27: 0.221964923 A 0.4356419 0.9248595
28: 0.435641917 A 0.5363373 0.4516870
29: 0.237658906 A 0.5363373 0.4516870
30: 0.324597512 A 0.9710011 0.4356419
31: 0.357198893 B 0.4869905 0.9226573
32: 0.486990475 B 0.4869905 0.9226573
33: 0.115922994 B 0.4051843 0.9226573
34: 0.010581766 B 0.9338841 0.4869905
35: 0.003976893 B 0.9338841 0.4869905
36: 0.405184342 B 0.9338841 0.4051843
37: 0.412468699 B 0.4942280 0.9113595
38: 0.402063509 B 0.4942280 0.9113595
39: 0.494228013 B 0.8254665 0.4942280
40: 0.123264949 B 0.8254665 0.4942280
41: 0.251132449 B 0.4960371 0.9475821
42: 0.496037128 B 0.8845043 0.4960371
43: 0.250853014 B 0.3561290 0.9858652
44: 0.356129033 B 0.3603769 0.8429552
45: 0.225943145 B 0.7028077 0.3561290
46: 0.360376907 B 0.7159759 0.3603769
47: 0.169606203 B 0.3438164 0.9745535
48: 0.343816363 B 0.4396962 0.9745535
49: 0.067265545 B 0.9641856 0.3438164
50: 0.439696243 B 0.9641856 0.4396962
51: 0.024403516 B 0.3730828 0.9902976
52: 0.373082846 B 0.4713596 0.9902976
53: 0.290466668 B 0.9689225 0.3730828
54: 0.471359568 B 0.9689225 0.4713596
55: 0.402111615 B 0.4902595 0.8045104
56: 0.490259530 B 0.8801029 0.4902595
57: 0.477884140 B 0.4904800 0.6696598
58: 0.490480001 B 0.8396014 0.4904800
Speed AnimalID FSpeed BSpeed
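This shows every row that starts or ends a window of five consecutive observations (the anchor row plus the next or previous four) whose maximum speed is below the threshold, which is 0.5 in this case.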
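To make the rolling-maximum step easier to follow, here is a minimal sketch on a short vector. The values in x and the window of three (offsets 0:2) are made up for illustration; the answer above uses offsets 0:4.

```r
library(data.table)

# Illustrative speeds (not from the question's data)
x <- c(0.9, 0.2, 0.3, 0.1, 0.4, 0.2, 0.7)

# shift() with several offsets returns a list of shifted copies of x:
# offset 0 is x itself, offset 1 is x moved one position earlier, etc.
# Positions that run past the end of x are padded with fill.
leads <- shift(x, 0:2, type = "lead", fill = 1)

# Reduce(pmax, ...) takes the element-wise maximum across those copies,
# i.e. the maximum of each value and the two values that follow it.
fwd_max <- Reduce(pmax, leads)

# A position starts a "slow" window when that maximum is below the threshold
starts_slow_window <- fwd_max < 0.5
```

Because the last two windows are padded with fill = 1, they can never fall below the threshold, so incomplete windows at the end of the vector are not flagged.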
In your own code, just run DT <- as.data.table(myDF), where myDF is the name of the data.frame you are working with.
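This analysis assumes that the GPS fixes were recorded at constant intervals. Setting fill=1 also pads the incomplete windows at the first 4 and last 4 observations of each animal, so those windows are thrown out. In your data, set fill= to the maximum speed so that a padded window can never fall below the threshold.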
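The example above only flags slow windows; it does not yet apply the 21:30 cutoff or drop everything after the first cluster. The following is one possible sketch of those remaining steps, not a tested solution for the real dataset. It swaps the rolling maximum for run-length grouping with rleid(), and it assumes the column names from the question (AnimalID, Speed, Date), a POSIXct Date column, and a single evening per animal. The function name trim_after_inactive, its arguments, and the toy data are hypothetical.

```r
library(data.table)

# Hypothetical toy data in the shape of the question's table (one fix every 2 minutes)
gps <- data.table(
  AnimalID = rep(c("99B", "99C"), each = 8),
  Speed    = c(5.6, 4.9, 0.6, 0.5, 0.3, 0.4, 0.2, 3.1,
               6.0, 0.4, 0.5, 5.5, 0.3, 0.2, 0.1, 0.4),
  Date     = as.POSIXct("2015-05-14 21:26:00", tz = "UTC") +
               rep(0:7, times = 2) * 120
)

trim_after_inactive <- function(dt, dusk = "21:30:00",
                                max_speed = 0.8, min_run = 5L) {
  dt <- copy(dt)                      # leave the caller's table untouched
  setorder(dt, AnimalID, Date)

  # A point is a candidate if it is slow and at or after the dusk time of day
  dt[, slow := Speed < max_speed & format(Date, "%H:%M:%S") >= dusk]

  # Label consecutive runs of identical 'slow' values and measure their length
  dt[, run_id := rleid(slow), by = AnimalID]
  dt[, run_len := .N, by = .(AnimalID, run_id)]
  dt[, idx := seq_len(.N), by = AnimalID]

  # Row index of the first point in the first qualifying run;
  # .N + 1L means "no qualifying run", so nothing is dropped
  dt[, first_drop := {
    hit <- which(slow & run_len >= min_run)
    if (length(hit)) hit[1L] else .N + 1L
  }, by = AnimalID]

  # Keep only the points before the first qualifying run
  out <- dt[idx < first_drop]
  out[, c("slow", "run_id", "run_len", "idx", "first_drop") := NULL]
  out
}

result <- trim_after_inactive(gps)
```

With this toy data, animal 99B keeps only its two fixes before 21:30, because five consecutive slow fixes start at 21:30. Animal 99C keeps all eight fixes, because its longest slow run after 21:30 is only four points long. If the tracks span several days, one option is to add a calendar-date column and include it in each by= grouping.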