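This is my first time posting a question here, so I apologize in advance if I have done anything wrong. I will try to explain my problem and provide a reproducible example. TIA
I have a data frame showing when animals were detected at different locations. I want to remove rows for site A from the detection file (df) if the individual animal was not also detected at site B within a certain time period (5 minutes). I need to do this for every animal and for multiple sites. My real data has many animals and over a million detection observations. I am guessing this needs at least two for loops.
I have been able to find whether the exact time is present in the second data frame, but I don't know how to add a "threshold" so that times within, say, 5 minutes also count as a match.
Example: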
obs.num <- 1:20 # a simple observation number
animal <- c(rep("RBT 1", 10), rep("RBT 2", 7), rep("RBT 3", 2), "RBT 2") # a fake list of animal IDs (my data has many)
now <- Sys.time()
ts <- seq(from = now, length.out = 16, by = "mins")
ts <- c(ts, seq(from = tail(ts, 1), length.out = 4, by = "hour")) # create a fake series of time stamps
df <- cbind.data.frame(obs.num, animal, ts) # make data frame
df$site <- rep(c("A", "B"), times = 10) # a fake series of sites where each detection occurred
str(df)
df # my example data frame
> df
   obs.num animal                  ts site
1        1  RBT 1 2018-11-30 15:11:38    A
2        2  RBT 1 2018-11-30 15:12:38    B
3        3  RBT 1 2018-11-30 15:13:38    A
4        4  RBT 1 2018-11-30 15:14:38    B
5        5  RBT 1 2018-11-30 15:15:38    A
6        6  RBT 1 2018-11-30 15:16:38    B
7        7  RBT 1 2018-11-30 15:17:38    A
8        8  RBT 1 2018-11-30 15:18:38    B
9        9  RBT 1 2018-11-30 15:19:38    A
10      10  RBT 1 2018-11-30 15:20:38    B
11      11  RBT 2 2018-11-30 15:21:38    A
12      12  RBT 2 2018-11-30 15:22:38    B
13      13  RBT 2 2018-11-30 15:23:38    A
14      14  RBT 2 2018-11-30 15:24:38    B
15      15  RBT 2 2018-11-30 15:25:38    A
16      16  RBT 2 2018-11-30 15:26:38    B
17      17  RBT 2 2018-11-30 15:26:38    A
18      18  RBT 3 2018-11-30 16:26:38    B
19      19  RBT 3 2018-11-30 17:26:38    A
20      20  RBT 2 2018-11-30 18:26:38    B
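In this example, I want to remove the entire row for observation 19.
In the larger real data set, I was able to identify the rows/times where detections at site A and another site occurred at exactly the same time, but I am really struggling with how to do this on a large data frame, and with what syntax could replace %in% so that it handles times that are not exact but very close (i.e. within 5 minutes).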
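animals <- unique(df$animal)
for (i in seq_along(animals)) {
  # positions among this animal's site A detections whose time stamp exactly equals one of its site B time stamps
  print(which(df[df$animal == animals[i] & df$site == "A", ]$ts %in%
              df[df$animal == animals[i] & df$site == "B", ]$ts))
}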
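Thanks for your help; please ask if I can provide more details or clarification.
Updated example (I want to be able to do this per detected animal):
In this example, I still want observation 19 removed, but the answer from @G. Grothendieck would not give that result.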
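df[21, ] <- df[19, ] # copy observation 19 into a new row 21
df$animal <- as.character(df$animal)
df[21, "animal"] <- "RBT 4" # a different animal ...
df[21, "site"] <- "B" # ... detected at site B at the same time as observation 19
df[21, "obs.num"] <- 21
df$animal <- as.factor(df$animal)
df <- df[order(df$ts), ]
df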
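Answer 0 (score: 1)
Define table B as those rows of df for site B, then join df to those rows of B that satisfy the condition (same animal, time stamps within 5 minutes of each other). Note that observation 19 has now been removed.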
library(sqldf)
sqldf("with B as (select * from df where site == 'B')
select distinct df.* from df
join B on df.animal = B.animal and
B.ts - df.ts between -5 * 60 and 5 * 60
order by 1")
giving:
   obs.num animal                  ts site
1        1  RBT 1 2018-12-03 16:43:00    A
2        2  RBT 1 2018-12-03 16:44:00    B
3        3  RBT 1 2018-12-03 16:45:00    A
4        4  RBT 1 2018-12-03 16:46:00    B
5        5  RBT 1 2018-12-03 16:47:00    A
6        6  RBT 1 2018-12-03 16:48:00    B
7        7  RBT 1 2018-12-03 16:49:00    A
8        8  RBT 1 2018-12-03 16:50:00    B
9        9  RBT 1 2018-12-03 16:51:00    A
10      10  RBT 1 2018-12-03 16:52:00    B
11      11  RBT 2 2018-12-03 16:53:00    A
12      12  RBT 2 2018-12-03 16:54:00    B
13      13  RBT 2 2018-12-03 16:55:00    A
14      14  RBT 2 2018-12-03 16:56:00    B
15      15  RBT 2 2018-12-03 16:57:00    A
16      16  RBT 2 2018-12-03 16:58:00    B
17      17  RBT 2 2018-12-03 16:58:00    A
18      18  RBT 3 2018-12-03 17:58:00    B
19      20  RBT 2 2018-12-03 19:58:00    B
20      21  RBT 4 2018-12-03 18:58:00    B
Since the example in the question has changed, this is the input we used above:
obs.num <- 1:20 # a simple observation number
animal <- c(rep("RBT 1", 10), rep("RBT 2", 7), rep("RBT 3", 2), "RBT 2") # a fake list of animal IDs (my data has many)
now <- Sys.time()
ts <- seq(from = now, length.out = 16, by = "mins")
ts <- c(ts, seq(from = tail(ts, 1), length.out = 4, by = "hour")) # create a fake series of time stamps
df <- cbind.data.frame(obs.num, animal, ts) # make data frame
df$site <- rep(c("A", "B"), times = 10) # a fake series of sites where each detection occurred
df[21, ] <- df[19, ] # copy observation 19 into a new row 21
df$animal <- as.character(df$animal)
df[21, "animal"] <- "RBT 4"
df[21, "site"] <- "B"
df[21, "obs.num"] <- 21
df$animal <- as.factor(df$animal)
df <- df[order(df$ts), ]
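With over a million detections, a self-join in SQL may become slow, so a base R alternative could be worth trying. The following is only a sketch of the same filter, not a tested replacement for the sqldf query. The helper name keep_near_site_b, its window_secs argument, and the small det data frame with fixed time stamps are made up for illustration, and the code assumes ts is POSIXct with no missing values. For each animal it sorts that animal's site B times and uses findInterval to look up the nearest site B detection before and after every other detection.

```r
# Hypothetical helper (not part of sqldf or the original answer):
# keep every site B row, and keep any other row only if the same animal
# has a site B detection within `window_secs` seconds of it.
keep_near_site_b <- function(df, window_secs = 5 * 60) {
  animal <- as.character(df$animal)
  secs <- as.numeric(df$ts)        # POSIXct -> seconds since the epoch
  keep <- df$site == "B"           # site B rows always match themselves
  for (id in unique(animal)) {
    is_animal <- animal == id
    b_times <- sort(secs[is_animal & df$site == "B"])
    other <- which(is_animal & df$site != "B")
    # Animals with no site B detections keep none of their other rows
    if (length(b_times) == 0 || length(other) == 0) next
    # Index of the latest site B time at or before each detection (0 if none)
    idx <- findInterval(secs[other], b_times)
    n <- length(b_times)
    gap_before <- ifelse(idx > 0, secs[other] - b_times[pmax(idx, 1)], Inf)
    gap_after <- ifelse(idx < n, b_times[pmin(idx + 1, n)] - secs[other], Inf)
    keep[other] <- pmin(gap_before, gap_after) <= window_secs
  }
  df[keep, , drop = FALSE]
}

# Made-up data with fixed time stamps so the result is reproducible
start <- as.POSIXct("2018-11-30 15:00:00", tz = "UTC")
det <- data.frame(
  obs.num = 1:6,
  animal = c("RBT 1", "RBT 1", "RBT 3", "RBT 3", "RBT 4", "RBT 2"),
  ts = start + c(0, 60, 120, 3720, 3720, 4000),
  site = c("A", "B", "B", "A", "B", "A"),
  stringsAsFactors = FALSE
)
result <- keep_near_site_b(det)
result$obs.num
```

In this made-up input, observation 4 (RBT 3 at site A) is dropped because that animal's only site B detection is an hour earlier, even though RBT 4 was detected at site B at the same moment. Observation 6 is dropped because RBT 2 has no site B detections at all. This mirrors the per-animal behaviour of the join above.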