I'm hoping someone can offer advice on this "problem", because I really don't know how to proceed. My data look like this:
data<-data.frame(site=c(rep("A",3),rep("B",3),rep("C",3)),time=c(100,180,245,5,55,130,70,120,160))
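Time is in minutes. For each site, I want to select only the records whose difference is greater than 60, so the output should look like this: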
out<-data[c(1:4,6,7,9),]
Here is what I have tried so far. To get the differences I use:
difference<-stack(tapply(data$time,data$site,diff))
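However, I don't know how to pick out the records that meet my condition. I apologize if a similar question already exists; I searched for a while without finding one. To be clear, since my definition of "difference" may not have been precise: for each site, I need to select all records that are separated by more than 60 minutes, not only those that are strictly consecutive in time. Specifically: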
> out
  site time
1    A  100 # included because the difference between records 2 and 1 is > 60
2    A  180 # included because the difference between records 3 and 2 is > 60
3    A  245 # included because it is more than 60 minutes after record 2
4    B    5 # included because the difference between records 6 and 4 is > 60
6    B  130 # included because it is more than 60 minutes after record 4
7    C   70 # included because the difference between records 9 and 7 is > 60
9    C  160 # included because it is more than 60 minutes after record 7
To solve the "problem", it may be useful to look at the computed differences, shown below:
> difference
  values ind
1     80   A # include records 1 and 2
2     65   A # include records 2 and 3
3     50   B # include only record 4
4     75   B # include record 6 because it is (50+75) > 60 minutes from record 4
5     50   C # include only record 7
6     40   C # include record 9 because it is (50+40) > 60 minutes from record 7
Thanks for your help.
Answer 0 (score: 3)
data[ave(data$time, data$site, FUN = function(x){c(61, diff(x)) > 60}) == 1, ]
# site time
# 1 A 100
# 2 A 180
# 3 A 245
# 4 B 5
# 6 B 130
# 7 C 70
Edit for the updated question:
keep <- as.logical(ave(data$time, data$site, FUN = function(x){
c(TRUE, cumsum(diff(x)) > 60)
}))
data[keep, ]
# site time
# 1 A 100
# 2 A 180
# 3 A 245
# 4 B 5
# 6 B 130
# 7 C 70
# 9 C 160
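Note that `cumsum(diff(x))` measures the time elapsed since the first record of each site. That matches the desired output for this data set, but if the rule is meant to be "more than 60 minutes after the previously kept record", the two can differ: with times 0, 70, 80 the `cumsum` version keeps all three, although 80 is only 10 minutes after 70. The following is a minimal sketch of that stricter reading, assuming the times are sorted in ascending order within each site; `keep_spaced` and `gap` are hypothetical names introduced here for illustration only.

```r
# Sketch: keep a record only if it is more than `gap` minutes after the
# last record that was kept. Assumes x is sorted in ascending order.
# `keep_spaced` and `gap` are hypothetical names, not part of the answer above.
keep_spaced <- function(x, gap = 60) {
  keep <- logical(length(x))
  if (length(x) == 0) return(keep)
  keep[1] <- TRUE   # the first record of each site is always kept
  last <- x[1]      # time of the most recently kept record
  for (i in seq_along(x)[-1]) {
    if (x[i] - last > gap) {
      keep[i] <- TRUE
      last <- x[i]
    }
  }
  keep
}

data <- data.frame(site = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                   time = c(100, 180, 245, 5, 55, 130, 70, 120, 160))

# ave() applies keep_spaced within each site and returns 0/1 in row order
keep <- as.logical(ave(data$time, data$site, FUN = keep_spaced))
result <- data[keep, ]
```

On the example data this selects rows 1, 2, 3, 4, 6, 7 and 9, the same as the `cumsum` version; the two only diverge when a kept record falls between the first record and a later candidate.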
Answer 1 (score: 1)
#Calculate the differences
data$diff <- unlist(by(data$time, data$site,function(x)c(NA,diff(x))))
#subset data
data[is.na(data$diff) | data$diff > 60,]
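One caveat: `unlist(by(...))` returns the differences grouped by site level, so assigning them back to `data$diff` lines up correctly only when the rows are already grouped by site in level order, as they are in the example. A possible variant, sketched here with `ave()` (which returns its result in the original row order), avoids that assumption. Like the code above, it compares consecutive records only, so on the example data it keeps rows 1, 2, 3, 4, 6 and 7.

```r
data <- data.frame(site = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                   time = c(100, 180, 245, 5, 55, 130, 70, 120, 160))

# Per-site consecutive differences; NA marks the first record of each site.
# ave() keeps the original row order, so no sorting by site is required.
data$diff <- ave(data$time, data$site, FUN = function(x) c(NA, diff(x)))

# Keep the first record of each site and any record more than 60 minutes
# after the one immediately before it
selected <- data[is.na(data$diff) | data$diff > 60, ]
```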
Answer 2 (score: 0)
Using plyr:
library(plyr)
ddply(data, .(site), function(x) x[c(TRUE, diff(x$time) > 60), ])