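I am trying to filter a dataset that contains erroneously duplicated rows and remove them. I managed to do it another way, but given how powerful the tidyverse is, I would like to try doing it with that package.
The table I am using consists of affinity data for 4 species at 3 sites (1, 2, 3), and each site is divided into sections (here 1 to 5). Only one visit can take place per date, which means that duplicates (several visit_id values for the same site, date and section) have to be removed.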
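To make the duplicate rule concrete, here is a minimal sketch that lists the site/date/section combinations with more than one visit_id. The data frame `toy` and its values are hypothetical stand-ins, not the real dataset, and `dup_groups` is just an illustrative name.

```r
library(dplyr)

# Hypothetical miniature of the table: site 1, section 1 was visited 3 times
toy <- data.frame(
  site       = c(1, 1, 1, 3),
  visit_id   = c(800681, 800682, 800683, 800681),
  section    = c(1, 1, 1, 1),
  visit_date = "25-07-10"
)

# Count distinct visits per site/date/section and keep the offending groups
dup_groups <- toy %>%
  group_by(site, visit_date, section) %>%
  summarise(n_visits = n_distinct(visit_id), .groups = "drop") %>%
  filter(n_visits > 1)

print(dup_groups)
```

Any group that shows up in `dup_groups` breaks the "one visit per date" rule and needs to be reduced to a single visit.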
I would like to remove these duplicates as shown below.
Here is the data:
dat<- data.frame(matrix(c(1,800681,1,"25-07-10",1,0,0,0,1,
1,800681,2,"25-07-10",NA,NA,NA,NA,NA,
1,800681,3,"25-07-10",NA,NA,NA,NA,NA,
1,800682,1,"25-07-10",NA,NA,NA,NA,NA,
1,800682,2,"25-07-10",NA,NA,NA,NA,NA,
1,800682,3,"25-07-10",NA,NA,NA,NA,NA,
1,800683,1,"25-07-10",NA,NA,NA,NA,NA,
1,800683,2,"25-07-10",NA,NA,NA,NA,NA,
1,800683,3,"25-07-10",NA,NA,NA,NA,NA,
1,800684,1,"25-07-10",NA,NA,NA,NA,NA,
1,800684,2,"25-07-10",NA,NA,NA,NA,NA,
1,800684,3,"25-07-10",NA,NA,NA,NA,NA,
1,800685,1,"25-07-10",NA,NA,NA,NA,NA,
1,800685,2,"25-07-10",NA,NA,NA,NA,NA,
1,800685,3,"25-07-10",NA,NA,NA,NA,NA,
2,800681,1,"25-07-10",NA,NA,NA,NA,NA,
2,800681,2,"25-07-10",NA,NA,NA,NA,NA,
2,800682,1,"25-07-10",NA,NA,NA,NA,NA,
2,800682,2,"25-07-10",NA,NA,NA,NA,NA,
2,800683,1,"25-07-10",NA,NA,NA,NA,NA,
2,800683,2,"25-07-10",NA,NA,NA,NA,NA,
2,800684,1,"25-07-10",NA,NA,NA,NA,NA,
2,800684,2,"25-07-10",NA,NA,NA,NA,NA,
2,800685,1,"25-07-10",NA,NA,NA,NA,NA,
2,800685,2,"25-07-10",NA,NA,NA,NA,NA,
3,800681,1,"25-07-10",1,0,0,0,1,
3,800682,2,"25-07-10",NA,NA,NA,NA,NA,
3,800683,3,"25-07-10",NA,NA,NA,NA,NA,
3,800684,4,"25-07-10",NA,NA,NA,NA,NA,
3,800685,5,"25-07-10",NA,NA,NA,NA,NA), ncol=9,nrow = 30,byrow = TRUE))
colnames(dat)<-c("site","visit_id","section","visit_date","species1","species2","species3","species4","tot")
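One side effect of building the table through `matrix()` is that mixing numbers with date strings coerces every column to character. The sketch below shows this on a hypothetical two-row example and uses `type.convert()` from base R to recover sensible column types; whether the conversion is needed depends on what is done with the columns afterwards.

```r
# Hypothetical two-row example built the same way as `dat`
m <- matrix(c(1, 800681, "25-07-10", NA,
              2, 800682, "25-07-10", 1),
            ncol = 4, byrow = TRUE)
toy <- data.frame(m)
colnames(toy) <- c("site", "visit_id", "visit_date", "tot")

# Every column is character here (factor in R versions before 4.0)
print(sapply(toy, class))

# Convert each column to the most specific type that fits its values
toy_typed <- type.convert(toy, as.is = TRUE)
print(sapply(toy_typed, class))
```

The `is.na()` test on `tot` used further down works either way, because `NA` stays a missing value even in a character column.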
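I have played with filter(if) and slice, but I could not nest the different conditions. Any suggestion would be welcome.
The final result I would like to obtain is: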
dat<- data.frame(matrix(c(1,800681,1,"25-07-10",1,0,0,0,1,
1,800681,2,"25-07-10",NA,NA,NA,NA,NA,
1,800681,3,"25-07-10",NA,NA,NA,NA,NA,
2,800681,1,"25-07-10",NA,NA,NA,NA,NA,
2,800681,2,"25-07-10",NA,NA,NA,NA,NA,
3,800681,1,"25-07-10",1,0,0,0,1,
3,800682,2,"25-07-10",NA,NA,NA,NA,NA,
3,800683,3,"25-07-10",NA,NA,NA,NA,NA,
3,800684,4,"25-07-10",NA,NA,NA,NA,NA,
3,800685,5,"25-07-10",NA,NA,NA,NA,NA), ncol=9,nrow = 10,byrow = TRUE))
colnames(dat)<-c("site","visit_id","section","visit_date","species1","species2","species3","species4","tot")
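I cannot manage to combine these two operations. Here is what I did:
library(dplyr)
# visit_id values that have at least one recorded observation (non-NA tot)
list_visit_id = as.vector(unique(dat$visit_id[!is.na(dat$tot)]))
Here I select only the observations without errors:
dat1=dat %>% group_by(site,visit_date,section)%>%
filter(length(visit_id)<2)
Below, among the duplicated rows, I select only the one visit for which observation data are available:
dat3=dat %>% group_by(site,visit_date,section)%>%
filter(length(visit_id)>1,visit_id %in% list_visit_id)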
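One possible way to merge the two filters into a single pipeline is to combine both conditions with `|` inside one grouped `filter()`. This is only a sketch under assumptions: `toy` is a hypothetical, shortened stand-in for `dat` with the same column names, and `result` is an illustrative name.

```r
library(dplyr)

# Hypothetical stand-in for `dat` (same column names, fewer rows)
toy <- data.frame(
  site       = c(1, 1, 1, 1, 2, 2, 3),
  visit_id   = c(800681, 800682, 800681, 800682, 800681, 800682, 800681),
  section    = c(1, 1, 2, 2, 1, 1, 1),
  visit_date = "25-07-10",
  tot        = c(1, NA, NA, NA, NA, NA, 1)
)

# Visits that recorded at least one observation
list_visit_id <- unique(toy$visit_id[!is.na(toy$tot)])

result <- toy %>%
  group_by(site, visit_date, section) %>%
  # keep a row if its group has no duplicate,
  # or if it belongs to a visit that has observation data
  filter(n() == 1 | visit_id %in% list_visit_id) %>%
  ungroup()

print(result)
```

Alternatively, `bind_rows(dat1, dat3)` would stack the two intermediate results. In both cases `list_visit_id` is computed over the whole table rather than per site, so a duplicated group in which no visit_id appears in that list would be dropped entirely; that may or may not be the intended behaviour.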