I have a data frame containing tagged individuals at multiple sites across multiple sampling intervals. See the example below:
> df
Tag Site Interval Ind_ID
1 507 Golden 7 1
2 507 Golden 8 1
3 552 Golden 2 1
4 552 Golden 1 1
5 847 Golden 4 1
6 847 Golden 6 1
8 847 Golden 5 1
9 847 Golden 3 1
31 541 Golden 1 1
33 541 Golden 3 1
34 541 Golden 4 1
35 541 Golden 7 1
36 541 Golden 6 1
37 541 Golden 5 1
39 810 Golden 7 1
40 810 Golden 8 1
41 840 Golden 7 1
42 840 Golden 8 1
43 840 Golden 3 1
44 840 Golden 2 1
What I'm trying to do is subset the tagged individuals by interval, which I have done with this for loop:
for (i in 1:nlevels(factor(df$Interval))) {
  I <- subset(df, Interval == levels(factor(df$Interval))[i])
  assign(paste("Interval_", i, sep = ""), I)
}
I then merge the data frames for consecutive intervals. This is the code I'm currently using for the merge:
IPl2<-merge(Interval_1, Interval_2, by=c("Tag", "Site", "Ind_ID"))
IPl3<-merge(Interval_2, Interval_3, by=c("Tag", "Site", "Ind_ID"))
IPl4<-merge(Interval_3, Interval_4, by=c("Tag", "Site", "Ind_ID"))
IPl5<-merge(Interval_4, Interval_5, by=c("Tag", "Site", "Ind_ID"))
IPl6<-merge(Interval_5, Interval_6, by=c("Tag", "Site", "Ind_ID"))
IPl7<-merge(Interval_6, Interval_7, by=c("Tag", "Site", "Ind_ID"))
IPl8<-merge(Interval_7, Interval_8, by=c("Tag", "Site", "Ind_ID"))
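I'm sure there is a more efficient way to do this. Also, I keep adding data to the dataset (i.e. more intervals), and I'd like to avoid editing the code every time new data are added. Any ideas?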
Answer 0 (score: 0)
Perhaps something like this:
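## One data frame per interval, ordered by interval
dfs <- split(df, df$Interval)
## Number of adjacent interval pairs
n <- nlevels(factor(df$Interval)) - 1
## Pre-allocate a named list: IPl2, IPl3, ...
results <- setNames(vector("list", length = n), paste0("IPl", 2:(n + 1)))
## Merge each interval with the one that follows it
for (i in seq_len(n)) {
  results[[i]] <- merge(dfs[[i]], dfs[[i + 1]], by = c('Tag', 'Site', 'Ind_ID'))
}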
> head(results)
$IPl2
Tag Site Ind_ID Interval.x Interval.y
1 552 Golden 1 1 2
$IPl3
Tag Site Ind_ID Interval.x Interval.y
1 840 Golden 1 2 3
$IPl4
Tag Site Ind_ID Interval.x Interval.y
1 541 Golden 1 3 4
2 847 Golden 1 3 4
$IPl5
Tag Site Ind_ID Interval.x Interval.y
1 541 Golden 1 4 5
2 847 Golden 1 4 5
$IPl6
Tag Site Ind_ID Interval.x Interval.y
1 541 Golden 1 5 6
2 847 Golden 1 5 6
$IPl7
Tag Site Ind_ID Interval.x Interval.y
1 541 Golden 1 6 7
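The same pairing can also be written without an explicit loop. What follows is only a sketch, assuming `Interval` is stored as a number (if it were character, `split()` would sort "10" before "2") and that the intervals have no gaps. The `data.frame()` call just rebuilds the example from the question so the snippet runs on its own; `keys` is a name introduced here for illustration.

```r
# Rebuild the example data from the question (values copied from the post).
df <- data.frame(
  Tag      = c(507, 507, 552, 552, 847, 847, 847, 847,
               541, 541, 541, 541, 541, 541,
               810, 810, 840, 840, 840, 840),
  Site     = "Golden",
  Interval = c(7, 8, 2, 1, 4, 6, 5, 3,
               1, 3, 4, 7, 6, 5,
               7, 8, 7, 8, 3, 2),
  Ind_ID   = 1
)

# One data frame per interval; split() orders the pieces by sorted level.
dfs <- split(df, df$Interval)

# Columns that identify an individual (illustrative helper name).
keys <- c("Tag", "Site", "Ind_ID")

# Walk the list in adjacent pairs: (1, 2), (2, 3), ... and merge each pair.
results <- Map(function(a, b) merge(a, b, by = keys),
               dfs[-length(dfs)], dfs[-1])

# Name each element after the later interval of its pair.
names(results) <- paste0("IPl", names(dfs)[-1])
```

Because the names are taken from `names(dfs)` rather than typed in, new intervals in `df` are picked up the next time the code runs.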
Answer 1 (score: 0)
Below is a dplyr solution that joins the data frame to itself and puts the results in a single data frame.
library(dplyr)
## Join the 'df' to itself based on the intervals to compare; this is done by
## creating a key to indicate which intervals to join on.
resultdf <-
## Create match_interval to next sequential value
df %>% mutate(match_interval = paste0('IPl', as.numeric(Interval)+1)) %>% arrange(Interval, Site) %>%
## Join to self by match_interval and other columns.
inner_join(df %>% mutate(match_interval = paste0('IPl', as.numeric(Interval))),
by = c('Tag', 'Site', 'Ind_ID', 'match_interval')) %>%
## Order columns
select(match_interval, Tag, Site, Ind_ID, Interval.x, Interval.y)
resultdf
## match_interval Tag Site Ind_ID Interval.x Interval.y
## 1 IPl2 552 Golden 1 1 2
## 2 IPl3 840 Golden 1 2 3
## 3 IPl4 847 Golden 1 3 4
## 4 IPl4 541 Golden 1 3 4
## 5 IPl5 847 Golden 1 4 5
## 6 IPl5 541 Golden 1 4 5
## 7 IPl6 847 Golden 1 5 6
## 8 IPl6 541 Golden 1 5 6
## 9 IPl7 541 Golden 1 6 7
## 10 IPl8 507 Golden 1 7 8
## 11 IPl8 810 Golden 1 7 8
## 12 IPl8 840 Golden 1 7 8
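The self-join idea does not depend on dplyr. Here is a rough base R sketch of the same technique, assuming `Interval` is numeric. The `Next` column and the names `earlier`, `later` and `pairs` are made up for this illustration, and the data frame is rebuilt from the question so the snippet is self-contained.

```r
# Rebuild the example data from the question (values copied from the post).
df <- data.frame(
  Tag      = c(507, 507, 552, 552, 847, 847, 847, 847,
               541, 541, 541, 541, 541, 541,
               810, 810, 840, 840, 840, 840),
  Site     = "Golden",
  Interval = c(7, 8, 2, 1, 4, 6, 5, 3,
               1, 3, 4, 7, 6, 5,
               7, 8, 7, 8, 3, 2),
  Ind_ID   = 1
)

keys <- c("Tag", "Site", "Ind_ID")

# Left side: each row is tagged with the interval it should pair with.
earlier <- df
earlier$Next <- earlier$Interval + 1

# Right side: each row is tagged with its own interval.
later <- df
later$Next <- later$Interval

# Self-join: a row survives only if the same individual was also
# recorded in the following interval.
pairs <- merge(earlier, later, by = c(keys, "Next"))

# Label the pairs the same way as the list-based answer (IPl2, IPl3, ...).
pairs$match_interval <- paste0("IPl", pairs$Next)
pairs <- pairs[order(pairs$Next, pairs$Tag), ]
```

`merge()` adds the `.x` and `.y` suffixes to the two `Interval` columns, so `Interval.x` is the earlier interval and `Interval.y` the later one, as in the dplyr output.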