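I'm using R to build encounter histories for an occupancy model. I need to split the bird counts for individual leks into a list by year, then code the count dates into two intervals: within 10 days of the first count (interval 1) or more than 10 days after the first count (interval 2). For any year with only one count, I need to add an entry coded "U" to show that no count took place during the second interval. Next, I need to subset the maximum count for each year and interval. Sample dataset: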
ComplexId Date Males Year category
57 1941-04-15 97 1941 A
57 1942-04-15 67 1942 A
57 1943-04-15 44 1943 A
57 1944-04-15 32 1944 A
57 1946-04-15 21 1946 A
57 1947-04-15 45 1947 A
57 1948-04-15 67 1948 A
57 1989-03-21 25 1989 A
57 1989-03-30 41 1989 A
57 1989-04-13 2 1989 A
57 1991-03-06 35 1991 A
57 1991-04-04 43 1991 A
57 1991-04-11 37 1991 A
57 1991-04-22 25 1991 A
57 1993-03-23 6 1993 A
57 1994-03-06 17 1994 A
57 1994-03-11 10 1994 A
57 1994-04-06 36 1994 A
57 1994-04-15 29 1994 A
57 1994-04-21 27 1994 A
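To make the example reproducible, the sample table can be read straight into a data frame named c1. This is a minimal sketch using base R's read.table(). It assumes the whitespace-separated layout shown above and applies the Date and character coercions that the question mentions.

```r
# Read the sample counts from the question into a data frame called c1
c1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ComplexId Date Males Year category
57 1941-04-15 97 1941 A
57 1942-04-15 67 1942 A
57 1943-04-15 44 1943 A
57 1944-04-15 32 1944 A
57 1946-04-15 21 1946 A
57 1947-04-15 45 1947 A
57 1948-04-15 67 1948 A
57 1989-03-21 25 1989 A
57 1989-03-30 41 1989 A
57 1989-04-13 2 1989 A
57 1991-03-06 35 1991 A
57 1991-04-04 43 1991 A
57 1991-04-11 37 1991 A
57 1991-04-22 25 1991 A
57 1993-03-23 6 1993 A
57 1994-03-06 17 1994 A
57 1994-03-11 10 1994 A
57 1994-04-06 36 1994 A
57 1994-04-15 29 1994 A
57 1994-04-21 27 1994 A
")
# Coerce Date to class "Date" and make sure category is character
c1$Date <- as.Date(c1$Date)
c1$category <- as.character(c1$category)
str(c1)
```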
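Here is the code I wrote to do this, with the data frame above named "c1" (you will need to coerce the Date column to Date and the category column to character):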
library(plyr) # provides ddply() and rbind.fill()
c1_Year<-lapply(unique(c1$Year), function(x) c1[c1$Year == x,]) #splits complex counts into list by year
for(i in 1:length(c1_Year)){
c1_Year[[i]]<-cbind(c1_Year[[i]], daydiff = as.numeric(c1_Year[[i]][,2]-c1_Year[[i]][1,2]))
} #adds column with difference between first survey and subsequent surveys
for(i in 1:length(c1_Year)){
c1_Year[[i]]<-if(length(c1_Year[[i]][,1]) == 1)
rbind(c1_Year[[i]], c(c1_Year[[i]][1,1], NA, 0, c1_Year[[i]][1,4], "U", 11))
} # adds U values to years with only 1 count; daydiff = 11 places the "U" row in interval 2
for(i in 1:length(c1_Year)){
c1_Year[[i]]$Interval<- ifelse(c1_Year[[i]][,6] < 10, 1, 2)
} # adds interval code for each survey: 1 = fewer than 10 days after the first count, 2 = 10 or more days after the first count
for(i in 1:length(c1_Year)){
c1_Year[[i]]<-ddply(.data=c1_Year[[i]], .(Interval), subset, Males==max(Males))
} # subsets out max count in each interval
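The daydiff and Interval steps can be checked on a few rows without the list-and-loop structure. The sketch below is an alternative, not the question's code: it uses base R's ave() to compute the number of days since the first count within each year, and it assumes rows are already sorted by Date within each year.

```r
# A few rows from the sample data (two years)
c1 <- data.frame(
  ComplexId = 57,
  Date = as.Date(c("1989-03-21", "1989-03-30", "1989-04-13", "1993-03-23")),
  Males = c(25, 41, 2, 6),
  Year = c(1989, 1989, 1989, 1993)
)
# Days since the first count of the same year (assumes rows are sorted by Date)
c1$daydiff <- ave(as.numeric(c1$Date), c1$Year, FUN = function(d) d - d[1])
# Interval 1 = fewer than 10 days after the first count, 2 = 10 or more days after
c1$Interval <- ifelse(c1$daydiff < 10, 1, 2)
c1
```

For these rows, daydiff comes out as 0, 9, 23, 0, so only the 1989-04-13 count lands in interval 2.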
The problem arises in the second for loop. With options(error=recover) enabled, it returns:
Error in c1_Year[[i]] : subscript out of bounds
No suitable frames for recover()
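At this point the code still does what it is supposed to do: it adds the extra row for each year with only one count, and even though the error message is generated, the extra rows with the "U" code are still appended to the data frames. The problem is that I have 750 leks to do this for. So I tried to wrap the code above in a function, but when I run the function on any data, the subscript out of bounds error stops it from running. I could brute-force it and run the code above manually for each lek, but I'm hoping there is a more elegant solution. What I need to know is: why am I getting the subscript out of bounds error, and how do I fix it?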
Here is the function I wrote, so you can see that it doesn't work:
create.OEH<-function(dataset, final_dataframe){
c1_Year<-lapply(unique(dataset$Year), function(x) dataset[dataset$Year == x,]) #splits complex counts into list by year
for(i in 1:length(c1_Year)){
c1_Year[[i]]<-cbind(c1_Year[[i]], daydiff = as.numeric(c1_Year[[i]][,2]-c1_Year[[i]][1,2]))
} #adds column with difference between first survey and subsequent surveys
for(i in 1:length(c1_Year)){
c1_Year[[i]]<-if(length(c1_Year[[i]][,1]) == 1)
rbind(c1_Year[[i]], c(c1_Year[[i]][1,1], NA, 0, c1_Year[[i]][1,4], "U", 11))
} # adds U values to years with only 1 count
for(i in 1:length(c1_Year)){
c1_Year[[i]]$Interval<- ifelse(c1_Year[[i]][,6] < 10, 1, 2)
} # adds interval code for each survey: 1 = fewer than 10 days after the first count, 2 = 10 or more days after the first count
for(i in 1:length(c1_Year)){
c1_Year[[i]]<-ddply(.data=c1_Year[[i]], .(Interval), subset, Males==max(Males))
} #subset out max count for each interval
df<-rbind.fill(c1_Year) #collapse list into single dataframe
final_dataframe<-df[!duplicated(df[,c("Year", "Interval")]),] #remove ties for max count
}
Answer (score: 0)
In this code:
for(i in 1:length(c1_Year)){
c1_Year[[i]]<-if(length(c1_Year[[i]][,1]) == 1)
rbind(c1_Year[[i]], c(c1_Year[[i]][1,1], NA, 0, c1_Year[[i]][1,4], "U", 11))
}
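if length(c1_Year[[i]][,1]) == 1 is not true, the if() has no else branch and evaluates to NULL, so you assign NULL, which removes that element from c1_Year entirely. The list then gets shorter while the loop still runs over the original 1:length(c1_Year), so a later c1_Year[[i]] indexes past the end of the list. That is where the subscript out of bounds error comes from.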
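A small toy list reproduces the failure. The list below is hypothetical, not the lek data: elements 1 and 3 have length 2, so the if() without else yields NULL for them, the assignment deletes them, and the loop, whose range was fixed when it started, eventually indexes past the end of the shortened list.

```r
# Toy list: only the second element has length 1 and should be padded
lst <- list(1:2, 5, 3:4)
res <- tryCatch({
  for (i in 1:length(lst)) {
    # Same pattern as the question: if() without else returns NULL when FALSE,
    # and assigning NULL to lst[[i]] deletes that element
    lst[[i]] <- if (length(lst[[i]]) == 1) c(lst[[i]], NA)
  }
  "finished"
}, error = function(e) conditionMessage(e))
res   # "subscript out of bounds"
lst   # list(5): two elements deleted, and the 5 shifted past i, so it was never padded
```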
You probably want:
for(i in 1:length(c1_Year)){
if (length(c1_Year[[i]][,1]) == 1) {
c1_Year[[i]] <- rbind(c1_Year[[i]], c(c1_Year[[i]][1,1], NA, 0, c1_Year[[i]][1,4], "U", 11))
}
}
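Here is the same toy list with that fix applied. The assignment only happens inside the if block, so no element is ever replaced by NULL and the list keeps its length. seq_along() is used instead of 1:length() as a small optional safeguard: it also behaves correctly on an empty list.

```r
lst <- list(1:2, 5, 3:4)
for (i in seq_along(lst)) {
  # Only touch elements that need padding; everything else is left as-is
  if (length(lst[[i]]) == 1) {
    lst[[i]] <- c(lst[[i]], NA)
  }
}
str(lst)  # still 3 elements; the second is now c(5, NA)
```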
However, I see that you are already using ddply, so you can avoid a lot of the copying and looping.
ddply(c1, .(Year), ...) splits c1 by its unique years and applies the function to each piece:
c2 <- ddply(c1,
.(Year),
function (x) {
# create 'Interval'
x$Interval <- ifelse(x$Date - x$Date[1] < 10, 1, 2)
# extract max males per interval
o <- ddply(x, .(Interval), subset, Males==max(Males))
# add a 'U' row if there is no interval-2 count
if (all(o$Interval != 2)) {
o <- rbind(o,
list(o$ComplexId[1], NA, 0, o$Year[1], 'U', 2))
}
# return the resulting dataframe
o
})
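I changed your rbind(.., c(...)) to rbind(.., list(...)) to avoid converting everything back to strings. That is what c() does: it returns an atomic vector, which can only hold one type, so mixing numbers with "U" turns every value into character.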
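The difference is easy to see on a one-row data frame. In this toy example (the column names are illustrative only), appending a c() vector turns the numeric columns into character, whereas appending a list keeps each column's type.

```r
df <- data.frame(id = 57, Males = 97, category = "A", stringsAsFactors = FALSE)

# c(57, 0, "U") is the character vector c("57", "0", "U")
with_c <- rbind(df, c(57, 0, "U"))
# A list keeps 57 and 0 numeric and "U" character
with_list <- rbind(df, list(57, 0, "U"))

sapply(with_c, class)     # every column is now "character"
sapply(with_list, class)  # id and Males stay "numeric", category is "character"
```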
Otherwise the code is nearly identical to yours.
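Since the goal is to run this over roughly 750 leks, one option is to split on ComplexId and Year together instead of calling a function once per lek. The sketch below is a base R variant of the answer's approach, not the answer's own code. The helper names max_by_interval() and encounter_history() are hypothetical, the column names are assumed to match the sample data, and ties for the maximum count are broken by keeping the earliest count (which.max()), which mirrors the question's duplicated() step.

```r
# Hypothetical helper: max count per interval for ONE lek-year,
# plus a "U" row in interval 2 when there was no second-interval count
max_by_interval <- function(x) {
  x <- x[order(x$Date), ]
  x$Interval <- ifelse(as.numeric(x$Date - x$Date[1]) < 10, 1, 2)
  # which.max() keeps the earliest row when several counts tie for the maximum
  o <- do.call(rbind, lapply(split(x, x$Interval),
                             function(g) g[which.max(g$Males), ]))
  if (!any(o$Interval == 2)) {
    o <- rbind(o, data.frame(ComplexId = o$ComplexId[1], Date = as.Date(NA),
                             Males = 0, Year = o$Year[1], category = "U",
                             Interval = 2, stringsAsFactors = FALSE))
  }
  o
}

# Hypothetical wrapper: apply the helper to every lek-year combination at once
encounter_history <- function(counts) {
  pieces <- split(counts, list(counts$ComplexId, counts$Year), drop = TRUE)
  out <- do.call(rbind, lapply(pieces, max_by_interval))
  out <- out[order(out$ComplexId, out$Year, out$Interval), ]
  rownames(out) <- NULL
  out
}

# Made-up example with two leks
counts <- data.frame(
  ComplexId = c(57, 57, 57, 57, 58),
  Date = as.Date(c("1989-03-21", "1989-03-30", "1989-04-13",
                   "1993-03-23", "1993-04-01")),
  Males = c(25, 41, 2, 6, 12),
  Year = c(1989, 1989, 1989, 1993, 1993),
  category = "A",
  stringsAsFactors = FALSE
)
res <- encounter_history(counts)
res
```

Splitting on both columns means a single call handles every lek, and drop = TRUE skips lek-year combinations that have no counts.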