R - For loop and if statement on a list of data frames gives error: subscript out of bounds

Asked: 2015-07-17 00:54:07

Tags: r

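I am using R to create encounter histories for an occupancy model. I need to take the bird counts for individual leks, split them by year, and then code each count date into one of two intervals: within 10 days of the first count (interval 1) or more than 10 days after the first count (interval 2). For any year in which only one count occurred, I need to add an entry coded "U" to indicate that no count took place during the second interval. Finally, I need to subset out the maximum count in each year and interval. Sample dataset: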

 ComplexId       Date Males Year category
        57 1941-04-15    97 1941        A
        57 1942-04-15    67 1942        A
        57 1943-04-15    44 1943        A
        57 1944-04-15    32 1944        A
        57 1946-04-15    21 1946        A
        57 1947-04-15    45 1947        A
        57 1948-04-15    67 1948        A
        57 1989-03-21    25 1989        A
        57 1989-03-30    41 1989        A
        57 1989-04-13     2 1989        A
        57 1991-03-06    35 1991        A
        57 1991-04-04    43 1991        A
        57 1991-04-11    37 1991        A
        57 1991-04-22    25 1991        A
        57 1993-03-23     6 1993        A
        57 1994-03-06    17 1994        A
        57 1994-03-11    10 1994        A
        57 1994-04-06    36 1994        A
        57 1994-04-15    29 1994        A
        57 1994-04-21    27 1994        A

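Here is the code I wrote to accomplish this, with the data frame above named "c1" (you will need to coerce the Date column to Date and the category column to character):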

c1_Year<-lapply(unique(c1$Year), function(x) c1[c1$Year == x,]) #splits complex counts into list by year

for(i in 1:length(c1_Year)){
  c1_Year[[i]]<-cbind(c1_Year[[i]], daydiff = as.numeric(c1_Year[[i]][,2]-c1_Year[[i]][1,2]))
} #adds column with difference between first survey and subsequent surveys

for(i in 1:length(c1_Year)){
  c1_Year[[i]]<-if(length(c1_Year[[i]][,1]) == 1)
    rbind(c1_Year[[i]], c(c1_Year[[i]][1,1], NA, 0, c1_Year[[i]][1,4], "U", 11)) 
} # adds U values to years with only 1 count,  while coercing the "u" into the appropriate interval

for(i in 1:length(c1_Year)){
  c1_Year[[i]]$Interval<- ifelse(c1_Year[[i]][,6] < 10, 1, 2)
} # adds interval code for each survey: 1 = fewer than 10 days after the first count, 2 = 10 or more days after the first count

for(i in 1:length(c1_Year)){
  c1_Year[[i]]<-ddply(.data=c1_Year[[i]], .(Interval), subset, Males==max(Males)) 
} # subsets out max count in each interval

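The problem occurs in the second for loop, which returns the following when options(error=recover) is enabled:

Error in c1_Year[[i]] : subscript out of bounds
No suitable frames for recover()

At this point the code has done what it was supposed to do: even though the error message is generated, the extra row with the "U" code is still appended to the data frame for each year with only one count. The problem is that I have 750 leks to do this for. So I tried building the code above into a function, but when I run the function on any data, the subscript out of bounds error stops it from running. I could brute-force it by running the code above manually for each lek, but I am hoping there is a more elegant solution. What I need to know is: why am I getting the subscript out of bounds error, and how can I fix it?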

Here is the function I wrote, so you can see that it does not work:

create.OEH<-function(dataset, final_dataframe){
  c1_Year<-lapply(unique(dataset$Year), function(x) dataset[dataset$Year == x,]) #splits complex counts into list by year

  for(i in 1:length(c1_Year)){
    c1_Year[[i]]<-cbind(c1_Year[[i]], daydiff = as.numeric(c1_Year[[i]][,2]-c1_Year[[i]][1,2]))
  } #adds column with difference between first survey and subsequent surveys

  for(i in 1:length(c1_Year)){
    c1_Year[[i]]<-if(length(c1_Year[[i]][,1]) == 1)
      rbind(c1_Year[[i]], c(c1_Year[[i]][1,1], NA, 0, c1_Year[[i]][1,4], "U", 11)) 
  } # adds U values to years with only 1 count,

  for(i in 1:length(c1_Year)){
    c1_Year[[i]]$Interval<- ifelse(c1_Year[[i]][,6] < 10, 1, 2)
  } # adds interval code for each survey: 1 = fewer than 10 days after the first count, 2 = 10 or more days after the first count

  for(i in 1:length(c1_Year)){
    c1_Year[[i]]<-ddply(.data=c1_Year[[i]], .(Interval), subset, Males==max(Males)) 
  } #subset out max count for each interval

  df<-rbind.fill(c1_Year) #collapse list into single dataframe

  final_dataframe<-df[!duplicated(df[,c("Year", "Interval")]),] #remove ties for max count

}

1 answer:

Answer 0 (score: 0)

In this code

for(i in 1:length(c1_Year)){
    c1_Year[[i]]<-if(length(c1_Year[[i]][,1]) == 1)
      rbind(c1_Year[[i]], c(c1_Year[[i]][1,1], NA, 0, c1_Year[[i]][1,4], "U", 11)) 
  } 

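if length(c1_Year[[i]][,1]) == 1 is not true, the if expression has no else branch and evaluates to NULL. Assigning NULL to c1_Year[[i]] removes that element from c1_Year entirely. The list therefore gets shorter while the loop still counts up to its original length, so a later c1_Year[[i]] points past the end of the list and raises the subscript out of bounds error.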
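The effect is easy to reproduce on a toy list. The following is a minimal sketch in base R; the list `years` is a hypothetical stand-in for c1_Year and is not part of the original code.

```r
# Hypothetical three-element list standing in for c1_Year
years <- list(a = 1, b = 2, c = 3)

# An `if` without `else` evaluates to NULL when the condition is FALSE
result <- if (FALSE) "never returned"
is.null(result)    # TRUE

# Assigning NULL with [[<- deletes the element rather than storing NULL
years[[2]] <- result
length(years)      # 2
names(years)       # "a" "c"

# Index 3 no longer exists, which is what the original loop runs into
outcome <- tryCatch(years[[3]], error = function(e) conditionMessage(e))
outcome
```

Because the list shrinks inside the loop, 1:length(...) computed before the loop started is no longer a valid set of indices.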

You probably want

for(i in 1:length(c1_Year)){
    if (length(c1_Year[[i]][,1]) == 1) {
        c1_Year[[i]] <- rbind(c1_Year[[i]], c(c1_Year[[i]][1,1], NA, 0, c1_Year[[i]][1,4], "U", 11)) 
    }
  } 

However, I see that you are already using ddply, so you can avoid a lot of the copying. ddply(c1, .(Year), ...) splits c1 by unique year.

library(plyr)  # provides ddply

c2 <- ddply(c1,
            .(Year),
            function (x) {
                # create 'Interval'
                x$Interval <- ifelse(x$Date - x$Date[1] < 10, 1, 2)
                # extract max males per interval
                o <- ddply(x, .(Interval), subset, Males==max(Males))
                # add the 'U' row if there is no '2' interval
                if (all(o$Interval != 2)) {
                    # [1] keeps this to a single new row even if ties left several rows in o
                    o <- rbind(o,
                               list(o$ComplexId[1], NA, 0, o$Year[1], 'U', 2))
                }
                # return the resulting dataframe
                o
            })

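I changed your rbind(.., c(...)) to rbind(.., list(...)) to avoid converting everything to character strings (which is what c does, because an atomic vector cannot hold several different types).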
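The difference between c and list can be checked directly. This is a small base R sketch; the data frame `df` and the values in it are made up for illustration and use only three of the columns.

```r
# c() builds an atomic vector, so mixed inputs are coerced to a single type
as_vector <- c(57, NA, 0, 1993, "U", 11)
class(as_vector)              # "character"

# list() keeps the type of each element
as_list <- list(57, NA, 0, 1993, "U", 11)
sapply(as_list, class)

# Hypothetical one-row data frame with numeric and character columns
df <- data.frame(ComplexId = 57, Males = 6, category = "A",
                 stringsAsFactors = FALSE)

from_list   <- rbind(df, list(57, 0, "U"))  # numeric columns stay numeric
from_vector <- rbind(df, c(57, 0, "U"))     # numeric columns turn into character

sapply(from_list, class)
sapply(from_vector, class)
```

Once Males is a character column, max(Males) compares strings rather than numbers, so the later max-count step can silently return the wrong row.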

Otherwise the code is almost the same as yours.
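For the 750 leks mentioned in the question, one possible way to run the same steps over every lek at once is sketched below in base R. It is an assumption-laden illustration, not the code from the question or the answer: the names `summarise_year`, `build_history` and `counts` are hypothetical, the input is assumed to have the columns shown in the sample data, and ties for the maximum are resolved by keeping the first row (which.max) instead of removing duplicates afterwards.

```r
# Hypothetical helper: summarise the rows of one lek in one year
summarise_year <- function(x) {
  x <- x[order(x$Date), ]
  # interval 1 = fewer than 10 days after the first count, otherwise 2
  x$Interval <- ifelse(as.numeric(x$Date - x$Date[1]) < 10, 1, 2)
  # keep the first row holding the maximum count in each interval
  best <- do.call(rbind, lapply(split(x, x$Interval),
                                function(g) g[which.max(g$Males), ]))
  # add a "U" row when no count fell in interval 2
  if (!any(best$Interval == 2)) {
    filler <- best[1, ]
    filler$Date <- as.Date(NA)
    filler$Males <- 0
    filler$category <- "U"
    filler$Interval <- 2
    best <- rbind(best, filler)
  }
  best
}

# Hypothetical wrapper: apply the helper to every lek-year combination
build_history <- function(counts) {
  groups <- split(counts, list(counts$ComplexId, counts$Year), drop = TRUE)
  out <- do.call(rbind, lapply(groups, summarise_year))
  rownames(out) <- NULL
  out
}

# A few rows taken from the sample data in the question
counts <- data.frame(
  ComplexId = 57,
  Date = as.Date(c("1989-03-21", "1989-03-30", "1989-04-13", "1993-03-23")),
  Males = c(25, 41, 2, 6),
  Year = c(1989, 1989, 1989, 1993),
  category = "A",
  stringsAsFactors = FALSE
)
history <- build_history(counts)
history
```

Because the function returns its result instead of assigning to an argument, the caller stores it with history <- build_history(counts), and a data frame holding all leks can be passed in one call.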