Question

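I am trying to extract information from a multi-year dataset of climate variables (daily values). The presence and absence of snow are coded in the dataset as 1 and 0. Each row corresponds to one day, and I have a date column and a column coding the half-year. Within each half-year, I am trying to find the date of the day after the last occurrence of snow (1). I tried to do this by grouping the variables with aggregate, but my custom function did not work, because "aggregate" seems to pass the columns of the data frame without their names.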

dates<-c("1993-01-01","1993-01-02","1993-01-03","1993-01-04","1993-01-05","1994-02-20","1994-02-21","1994-02-22","1994-02-23","1994-02-24")
df<-data.frame(Date=as.Date(dates,format = "%Y-%m-%d"),
               halves=as.factor(c(1993-01-01,1993-01-01,1993-01-01,1993-01-01,1993-01-01,1994-01-01,1994-01-01,1994-01-01,1994-01-01,1994-01-01)),
               plot1=c(1,1,1,0,0,1,1,0,0,0),
               plot2=c(1,1,0,0,0,1,1,0,1,0),
               plot3=c(0,1,1,1,0,1,1,1,0,0))

I know that using a loop is not very efficient, but I used one because I want to avoid applying the function to the "not-plot-columns":

for(plots in names(df)[- which(names(df) %in% c("Date","halves"))]){        
     meltday[[plots]] <- aggregate(df[[plots]] ~ halves, df, df$Date[last(which(snow.days.half$Date == 0)) + 1])
}

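This produces an error, because the last argument is not evaluated as a function. So I tried writing my own function to find the date of the last occurrence + 1 in all the plot columns.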

snowmelt<-function(x)
{snowmelt<-max(x[[Date]][x[[plots]]==1])
 snowmelt}

Then I tried:

for(plots in names(df)[- which(names(df) %in% c("Date","halves"))]){        
  meltday[[plots]]<-aggregate(df[[plots]] ~ halves,df,snowmelt)  
}

This leaves me with an error message, because the function coerces its input to a list.

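So I am quite confused, and I would be very grateful for any comment or answer that points me in the right direction. The output I want is a data frame of dates, along the lines of: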

   >meltday
   halves         plot1       plot2         plot3
   1993-01-01     1993-01-04  1993-01-03    1993-01-05
   1994-01-01     1994-02-22  1994-02-24    1994-02-23

Edit: added the desired output for clarification.

Thanks!

Answer 1

Using tidyr and dplyr

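I changed the way the halves variable is entered in the OP so that it uses a character vector (I think the unquoted version may have been a mistake, since R evaluates 1993-01-01 as a subtraction).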

dates<-c("1993-01-01","1993-01-02","1993-01-03","1993-01-04","1993-01-05","1994-02-20","1994-02-21","1994-02-22","1994-02-23","1994-02-24")
df<-data.frame(Date=as.Date(dates,format = "%Y-%m-%d"),
               halves=as.factor(c('1993-01-01','1993-01-01','1993-01-01','1993-01-01','1993-01-01','1994-01-01','1994-01-01','1994-01-01','1994-01-01','1994-01-01')),
               plot1=c(1,1,1,0,0,1,1,0,0,0),
               plot2=c(1,1,0,0,0,1,1,0,1,0),
               plot3=c(0,1,1,1,0,1,1,1,0,0))

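I then use the gather command to put the data into long format and convert it into a grouped data frame, so that summarise can aggregate for each plot. Finally, I use spread to return the data to wide format.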

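library(dplyr)
library(tidyr)

df %>%
  gather(plot, snow, plot1:plot3) %>%   # wide -> long: one row per Date and plot
  group_by(plot, halves) %>%
  arrange(Date) %>%
  summarise(meltday = Date[max(which(snow == 1)) + 1]) %>%   # date after the last snow day
  spread(plot, meltday)                 # long -> wide: one column per plot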

Output

# A tibble: 2 × 4
      halves      plot1      plot2      plot3
*     <fctr>     <date>     <date>     <date>
1 1993-01-01 1993-01-04 1993-01-03 1993-01-05
2 1994-01-01 1994-02-22 1994-02-24 1994-02-23

Note: this will go wrong if no melt occurred (i.e. there is snow in the final entry). I have not yet thought of a neater way to detect the "last" snow day.
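One possible way to handle the no-melt case is to check for it explicitly before indexing. The following is only a sketch in base R: `melt_date` is a hypothetical helper name that does not appear in the original answer, and returning `NA` when there is no snow at all, or when snow persists to the final row, is an assumed convention.

```r
# Hypothetical helper (not from the original answer): returns the date after
# the last snow day, or NA when there is no snow or no melt within the period.
melt_date <- function(snow, date) {
  idx <- which(snow == 1)
  if (length(idx) == 0 || max(idx) == length(snow)) {
    return(as.Date(NA))  # assumed convention: NA means "no melt date found"
  }
  date[max(idx) + 1]
}

d <- as.Date("1993-01-01") + 0:4
melt_date(c(1, 1, 1, 0, 0), d)  # snow ends on the third day
melt_date(c(0, 1, 1, 1, 1), d)  # snow on the final day
melt_date(c(0, 0, 0, 0, 0), d)  # no snow at all
```

Under these assumptions, the helper could replace the indexing expression in the pipeline, for example `summarise(meltday = melt_date(snow, Date))`.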

Answer 2

Another approach is to define the following function, which computes the Date after the last snow day:

date.after.last.snow <- function(x, Date) {
  Date[tail(which(x==1),1)+1]
}

Then use this function to summarise each plot column, grouped by halves:

library(dplyr)
res <- df %>% group_by(halves) %>% 
              summarize_each(funs(date=date.after.last.snow(.,Date)),-Date)

The result with your data is as expected:

print(res)
### A tibble: 2 x 4
##      halves plot1_date plot2_date plot3_date
##      <fctr>     <date>     <date>     <date>
##1 1993-01-01 1993-01-04 1993-01-03 1993-01-05
##2 1994-01-01 1994-02-22 1994-02-24 1994-02-23
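`summarize_each()` and `funs()` are deprecated in later dplyr releases. A minimal sketch of the same summary using `across()` follows, assuming dplyr 1.0 or newer. The `.names` pattern is chosen here only to reproduce the `plotN_date` column names shown above; it is not part of the original answer.

```r
library(dplyr)

df <- data.frame(
  Date   = as.Date(c("1993-01-01", "1993-01-02", "1993-01-03", "1993-01-04", "1993-01-05",
                     "1994-02-20", "1994-02-21", "1994-02-22", "1994-02-23", "1994-02-24")),
  halves = factor(rep(c("1993-01-01", "1994-01-01"), each = 5)),
  plot1  = c(1, 1, 1, 0, 0, 1, 1, 0, 0, 0),
  plot2  = c(1, 1, 0, 0, 0, 1, 1, 0, 1, 0),
  plot3  = c(0, 1, 1, 1, 0, 1, 1, 1, 0, 0)
)

# Same function as in the answer: date on the row after the last snow day
date.after.last.snow <- function(x, Date) {
  Date[tail(which(x == 1), 1) + 1]
}

res <- df %>%
  group_by(halves) %>%
  summarise(across(starts_with("plot"),
                   ~ date.after.last.snow(.x, Date),  # Date is the per-group Date column
                   .names = "{.col}_date"))
print(res)
```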

Data:

df <- structure(list(
  Date = structure(c(8401, 8402, 8403, 8404, 8405, 8816, 8817, 8818, 8819, 8820), class = "Date"),
  halves = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
                     .Label = c("1993-01-01", "1994-01-01"), class = "factor"),
  plot1 = c(1, 1, 1, 0, 0, 1, 1, 0, 0, 0),
  plot2 = c(1, 1, 0, 0, 0, 1, 1, 0, 1, 0),
  plot3 = c(0, 1, 1, 1, 0, 1, 1, 1, 0, 0)),
  .Names = c("Date", "halves", "plot1", "plot2", "plot3"),
  row.names = c(NA, -10L), class = "data.frame")
##         Date     halves plot1 plot2 plot3
##1  1993-01-01 1993-01-01     1     1     0
##2  1993-01-02 1993-01-01     1     1     1
##3  1993-01-03 1993-01-01     1     0     1
##4  1993-01-04 1993-01-01     0     0     1
##5  1993-01-05 1993-01-01     0     0     0
##6  1994-02-20 1994-01-01     1     1     1
##7  1994-02-21 1994-01-01     1     1     1
##8  1994-02-22 1994-01-01     0     0     1
##9  1994-02-23 1994-01-01     0     1     0
##10 1994-02-24 1994-01-01     0     0     0
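For readers who would rather stay in base R: the function given to the formula interface of `aggregate` receives only the values of the response column for each group, so it cannot look up the matching dates. A sketch of one workaround is to split the whole data frame by `halves`, so that each piece still carries its `Date` column. The use of `split`, `lapply` and `rbind` here is an illustrative choice, not the method of either answer, and `meltday` is simply the name used in the question's desired output.

```r
df <- data.frame(
  Date   = as.Date(c("1993-01-01", "1993-01-02", "1993-01-03", "1993-01-04", "1993-01-05",
                     "1994-02-20", "1994-02-21", "1994-02-22", "1994-02-23", "1994-02-24")),
  halves = factor(rep(c("1993-01-01", "1994-01-01"), each = 5)),
  plot1  = c(1, 1, 1, 0, 0, 1, 1, 0, 0, 0),
  plot2  = c(1, 1, 0, 0, 0, 1, 1, 0, 1, 0),
  plot3  = c(0, 1, 1, 1, 0, 1, 1, 1, 0, 0)
)

# Every column except the two key columns is treated as a plot column
plot_cols <- setdiff(names(df), c("Date", "halves"))

# Split into one data frame per half-year; each piece keeps its Date column
pieces <- lapply(split(df, df$halves), function(g) {
  # For each plot column, take the date on the row after the last snow day
  melt <- lapply(g[plot_cols], function(x) g$Date[tail(which(x == 1), 1) + 1])
  data.frame(halves = g$halves[1], melt)
})

meltday <- do.call(rbind, pieces)
print(meltday)
```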

Combining "aggregate" with a user-defined function
