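Group data by year and filter by month in R

Time: 2020-10-08 17:47:37

Tags: r database dataframe time

I have a list of data frames containing daily streamflow data.

For each data frame in the list (each one holds the data from a single station), I want to estimate the maximum daily flow between June and November of every year.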

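The list of data frames was posted as an image, which is not reproduced here. Judging from the code and the error message below, each data frame has a Date column and a DailyStreamflow column.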

This is the code I am using:

#Peak mean daily flow summer and fall (June to November)
PeakflowSummerFall <- lapply(listDF,function(x){x %>% group_by(x %>% mutate(year = year(Date))) 
                                                  %>% filter((x %>% mutate(month = month(Date)) >= 6) & (x %>% mutate(month = month(Date)) <= 11)) 
                                                  %>% summarise(max=max(DailyStreamflow, na.rm =TRUE))})

But I get this error:

<error/dplyr_error>
Problem with `filter()` input `..1`.
x Input `..1` must be of size 1, not size 24601.
i Input `..1` is `&...`.
i The error occurred in group 1: Date = 1953-06-01, DailyStreamflow = 32, year = 1953.
Backtrace:
Run `rlang::last_trace()` to see the full context

Is there a way to solve this problem?

2 answers:

Answer 0 (score: 0)

#### This should provide you with enough
#### sample data for answerers to work with

install.packages('purrr')
library(purrr)

sample_dat <- listDF %>%
  head %>%
  map( ~ head(.x))

dput(sample_dat)

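#### With that being said...
#### You should flatten the list into one data frame...
#### It's easier to work with...
install.packages(c('dplyr', 'lubridate'))
library(dplyr)
library(lubridate)

listDF %>%
  # bind_rows() adds a 'station' id column even when the list is unnamed
  bind_rows(.id = 'station') %>%
  mutate(year = year(Date)) %>%
  filter(month(Date) > 5, month(Date) < 12) %>%
  # group by station and year, since the goal is one maximum per year
  group_by(station, year) %>%
  summarise(max_flow = max(DailyStreamflow, na.rm = TRUE), .groups = 'drop') %>%
  split(.$station)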
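
As a small self-contained illustration of the flatten-then-group idea, the sketch below builds a hypothetical two-station list (the names `toy`, `stationA`, `stationB` and all values are made up for illustration) and uses `dplyr::bind_rows()` for the flattening step. It is only one possible way to do this, assuming every data frame has `Date` and `DailyStreamflow` columns.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: two stations with a handful of daily values each
toy <- list(
  stationA = data.frame(
    Date = as.Date(c("2000-05-31", "2000-06-01", "2000-11-30",
                     "2000-12-01", "2001-07-15")),
    DailyStreamflow = c(100, 10, 20, 200, 30)
  ),
  stationB = data.frame(
    Date = as.Date(c("2000-06-15", "2000-08-01", "2001-01-10")),
    DailyStreamflow = c(5, NA, 50)
  )
)

flat <- toy %>%
  bind_rows(.id = "station") %>%                     # one long data frame
  filter(month(Date) >= 6, month(Date) <= 11) %>%    # keep June to November
  group_by(station, year = year(Date)) %>%           # one group per station-year
  summarise(max_flow = max(DailyStreamflow, na.rm = TRUE), .groups = "drop")

print(flat)
```

The May and December values (100 and 200) fall outside the window, so they do not influence the maxima, and `na.rm = TRUE` keeps the missing value at stationB from turning its result into `NA`.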
     

Answer 1 (score: 0)

Judging from the posted picture of the data structure, the following approach may work.

library(lubridate)
library(dplyr)

listDF %>%
  purrr::map(function(x){
    x %>%
      filter(month(Date) >= 6 & month(Date) <= 11) %>%
      group_by(year = year(Date)) %>%
      summarise(Max = max(DailyStreamflow, na.rm = TRUE), .groups = "keep")
    
  })
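
As for why the code in the question fails: the error appears to come from operator precedence. `%>%` binds more tightly than `>=`, so `x %>% mutate(month = month(Date)) >= 6` compares the whole piped data frame to 6 and yields a logical matrix, whereas `filter()` expects a logical vector matching the rows of the current group. The sketch below shows this on a hypothetical four-row data frame, followed by an `lapply()` version that creates the helper columns first. The function name `peak_summer_fall` is an assumption made for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical four-row example with the same column names as in the question
x <- data.frame(
  Date = as.Date(c("1953-05-31", "1953-06-01", "1953-11-30", "1953-12-01")),
  DailyStreamflow = c(32, 40, 35, 90)
)

# This is evaluated as (x %>% mutate(...)) >= 6: a logical matrix with
# one column per data frame column, not a per-row logical vector
bad_condition <- x %>% mutate(month = month(Date)) >= 6
print(dim(bad_condition))

# Create the helper columns first, then filter and group on them
peak_summer_fall <- function(df) {
  df %>%
    mutate(year = year(Date), month = month(Date)) %>%
    filter(month >= 6, month <= 11) %>%
    group_by(year) %>%
    summarise(max_flow = max(DailyStreamflow, na.rm = TRUE))
}

result <- lapply(list(x), peak_summer_fall)
print(result[[1]])
```

Only the 1 June and 30 November rows fall inside the window here, so the single 1953 group reports the larger of those two flows.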

Code to create the test data.

fun <- function(year, n){
  d1 <- as.Date(paste(year, 1, 1, sep = "-"))
  d2 <- as.Date(paste(year + 10, 12, 31, sep = "-"))
  d <- seq(d1, d2, by = "day")
  d <- sort(rep(sample(d, n, TRUE), length.out = n))
  flow <- sample(10*n, n, TRUE)
  data.frame(Date = d, DailyStreamflow = flow)
}

set.seed(2020)
listDF <- lapply(1:3, function(i) fun(c(1953, 1965, 1980)[i], c(24601, 13270, 17761)[i]))
str(listDF)
rm(fun)
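
To sanity-check results on test data of this shape, a base R computation can serve as an independent cross-check. The sketch below is a minimal example on a hypothetical five-row data frame (the name `df` and all values are made up), assuming the same `Date` and `DailyStreamflow` columns. It uses `format()` and `tapply()` rather than the dplyr approach from the answer.

```r
# Hypothetical small data frame with the same columns as the test data
df <- data.frame(
  Date = as.Date(c("1953-06-01", "1953-09-10", "1953-12-25",
                   "1954-03-01", "1954-11-30")),
  DailyStreamflow = c(32, 80, 500, 700, 45)
)

# Extract month and year as integers
mon <- as.integer(format(df$Date, "%m"))
yr  <- as.integer(format(df$Date, "%Y"))

# Keep June to November only, then take the maximum per year
keep <- mon >= 6 & mon <= 11
base_max <- tapply(df$DailyStreamflow[keep], yr[keep], max, na.rm = TRUE)

print(base_max)
```

The December and March rows are dropped by `keep`, so the large values 500 and 700 do not appear in the per-year maxima.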