Question

示例数据：

      year <- rep(1980:2015, each = 365) 
      doy <- rep(1:365, times = 36)

      set.seed(125) 
      val <- sample(0:1, size = 365*36,replace = TRUE) 
      mat <- as.matrix(cbind(year,doy,val))

mat是包含年，doy和值列的数据，该列为1或0。

我有另一个矩阵mat1。

      set.seed(123) 
      mat1 <- apply(matrix(sample(c(230:365), replace = TRUE, size = 4L * 36L), nrow = 36L), 2L, sort)
      mat1 <- t(apply(mat1, 1, function(x) x[order(x)]))
      head(mat1)

            [,1] [,2] [,3] [,4]
      [1,]  230  231  233  236
      [2,]  235  238  242  242
      [3,]  236  242  243  246
      [4,]  243  245  247  249
      [5,]  247  248  249  250
      [6,]  249  250  253  263

mat1在mat中对应年份也有36行。mat1每年包含四列中的一些。

我希望每年使用mat从mat1分组。例如，1980年的mat就是分为三组：

      group 1 from 230 till 231 (1st and second column of row 1 from `mat1`)
      group 2 from 232 till 233 (second column + 1 to third column of row 1 from `mat1`)
      group 3 from 234 till 236 (third column + 1 to fourth column of row 1 from `mat1`)

这将为我提供1980年的三个向量。我想对每个向量做一个rle以找到最长的向量连续出现1.像

这样的东西

       group1.rle <- rle(group1)
       group2.rle <- rle(group2)
       group3.rle <- rle(group3)

       max(group1.rle$lengths[group1.rle$values == 1])
       max(group2.rle$lengths[group2.rle$values == 1])
       max(group3.rle$lengths[group3.rle$values == 1])

然后重复这一年mat。

谢谢。

Answer 1

这可能有点过头了，但我在tidyverse中被带走了（谁不知道那种感觉？;））

套餐

# You might as well go with library(tidyverse)
library(dplyr)
library(purrr)
library(tidyr)

代码＆amp;说明

# Preparation mat <- as.data.frame(mat) colnames(mat1) <- c("D1", "D2", "D3", "D4") mat1 <- cbind(year = 1980:2015, mat1) mat1 <- as.data.frame(mat1)

我为mat1的列命名，添加列year并将mat和mat1转换为不错的tibble，因此我使用{{1 }} 在他们。这对于获得正确年份的正确指标列非常重要。

left_join

首先加入两个mat_new <- mat %>% left_join(mat1, by = "year") %>% mutate(group1 = (doy >= D1 & doy <=D2), group2 = (doy >= D2 + 1 & doy <=D3), group3 = (doy >= D3 + 1 & doy <=D4)) mat_new # A tibble: 13,140 x 10 # year doy val D1 D2 D3 D4 group1 group2 group3 # <int> <int> <int> <int> <int> <int> <int> <lgl> <lgl> <lgl> # 1 1980 1 1 230 231 233 236 FALSE FALSE FALSE # 2 1980 2 0 230 231 233 236 FALSE FALSE FALSE # 3 1980 3 0 230 231 233 236 FALSE FALSE FALSE # 4 1980 4 0 230 231 233 236 FALSE FALSE FALSE # 5 1980 5 1 230 231 233 236 FALSE FALSE FALSE # 6 1980 6 1 230 231 233 236 FALSE FALSE FALSE # 7 1980 7 1 230 231 233 236 FALSE FALSE FALSE # 8 1980 8 0 230 231 233 236 FALSE FALSE FALSE # 9 1980 9 1 230 231 233 236 FALSE FALSE FALSE # 10 1980 10 1 230 231 233 236 FALSE FALSE FALSE # ... with 13,130 more rows，然后添加组列，tibble是否在子集中。

doy

将组列聚集在一起，然后收集mat_new <- mat_new %>% gather(group, indicator, group1:group3) %>% nest(doy, val, indicator) mat_new # A tibble: 108 x 7 # year D1 D2 D3 D4 group data # <int> <int> <int> <int> <int> <chr> <list> # 1 1980 230 231 233 236 group1 <tibble [365 x 3]> # 2 1981 235 238 242 242 group1 <tibble [365 x 3]> # 3 1982 236 242 243 246 group1 <tibble [365 x 3]> # 4 1983 243 245 247 249 group1 <tibble [365 x 3]> # 5 1984 247 248 249 250 group1 <tibble [365 x 3]> # 6 1985 249 250 253 263 group1 <tibble [365 x 3]> # 7 1986 250 250 255 269 group1 <tibble [365 x 3]> # 8 1987 255 258 259 269 group1 <tibble [365 x 3]> # 9 1988 259 259 263 274 group1 <tibble [365 x 3]> # 10 1989 261 270 273 285 group1 <tibble [365 x 3]> # ... with 98 more rows数据。现在每行包含一年 - 组合，数据列存储此特定组合的列nest，doy和val。这样可以更轻松地计算下一步中indicator的所有年份 - 组合。

rle

通过两次拨打mat_new <- mat_new %>% mutate(group.rle = map(data, ~ .x %>% filter(indicator) %>% pull(val) %>% rle), max.group.rle = map_dbl(group.rle, ~max(.x$lengths[.x$values == 1]))) mat_new # A tibble: 108 x 9 # year D1 D2 D3 D4 group data group.rle max.group.rle # <int> <int> <int> <int> <int> <chr> <list> <list> <dbl> # 1 1980 230 231 233 236 group1 <tibble [365 x 3]> <S3: rle> 1 # 2 1981 235 238 242 242 group1 <tibble [365 x 3]> <S3: rle> 2 # 3 1982 236 242 243 246 group1 <tibble [365 x 3]> <S3: rle> 1 # 4 1983 243 245 247 249 group1 <tibble [365 x 3]> <S3: rle> 1 # 5 1984 247 248 249 250 group1 <tibble [365 x 3]> <S3: rle> -Inf # 6 1985 249 250 253 263 group1 <tibble [365 x 3]> <S3: rle> 1 # 7 1986 250 250 255 269 group1 <tibble [365 x 3]> <S3: rle> 1 # 8 1987 255 258 259 269 group1 <tibble [365 x 3]> <S3: rle> 2 # 9 1988 259 259 263 274 group1 <tibble [365 x 3]> <S3: rle> -Inf # 10 1989 261 270 273 285 group1 <tibble [365 x 3]> <S3: rle> 2 # ... with 98 more rows，我们可以获得每组的最大值。在第一次调用中，map列中的每个tibble都会按data列中存储的值进行过滤，然后提取indicator列（val }）然后最终将pull应用于这些值。

在第二次rle来电中，map列中存储的rle会根据您的条件（仅值1的长度）进行过滤，并计算group.rle。由于这会返回长度为1的数字向量，因此我使用max直接存储结果。

注意，此调用将产生警告，因为并非所有组都包含值1，因此在过滤后最大值没有非缺失参数。

map_dbl

为了更好地查看结果，我只选择列mat_new %>% select(year, group, max.group.rle) %>% spread(group, max.group.rle) # A tibble: 36 x 4 # year group1 group2 group3 # * <int> <dbl> <dbl> <dbl> # 1 1980 1 -Inf 1 # 2 1981 2 1 -Inf # 3 1982 1 -Inf -Inf # 4 1983 1 -Inf 1 # 5 1984 -Inf 1 -Inf # 6 1985 1 -Inf 2 # 7 1986 1 3 1 # 8 1987 2 -Inf 1 # 9 1988 -Inf -Inf 2 # 10 1989 2 1 3 # # ... with 26 more rows，year，group，然后使用max.group.rle将这些列分散到不同的列中。现在我们每年有一行提供相关信息。

R：基于另一个矩阵并运行rle的子集矩阵

1 个答案: