Conditional sequence of rows across columns

Time: 2018-03-28 00:27:17

Tags: r dplyr time-series

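If there is a "6" in df$a, I want the values 1:9 to appear in a new column, running from the preceding September through the following May, with NA everywhere else. The desired result is shown here as df$b.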

library(tidyverse)
library(lubridate)
date <- c("2/29/1940","3/31/1940","4/30/1940","5/31/1940","6/30/1940","7/31/1940","8/31/1940","9/30/1940","10/31/1940","11/30/1940","12/31/1940","1/31/1941","2/28/1941",
       "3/31/1941","4/30/1941","5/31/1941","6/30/1941","7/31/1941","8/31/1941","9/30/1941","10/31/1941","11/30/1941", "12/31/1941","1/31/1942","2/28/1942","3/31/1942",
       "4/30/1942","5/31/1942", "6/30/1942","7/31/1942","8/31/1942","9/30/1942","10/31/1942","11/30/1942","12/31/1942","1/31/1943","2/28/1943","3/31/1943","4/30/1943",
       "5/31/1943","6/30/1943","7/31/1943", "8/31/1943","9/30/1943")
a <- c("NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA",6,"NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA",
   "NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA")
df <- data.frame(date, a)
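# %<>% is only available after library(magrittr); plain assignment with %>% works with tidyverse alone
df <- df %>% mutate(date = mdy(date))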

df:
date        a   b
2/29/1940   NA  NA
3/31/1940   NA  NA
4/30/1940   NA  NA
5/31/1940   NA  NA
6/30/1940   NA  NA
7/31/1940   NA  NA
8/31/1940   NA  NA
9/30/1940   NA  1
10/31/1940  NA  2
11/30/1940  NA  3
12/31/1940  NA  4
1/31/1941   NA  5
2/28/1941   6   6
3/31/1941   NA  7
4/30/1941   NA  8
5/31/1941   NA  9
6/30/1941   NA  NA
7/31/1941   NA  NA
8/31/1941   NA  NA
9/30/1941   NA  NA
10/31/1941  NA  NA
11/30/1941  NA  NA
12/31/1941  NA  NA
1/31/1942   NA  NA
2/28/1942   NA  NA
3/31/1942   NA  NA
4/30/1942   NA  NA
5/31/1942   NA  NA
6/30/1942   NA  NA
7/31/1942   NA  NA
8/31/1942   NA  NA
9/30/1942   NA  NA
10/31/1942  NA  NA
11/30/1942  NA  NA
12/31/1942  NA  NA
1/31/1943   NA  NA
2/28/1943   NA  NA
3/31/1943   NA  NA
4/30/1943   NA  NA
5/31/1943   NA  NA
6/30/1943   NA  NA
7/31/1943   NA  NA
8/31/1943   NA  NA
9/30/1943   NA  NA

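For more context, I have a hundred years or so of monthly data in a data frame, and I am looking for an efficient way to generate the third column from the first two, so that I can process and visualize other data not shown here. Only occasionally does a February in df$a have a 6. When it does, I want the span from the preceding September through the following May marked in the new column (the df$b I hope to generate). I have tried some clunky approaches, mostly a series of mutate() rows with lag() and lead() variations, but it feels like there is a more direct route.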

Thanks,

Dave

1 Answer:

Answer 0: (score: 1)

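A solution using case_when, lead, and lag from dplyr. It is not the most concise solution, but it still works when the 6 sits close to the start or end of the data, because lead and lag fill positions past the edges with NA.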

library(tidyverse)

df2 <- df %>%
  mutate(b = case_when(
    lead(a, n = 5L) == 6     ~1,
    lead(a, n = 4L) == 6     ~2,
    lead(a, n = 3L) == 6     ~3,
    lead(a, n = 2L) == 6     ~4,
    lead(a, n = 1L) == 6     ~5,
                  a == 6     ~6,
     lag(a, n = 1L) == 6     ~7,
     lag(a, n = 2L) == 6     ~8,
     lag(a, n = 3L) == 6     ~9,
    TRUE                     ~NA_real_
  ))
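
As a quick check of the edge behaviour, here is a small sketch that applies the same case_when to a toy vector in which the 6 sits in the second row, so fewer than five rows precede it. The vector and the name toy are made up for illustration and are not part of the original data.

```r
library(dplyr)

# Toy input (hypothetical): the 6 is in row 2, so the window is cut off at the top
a <- c(NA, 6, NA, NA, NA, NA)

toy <- tibble(a = a) %>%
  mutate(b = case_when(
    lead(a, n = 5L) == 6 ~ 1,
    lead(a, n = 4L) == 6 ~ 2,
    lead(a, n = 3L) == 6 ~ 3,
    lead(a, n = 2L) == 6 ~ 4,
    lead(a, n = 1L) == 6 ~ 5,
                  a == 6 ~ 6,
     lag(a, n = 1L) == 6 ~ 7,
     lag(a, n = 2L) == 6 ~ 8,
     lag(a, n = 3L) == 6 ~ 9,
    TRUE                 ~ NA_real_
  ))

toy$b  # 5 6 7 8 9 NA: the window simply starts at 5 instead of 1
```

Comparisons against NA evaluate to NA, which case_when treats as a non-match, so rows outside the window fall through to the final TRUE branch.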

Data

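Note that I changed the way you specify NA in column a: I used bare NA values instead of the string "NA", so that a is a numeric column with real missing values.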
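
To see what that change affects, here is a short base R check. The names a_chr and a_num are only for illustration.

```r
# Mixing the string "NA" with a number coerces the whole vector to character
a_chr <- c("NA", 6, "NA")
class(a_chr)   # "character"
is.na(a_chr)   # FALSE FALSE FALSE: "NA" is ordinary text here

# Bare NA keeps the vector numeric and marks the entries as truly missing
a_num <- c(NA, 6, NA)
class(a_num)   # "numeric"
is.na(a_num)   # TRUE FALSE TRUE
```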

library(lubridate)
date <- c("2/29/1940","3/31/1940","4/30/1940","5/31/1940","6/30/1940","7/31/1940","8/31/1940","9/30/1940","10/31/1940","11/30/1940","12/31/1940","1/31/1941","2/28/1941",
          "3/31/1941","4/30/1941","5/31/1941","6/30/1941","7/31/1941","8/31/1941","9/30/1941","10/31/1941","11/30/1941", "12/31/1941","1/31/1942","2/28/1942","3/31/1942",
          "4/30/1942","5/31/1942", "6/30/1942","7/31/1942","8/31/1942","9/30/1942","10/31/1942","11/30/1942","12/31/1942","1/31/1943","2/28/1943","3/31/1943","4/30/1943",
          "5/31/1943","6/30/1943","7/31/1943", "8/31/1943","9/30/1943")
a <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA , 6, NA , NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA , NA , NA , NA , NA , NA , NA,
       NA, NA, NA , NA , NA , NA , NA , NA , NA , NA , NA)
df <- data.frame(date, a)
# Parse the month/day/year strings into Date values (base assignment, so only lubridate is needed here)
df$date <- mdy(df$date)
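
For a hundred years of monthly data, a more compact alternative could be sketched in base R by locating each 6 and filling a fixed window around it. This is not the method from the answer above. The helper name window_index and its parameters target, before, and after are hypothetical. Like the lead/lag version, it assumes the rows are consecutive months with no gaps.

```r
# Hypothetical helper: number the rows from `before` rows ahead of each hit
# to `after` rows behind it as 1, 2, ..., before + after + 1
window_index <- function(a, target = 6, before = 5L, after = 3L) {
  b <- rep(NA_real_, length(a))
  offsets <- seq(-before, after)            # -5 ... 3 by default
  for (hit in which(a == target)) {         # which() skips NA comparisons
    rows <- hit + offsets
    ok <- rows >= 1 & rows <= length(a)     # clip windows that run past either end
    b[rows[ok]] <- seq_along(offsets)[ok]   # 1 ... 9 by default
  }
  b
}

# Same layout as the example data: 12 NAs, a 6 in February 1941, then 31 NAs
a <- c(rep(NA, 12), 6, rep(NA, 31))
b <- window_index(a)
b[8:16]  # 1 2 3 4 5 6 7 8 9
```

With the data frame built above, this would be used as df$b <- window_index(df$a). If two windows overlap, the later one overwrites the earlier one in this sketch.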