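If df$a contains a 6, I'd like a new column (shown here as df$b) that holds 1:9 from the previous September through the following May, and NA everywhere else.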
library(tidyverse)
library(lubridate)
library(magrittr)  # provides the %<>% assignment pipe used below
date <- c("2/29/1940","3/31/1940","4/30/1940","5/31/1940","6/30/1940","7/31/1940","8/31/1940","9/30/1940","10/31/1940","11/30/1940","12/31/1940","1/31/1941","2/28/1941",
"3/31/1941","4/30/1941","5/31/1941","6/30/1941","7/31/1941","8/31/1941","9/30/1941","10/31/1941","11/30/1941", "12/31/1941","1/31/1942","2/28/1942","3/31/1942",
"4/30/1942","5/31/1942", "6/30/1942","7/31/1942","8/31/1942","9/30/1942","10/31/1942","11/30/1942","12/31/1942","1/31/1943","2/28/1943","3/31/1943","4/30/1943",
"5/31/1943","6/30/1943","7/31/1943", "8/31/1943","9/30/1943")
a <- c("NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA",6,"NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA",
"NA","NA","NA","NA","NA","NA","NA","NA","NA","NA","NA")
df <- data.frame(date, a)
df %<>% mutate(date = mdy(date), a)
Desired result (df with the new column b):
date a b
2/29/1940 NA NA
3/31/1940 NA NA
4/30/1940 NA NA
5/31/1940 NA NA
6/30/1940 NA NA
7/31/1940 NA NA
8/31/1940 NA NA
9/30/1940 NA 1
10/31/1940 NA 2
11/30/1940 NA 3
12/31/1940 NA 4
1/31/1941 NA 5
2/28/1941 6 6
3/31/1941 NA 7
4/30/1941 NA 8
5/31/1941 NA 9
6/30/1941 NA NA
7/31/1941 NA NA
8/31/1941 NA NA
9/30/1941 NA NA
10/31/1941 NA NA
11/30/1941 NA NA
12/31/1941 NA NA
1/31/1942 NA NA
2/28/1942 NA NA
3/31/1942 NA NA
4/30/1942 NA NA
5/31/1942 NA NA
6/30/1942 NA NA
7/31/1942 NA NA
8/31/1942 NA NA
9/30/1942 NA NA
10/31/1942 NA NA
11/30/1942 NA NA
12/31/1942 NA NA
1/31/1943 NA NA
2/28/1943 NA NA
3/31/1943 NA NA
4/30/1943 NA NA
5/31/1943 NA NA
6/30/1943 NA NA
7/31/1943 NA NA
8/31/1943 NA NA
9/30/1943 NA NA
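For more context: the data frame holds about a hundred years of monthly data, and I'm looking for an efficient way to generate a third column from the first two, for processing and visualizing other data not shown here. Only February ever has a 6 in df$a, and only occasionally. When it does, I'd like the new column (the df$b I want to generate) to mark the previous September through the following May. I've tried some clumsy approaches, mostly a chain of mutate(), lag() and lead() calls that shift rows, but it feels like there should be a more direct route.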
Thanks,
Dave
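One more direct route than chaining mutate(), lag() and lead() is to find the rows that hold a 6 and write 1:9 into the window running from five rows before each one (September) to three rows after it (May). This is a base R sketch that assumes exactly one row per month, sorted by date, with no gaps; the dates and values are a hypothetical reconstruction of the example, not the real data.

```r
# Hypothetical reconstruction of the example: 44 month-end dates, Feb 1940 to Sep 1943,
# with a single 6 in February 1941 (row 13)
date <- seq(as.Date("1940-03-01"), by = "month", length.out = 44) - 1
a <- rep(NA_real_, length(date))
a[13] <- 6
df <- data.frame(date, a)

# Row positions of every 6 in column a (which() skips the NAs)
hits <- which(df$a == 6)

# Assumption: one row per month, in order, so positions map directly to months
df$b <- NA_real_
for (i in hits) {
  rows <- i + (-5:3)                      # September (5 rows earlier) to May (3 rows later)
  keep <- rows >= 1 & rows <= nrow(df)    # ignore positions outside the data
  df$b[rows[keep]] <- (1:9)[keep]
}
```

Because the loop runs over the 6s rather than over every row, it stays cheap even on a century of monthly data.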
Answer 0 (score: 1)
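A solution using case_when, lead and lag from dplyr. It isn't the most concise solution, but it still works near the edges of the data, where lead() and lag() return NA.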
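library(tidyverse)
# Row i gets value v when a 6 sits (6 - v) rows away: 5 rows ahead for
# September (1) through 3 rows behind for May (9). NA comparisons near
# the ends of the data are treated as no match and fall through.
df2 <- df %>%
  mutate(b = case_when(
    lead(a, n = 5L) == 6 ~ 1,  # September
    lead(a, n = 4L) == 6 ~ 2,
    lead(a, n = 3L) == 6 ~ 3,
    lead(a, n = 2L) == 6 ~ 4,
    lead(a, n = 1L) == 6 ~ 5,
    a == 6 ~ 6,                # February itself
    lag(a, n = 1L) == 6 ~ 7,
    lag(a, n = 2L) == 6 ~ 8,
    lag(a, n = 3L) == 6 ~ 9,   # May
    TRUE ~ NA_real_
  ))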
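The nine branches above follow one pattern: value v goes to the rows whose a, looked up (6 - v) rows away, equals 6. Folding that pattern into a loop gives an equivalent, more compact version. This is only a sketch: peek() and season_index() are hypothetical helper names, and the data is a reconstruction of the question's example.

```r
library(dplyr)

# Hypothetical reconstruction of the example: 44 month-end dates, Feb 1940 to Sep 1943,
# with a single 6 in February 1941
date <- seq(as.Date("1940-03-01"), by = "month", length.out = 44) - 1
a <- rep(NA_real_, length(date))
a[13] <- 6
df <- data.frame(date, a)

# Hypothetical helper: row i sees x[i + s] (s > 0 looks ahead, s < 0 looks back)
peek <- function(x, s) {
  if (s > 0) lead(x, n = as.integer(s)) else if (s < 0) lag(x, n = as.integer(-s)) else x
}

# Hypothetical helper: value v (1 = September ... 9 = May) goes to rows
# whose a, looked up (6 - v) rows away, is 6
season_index <- function(a) {
  b <- rep(NA_real_, length(a))
  for (v in 9:1) {
    hit <- peek(a, 6 - v) == 6
    b[hit %in% TRUE] <- v   # NA comparisons count as no match
  }
  b
}

df2 <- df %>% mutate(b = season_index(a))
```

Looping from 9 down to 1 lets lower values overwrite higher ones, mirroring case_when(), where the first matching branch wins.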
Data
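Note that I changed how NA is specified in column a: real NA values instead of the string "NA", which had turned a into a character column.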
library(lubridate)
library(magrittr)  # provides the %<>% assignment pipe used below
date <- c("2/29/1940","3/31/1940","4/30/1940","5/31/1940","6/30/1940","7/31/1940","8/31/1940","9/30/1940","10/31/1940","11/30/1940","12/31/1940","1/31/1941","2/28/1941",
"3/31/1941","4/30/1941","5/31/1941","6/30/1941","7/31/1941","8/31/1941","9/30/1941","10/31/1941","11/30/1941", "12/31/1941","1/31/1942","2/28/1942","3/31/1942",
"4/30/1942","5/31/1942", "6/30/1942","7/31/1942","8/31/1942","9/30/1942","10/31/1942","11/30/1942","12/31/1942","1/31/1943","2/28/1943","3/31/1943","4/30/1943",
"5/31/1943","6/30/1943","7/31/1943", "8/31/1943","9/30/1943")
a <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA , 6, NA , NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA , NA , NA , NA , NA , NA , NA,
NA, NA, NA , NA , NA , NA , NA , NA , NA , NA , NA)
df <- data.frame(date, a)
df %<>% mutate(date = mdy(date), a)
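Both the answer above and other position-based variants assume exactly one row per month with no gaps. If that cannot be guaranteed across a hundred years of data, here is a sketch that works from the dates instead: label each September-to-August season, flag the seasons whose February holds a 6, and number the months from September (1) to May (9). The columns season, pos and flag are hypothetical helpers, and the data is a reconstruction of the example.

```r
library(dplyr)
library(lubridate)

# Hypothetical reconstruction of the example: 44 month-end dates, Feb 1940 to Sep 1943,
# with a single 6 in February 1941
date <- seq(as.Date("1940-03-01"), by = "month", length.out = 44) - 1
a <- rep(NA_real_, length(date))
a[13] <- 6
df <- data.frame(date, a)

df2 <- df %>%
  mutate(
    # Season runs Sep-Aug and is named after the year its February falls in
    season = year(date) + (month(date) >= 9),
    # Position in the season: Sep = 1, ..., Feb = 6, ..., May = 9, Jun-Aug = 10-12
    pos = (month(date) - 9) %% 12 + 1
  ) %>%
  group_by(season) %>%
  # Assumption: only a 6 in February marks the season
  mutate(flag = any(a == 6 & month(date) == 2, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(b = if_else(flag & pos <= 9, pos, NA_real_)) %>%
  select(date, a, b)
```

Since each month is placed by its calendar date rather than its row number, a missing month only leaves a gap in b instead of shifting the whole window.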