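I have a simple dataset of US NBER recessions, currently coded as a dummy variable. I would like to label each recession sequentially. For example, in the table below, I want the Recession column to read "Recession 1", "No Recession", "Recession 2", and so on, so that each recession is identified separately.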
Date Recession
1949-06-30 1
1949-09-30 1
1949-12-31 1
1950-03-31 0
1950-06-30 0
1953-09-30 1
1953-12-31 1
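Answer 0 (score: 3)
You can use rle to find the consecutive runs of 1s, then use rep to repeat each run's number as many times as the run is long (lengths).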
foo <- with(rle(input$Recession), rep(cumsum(values) * values, lengths))
ifelse(foo == 0, "No Recession", paste("Recession", foo))
Input:
input <- structure(list(Date = c("1949-06-30", "1949-09-30", "1949-12-31",
"1950-03-31", "1950-06-30", "1953-09-30", "1953-12-31"), Recession = c(1L,
1L, 1L, 0L, 0L, 1L, 1L)), row.names = c(NA, -7L), class = "data.frame")
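Answer 1 (score: 2)
Here is a tidyverse approach:
1. Use lag to determine whether the recession status has changed
2. Use & and cumsum to determine whether it has changed from no recession to recession
3. Use if_else to set every row that is not in a recession to "No Recession"
library(tidyverse)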
df <- read_table2(
"Date Recession
1949-06-30 1
1949-09-30 1
1949-12-31 1
1950-03-31 0
1950-06-30 0
1953-09-30 1
1953-12-31 1"
)
df %>%
  mutate(
    # Compare the first row against 0 so that numbering starts at 1
    # whether or not the series begins in a recession
    changed = Recession != lag(Recession, default = 0),
    to_recession = str_c("Recession ", cumsum(changed & as.logical(Recession))),
    Recession_Num = if_else(Recession == 1, to_recession, "No Recession")
  ) %>%
  select(-changed, -to_recession)
#> # A tibble: 7 x 3
#>   Date       Recession Recession_Num
#>   <date>         <int> <chr>
#> 1 1949-06-30         1 Recession 1
#> 2 1949-09-30         1 Recession 1
#> 3 1949-12-31         1 Recession 1
#> 4 1950-03-31         0 No Recession
#> 5 1950-06-30         0 No Recession
#> 6 1953-09-30         1 Recession 2
#> 7 1953-12-31         1 Recession 2
Created on 2018-10-30 by the reprex package (v0.2.1)
Answer 2 (score: 2)
Here is a cumsum trick.
x <- c(1, 1, 1, 0, 0, 1, 1)
i <- cumsum(c(1, diff(x) != 0) & as.logical(x))
ifelse(x == 0, "No Recession", paste("Recession", i))
#[1] "Recession 1" "Recession 1" "Recession 1" "No Recession"
#[5] "No Recession" "Recession 2" "Recession 2"
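To show what each piece of the cumsum expression contributes, here is a sketch that names the intermediate vectors. The input series is hypothetical (it is not the question's data) and deliberately starts outside a recession; run_start and recession_start are illustrative names.

```r
# Hypothetical series that begins outside a recession
x <- c(0, 1, 1, 0, 0, 1, 0, 1)

# 1 wherever a new run begins (the first element always starts a run)
run_start <- c(1, diff(x) != 0)

# Keep only the run starts that are recession rows
recession_start <- run_start & as.logical(x)

# Running count of recession starts gives the recession number
i <- cumsum(recession_start)

labels <- ifelse(x == 0, "No Recession", paste("Recession", i))
print(labels)
```

The leading 1 in c(1, diff(x) != 0) is what lets a series that opens in a recession count its first row as a start; when the series opens with 0, the & with as.logical(x) discards it.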
Answer 3 (score: 1)
Date <- as.Date(c('1949-06-30', '1949-09-30', '1949-12-31', '1950-03-31', '1950-06-30', '1953-09-30', '1953-12-31'),
format = '%Y-%m-%d')
Recession <- c(1,1,1,0,0,1,1)
df <- data.frame(Date, Recession)
find_seq_1s <- function(x) {
  count <- 0        # number of recessions seen so far
  in_seq <- FALSE   # TRUE while inside a run of 1s
  output <- NULL
  for (i in x) {
    if (i == 1 && !in_seq) {
      # First row of a new recession: advance the counter
      count <- count + 1
      in_seq <- TRUE
      output <- c(output, paste('Recession', as.character(count)))
    } else if (i == 1 && in_seq) {
      # Still inside the current recession
      output <- c(output, paste('Recession', as.character(count)))
    } else {
      in_seq <- FALSE
      output <- c(output, 'No Recession')
    }
  }
  return(output)
}
df$Rec_Seq <- find_seq_1s(df$Recession)
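The loop above grows output with c() on every iteration, which copies the vector each time. As a possible variation (not the answer author's code), here is a sketch that preallocates the result instead; label_recessions is a hypothetical name chosen for illustration.

```r
# Hypothetical helper: same state machine as find_seq_1s,
# but writing into a preallocated character vector
label_recessions <- function(x) {
  output <- character(length(x))
  count <- 0L
  in_seq <- FALSE
  for (k in seq_along(x)) {
    if (x[k] == 1) {
      if (!in_seq) {
        # First row of a new recession
        count <- count + 1L
        in_seq <- TRUE
      }
      output[k] <- paste("Recession", count)
    } else {
      in_seq <- FALSE
      output[k] <- "No Recession"
    }
  }
  output
}

result <- label_recessions(c(1, 1, 1, 0, 0, 1, 1))
print(result)
```

For a series of a few dozen quarters the difference is negligible; preallocation mainly matters for long vectors.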
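Answer 4 (score: 0)
# Note: x is the row index, so recession rows are labeled by row number, not by recession sequence
unlist(lapply(seq_len(nrow(df)), FUN = function(x) ifelse(df$Recession[x] == 1, paste("Recession", x), "No Recession")))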