For A, it looks like this:
Name  Date     Value  NewColumn  other columns
A     2000-01  0.5
A     2001-03  0.4    0
A     2002-02  1.0    1
A     2003-05  0.9    0
A     2004-06  0.9
A     2006-03  0.4               <- no previous year
Answer 0 (score: 1)
df = read.table(text = "
Name Date Value
A 2000-01 0.5
A 2001-03 0.4
A 2002-02 1.0
A 2003-05 0.9
A 2004-06 0.9
A 2006-03 0.4
", header = TRUE, stringsAsFactors = FALSE)
library(dplyr)
df %>%
group_by(Name) %>% # for each name
mutate(change = Value/lag(Value)-1, # get the change in value (increase or decrease)
year = as.numeric(substr(Date, 1, 4)), # get the year from the date
NewColumn = case_when(change > 0.01 & lag(year) == year-1 ~ 1, # if change is more than 1% and the previous row is 1 year before flag as 1
change < -0.01 & lag(year) == year-1 ~ 0)) %>% # if the value fell by more than 1% and the previous row is 1 year before flag as 0 (all other rows get NA)
ungroup()
# # A tibble: 6 x 6
# Name Date Value change year NewColumn
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 A 2000-01 0.5 NA 2000 NA
# 2 A 2001-03 0.4 -0.200 2001 0
# 3 A 2002-02 1 1.5 2002 1
# 4 A 2003-05 0.9 -0.100 2003 0
# 5 A 2004-06 0.9 0 2004 NA
# 6 A 2006-03 0.4 -0.556 2006 NA
You can drop the helper columns (change, year) that are no longer needed; I left them in only to help you see how the process works.
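For readers who would rather not load dplyr, here is a base R sketch of the same rule: 1 for an increase of more than 1%, 0 for a decrease of more than 1%, and NA otherwise, but only when the previous row is exactly one year earlier. The helpers `prev()` and `flag_changes()` are hypothetical names of my own, and the sketch assumes `Date` is always a `"YYYY-MM"` string, so sorting the strings also sorts the rows chronologically.

```r
# Base R sketch (no dplyr). Assumes Date is a "YYYY-MM" string.
df <- data.frame(
  Name  = rep("A", 6),
  Date  = c("2000-01", "2001-03", "2002-02", "2003-05", "2004-06", "2006-03"),
  Value = c(0.5, 0.4, 1.0, 0.9, 0.9, 0.4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: previous element of a vector (NA for the first one)
prev <- function(x) c(NA, head(x, -1))

# Hypothetical helper: 1 = rise > threshold, 0 = fall > threshold, NA otherwise
flag_changes <- function(d, threshold = 0.01) {
  year        <- as.integer(substr(d$Date, 1, 4))
  change      <- d$Value / prev(d$Value) - 1
  consecutive <- (year - prev(year)) == 1
  out <- rep(NA_real_, nrow(d))
  # which() drops the NA comparisons produced by the first row of each group
  out[which(consecutive & change >  threshold)] <- 1
  out[which(consecutive & change < -threshold)] <- 0
  out
}

# Sort within each Name, apply the rule per group, and put the pieces back
df <- df[order(df$Name, df$Date), ]
df$NewColumn <- unsplit(lapply(split(df, df$Name), flag_changes), df$Name)
df
```

`split()`/`unsplit()` play the role of `group_by()`, and `prev()` plays the role of `lag()`.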
Answer 1 (score: 1)
Since the question is tagged data.table, here is a corresponding solution that uses some tricky arithmetic with NA and logical values:
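library(data.table)
setDT(DT)[order(Date), NewColumn := {
  yr <- year(lubridate::ymd(Date, truncated = 1L))  # "2000-01" is parsed as 2000-01-01, then the year is taken
  chg <- Value / shift(Value) - 1.0                  # relative change vs. the previous row
  # NA unless the previous row is exactly one year earlier and |change| > 1%;
  # sign(chg) / 2 + 0.5 then maps a decrease to 0 and an increase to 1
  NA^(yr - shift(yr) != 1L) * NA^(!abs(chg) > 0.01) * (sign(chg) / 2.0 + 0.5)
}, by = Name][]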
   Name    Date Value NewColumn
1:    A 2000-01   0.5        NA
2:    A 2001-03   0.4         0
3:    A 2002-02   1.0         1
4:    A 2003-05   0.9         0
5:    A 2004-06   0.9        NA
6:    A 2006-03   0.4        NA
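The trick here is to use the facts that NA^0 is 1 while NA^1 is NA, and that FALSE corresponds to 0 and TRUE to 1, so
NA^c(FALSE, TRUE)
returns
[1]  1 NA
Multiplying by NA^(condition) therefore turns every row where the condition is TRUE into NA and leaves the other rows unchanged.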
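To see how the three factors in the data.table formula combine, here is a small base R walk-through on made-up vectors; `gap` and `chg` are hypothetical stand-ins for `yr - shift(yr)` and `chg` inside the answer. Note that `!` binds more loosely than `>` in R, so `!abs(chg) > 0.01` means `!(abs(chg) > 0.01)`.

```r
# Hypothetical inputs: year gap to the previous row and relative change
gap <- c(NA, 1, 1, 1, 2)
chg <- c(NA, -0.2, 1.5, 0, -0.5)

keep_year   <- NA^(gap != 1)          # 1 if exactly one year apart, else NA
keep_change <- NA^(!abs(chg) > 0.01)  # 1 if |change| > 1%, else NA
direction   <- sign(chg) / 2 + 0.5    # -1 -> 0, +1 -> 1 (0 -> 0.5, masked by keep_change)

flag <- keep_year * keep_change * direction
print(flag)  # NA 0 1 NA NA
```

The first row is NA because there is no previous row, the fourth because the change is 0, and the fifth because of the two-year gap; since NA * 0 is NA in R, a masked row stays NA even when `direction` is 0.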
Data:
library(data.table)
DT <- fread("Name Date Value
A 2000-01 0.5
A 2001-03 0.4
A 2002-02 1.0
A 2003-05 0.9
A 2004-06 0.9
A 2006-03 0.4 ")
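Both answers need the year of each row. The data.table answer gets it with `lubridate::ymd(Date, truncated = 1L)`, which adds a lubridate dependency. Here is a base R sketch of two alternatives, assuming every `Date` is a `"YYYY-MM"` string:

```r
dates <- c("2000-01", "2001-03", "2002-02", "2003-05", "2004-06", "2006-03")

# Option 1: the first four characters, as in the dplyr answer
yr_substr <- as.integer(substr(dates, 1, 4))

# Option 2: append day "01", parse as a Date, then format out the year
yr_date <- as.integer(format(as.Date(paste0(dates, "-01")), "%Y"))

# Year gap between consecutive rows; only a gap of 1 may be flagged
gap <- diff(yr_date)
print(gap)  # 1 1 1 1 2
```

Inside the data.table call, `yr <- as.integer(substr(Date, 1, 4))` could stand in for the lubridate line, with the rest of the expression unchanged.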