I have a dataset like this:
dat <- data.frame(Col0 = rep(c("grp1", "grp2", "grp3", "grp4"), each = 4),
                  Col1 = rep(c("B", "S", "S", "B"), 4),
                  Col2 = rep(c(1, 2, 3, 4), 4),
                  Col3 = rep(c(0.1, 0.2, 0.3, 0.4), 4))
I am trying to create a fourth column, Col4, as shown below:
dat1 <- data.frame(Col0 = rep(c("grp1", "grp2", "grp3", "grp4"), each = 4),
                   Col1 = rep(c("B", "S", "S", "B"), 4),
                   Col2 = rep(c(1, 2, 3, 4), 4),
                   Col3 = rep(c(0.1, 0.2, 0.3, 0.4), 4),
                   # one value per row: 4 groups x 4 rows = 16 values
                   Col4 = rep(c(1, 0.8, 1.26, 4), 4))
So far I have tried:
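library(dplyr)

d1 <- dat %>%
  group_by(Col0) %>%
  mutate(Col4 = if_else(Col1 == 'B', Col2,
                        # S directly after B: previous Col2 minus Col3 * previous Col2;
                        # any other S row falls through to 0
                        if_else(Col1 == 'S' & lag(Col1 == "B"), lag(Col2) - Col3 * lag(Col2), 0)))
d1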
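The result I get is not what I want in Col4. The rules for Col4 are:
If Col1 is B, take the value of Col2 as it is.
If Col1 is S and the previous value of Col1 is B, then 1 - (0.2 * 1), which equals 0.8.
If Col1 is S and the previous value of Col1 is also S, then (1 + 0.8) - ((1 + 0.8) * 0.3), which is 1.26.
Basically, it is like computing a difference first, then taking a cumulative sum that includes that difference, and so on.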
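To make the arithmetic in these rules concrete, here is a small sketch that walks through one group by hand. The variable names (row1, total, and so on) are only illustrative and are not from the original post; the numbers are the Col2 and Col3 values of the example data.

```r
# Col2 and Col3 for a single group of the example data (pattern B, S, S, B)
col2 <- c(1, 2, 3, 4)
col3 <- c(0.1, 0.2, 0.3, 0.4)

row1  <- col2[1]                  # B: take Col2 as it is (1)
row2  <- row1 - col3[2] * row1    # S after B: 1 - 0.2 * 1
total <- row1 + row2              # running total so far: 1 + 0.8
row3  <- total - col3[3] * total  # S after S: 1.8 - 1.8 * 0.3
row4  <- col2[4]                  # B again: take Col2 as it is (4)

col4 <- c(row1, row2, row3, row4)
col4
```

The vector should match the desired Col4 values for one group (1, 0.8, 1.26, 4), up to floating-point rounding.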
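For now I am only using a simple example to show what I am trying to achieve. The real dataset has more than 1 million observations and thousands of groups. To make things worse, the combination of "B" and "S" varies: in some groups it is B, B, S, S, and so on.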
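Any help with this would be greatly appreciated. I have tried things other than if_else() and have looked at many conditional cumulative sum questions, but to no avail.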
I think this could be done easily in Excel with the SUMIF() function, but I need to do it in R.
Answer 0 (score: 0)
It looks like you did not finish your if_else:
library(dplyr)

dat <- data.frame(Col0 = rep(c("grp1", "grp2", "grp3", "grp4"), each = 4),
                  Col1 = rep(c("B", "S", "S", "B"), 4),
                  Col2 = rep(c(1, 2, 3, 4), 4),
                  Col3 = rep(c(0.1, 0.2, 0.3, 0.4), 4))
d1 <- dat %>%
  group_by(Col0) %>%
  mutate(Col4 = if_else(Col1 == 'B', Col2,
                        # 0.8 and 1.26 are hard-coded from the example in the question
                        if_else(Col1 == 'S' & lag(Col1) == "B", 1 - (0.2 * 1),
                                if_else(Col1 == 'S' & lag(Col1) == 'S', 1.26, 0))))
d1
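The nested if_else() above reproduces the example only because 0.8 and 1.26 are written in as constants. Since each S row depends on the Col4 values before it, one possible way to generalize is a small loop per group. The following is a minimal sketch in base R, not a definitive solution. The helper name col4_for_group is hypothetical, and it rests on an assumption the question does not spell out: the running total restarts at every B row (which matters for patterns like B, B, S, S), and an S row with no earlier B in its group yields 0.

```r
# Example data from the question
dat <- data.frame(
  Col0 = rep(c("grp1", "grp2", "grp3", "grp4"), each = 4),
  Col1 = rep(c("B", "S", "S", "B"), 4),
  Col2 = rep(c(1, 2, 3, 4), 4),
  Col3 = rep(c(0.1, 0.2, 0.3, 0.4), 4)
)

# Hypothetical helper (not from the original post): computes Col4 for ONE group.
# Assumption: the running total restarts at every "B" row.
# Assumption: an "S" row with no earlier "B" sees a running total of 0.
col4_for_group <- function(col1, col2, col3) {
  out <- numeric(length(col1))
  running <- 0  # sum of Col4 values since the most recent "B", inclusive
  for (i in seq_along(col1)) {
    if (col1[i] == "B") {
      out[i] <- col2[i]          # B: take Col2 as it is
      running <- out[i]          # restart the running total
    } else {
      out[i] <- running - col3[i] * running  # S: total minus Col3 * total
      running <- running + out[i]            # include this row in the total
    }
  }
  out
}

# Apply the helper to each group and put the results back in row order
per_group <- lapply(split(dat, dat$Col0), function(g) {
  col4_for_group(g$Col1, g$Col2, g$Col3)
})
dat$Col4 <- unsplit(per_group, dat$Col0)
dat
```

With dplyr loaded, the same helper could be called inside the existing pipeline as dat %>% group_by(Col0) %>% mutate(Col4 = col4_for_group(Col1, Col2, Col3)). The loop visits each row once, so the work grows linearly with the number of rows; whether that is fast enough for a million rows and thousands of groups would need to be checked on the real data.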