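I have some data with date, id and value columns. I want to add a column named "bad_perf" that, for each id, looks at today's value and the values of the previous two records, and assigns 1 when all three are less than or equal to 10. If today's value is NA, assign 0. If either of the previous 2 records is NA, assign 0. If there is not enough data (fewer than 2 previous records), assign 0.
Here is the data: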
library(lubridate)  # provides mdy()
asof_dt <- mdy("11/14/2014","11/21/2014","11/28/2014","12/5/2014","4/25/2014","5/2/2014","5/9/2014","5/16/2014","5/23/2014","5/30/2014","6/6/2014")
id <- c("ABC","ABC","ABC","ABC","XYZ","XYZ","XYZ","XYZ","XYZ","XYZ","XYZ")
value <- c(7,8,3,10,11,10,1,NA,9,3,10)
df <- data.frame(asof_dt, id, value)
> df
      asof_dt  id value
1  2014-11-14 ABC     7
2  2014-11-21 ABC     8
3  2014-11-28 ABC     3
4  2014-12-05 ABC    10
5  2014-04-25 XYZ    11
6  2014-05-02 XYZ    10
7  2014-05-09 XYZ     1
8  2014-05-16 XYZ    NA
9  2014-05-23 XYZ     9
10 2014-05-30 XYZ     3
11 2014-06-06 XYZ    10
Here is the result I want, with my comments on what is expected, which I hope make things clearer.
asof_dt     id   value  bad_perf  Comment
11/14/2014  ABC  7      0         Assigned 0; not enough data
11/21/2014  ABC  8      0         Assigned 0; not enough data
11/28/2014  ABC  3      1         Assigned 1; this record and the previous 2 records are less than or equal to 10
12/5/2014   ABC  10     1         Assigned 1; this record and the previous 2 records are less than or equal to 10
4/25/2014   XYZ  11     0         Assigned 0; not enough data
5/2/2014    XYZ  10     0         Assigned 0; not enough data
5/9/2014    XYZ  1      0         Assigned 0; previous 2 records are not less than or equal to 10
5/16/2014   XYZ  NA     0         Assigned 0; current value is NA
5/23/2014   XYZ  9      0         Assigned 0; at least 1 NA
5/30/2014   XYZ  3      0         Assigned 0; at least 1 NA
6/6/2014    XYZ  10     1         Assigned 1; this record and the previous 2 records are less than or equal to 10
Unfortunately, I don't know how to start. I am currently doing this step in Excel!
Thanks a lot!
Answer 0: (score: 2)
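You could try a base R approach (embed) to create the "lag" columns after splitting the "value" column by "id". Then check whether all three elements in each row are less than or equal to 10 (rowSums(...) == 3), unlist the per-id results, and assign them to the new column. This relies on the rows already being grouped by id in sorted order, as they are in the example data.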
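df$bad_perf <- unlist(sapply(split(df$value, df$id), function(x) {
  # pad with 2 NAs so the first two records of each id have an incomplete history
  x1 <- embed(c(rep(NA, 2), x), 2)
  # columns: current value, lag 1, lag 2; NA comparisons are dropped, so a row needs 3 TRUEs
  as.numeric(rowSums(cbind(x, x1[-nrow(x1), ]) <= 10, na.rm = TRUE) == 3)
}), use.names = FALSE)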
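To make the embed step easier to follow, here is a minimal sketch that applies the same logic to a single id ("ABC" from the question's data). The names lags and window are introduced here only for illustration and are not part of the answer's code.

```r
# Values for id "ABC" from the question's data
x <- c(7, 8, 3, 10)

# Pad with two NAs; embed(v, 2) returns rows of (v[i + 1], v[i])
x1 <- embed(c(rep(NA, 2), x), 2)

# x1 has length(x) + 1 rows; dropping the last one leaves, for record i,
# the pair (x[i - 1], x[i - 2])
lags <- x1[-nrow(x1), ]

# One row per record: current value, lag 1, lag 2 (illustrative names)
window <- cbind(current = x, lag1 = lags[, 1], lag2 = lags[, 2])
print(window)

# A record is flagged only if all three entries are non-NA and <= 10
bad_perf <- as.numeric(rowSums(window <= 10, na.rm = TRUE) == 3)
print(bad_perf)
```

The first two records have NA in their lag columns, so they can never reach three TRUE comparisons, which is how the "not enough data" rule is satisfied.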
Or you could use the devel version of data.table, which introduces the function shift to get the "lag" columns, and then apply rowSums as in the previous solution.
library(data.table) #data.table_1.9.5
df1 <- copy(df)
df1$bad_perf <- setDT(df)[, shift(value, n=0:2L), id][,
    (rowSums(.SD<=10, na.rm=TRUE)==3)+0L, .SDcols=2:4][]
Or use dplyr, where the lag columns can be generated with lag.
library(dplyr)
df1 <- df %>%
    group_by(id) %>%
    mutate(value1=lag(value), value2=lag(value, 2L))
# columns 3:5 of df1 are value, value1 and value2
df$bad_perf <- (rowSums(df1[3:5]<=10, na.rm=TRUE)==3)+0
df$bad_perf
#[1] 0 0 1 1 0 0 0 0 0 0 1
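As a cross-check on the solutions above, the same rule can also be sketched in base R with ave, which writes each group's result back into the original row positions. This is only an illustrative alternative, not part of the original answer: the helper name flag_bad_perf and its threshold argument are hypothetical, and the dates are rebuilt with as.Date so the sketch runs without lubridate.

```r
# Rebuild the question's data using base R only
asof_dt <- as.Date(c("2014-11-14", "2014-11-21", "2014-11-28", "2014-12-05",
                     "2014-04-25", "2014-05-02", "2014-05-09", "2014-05-16",
                     "2014-05-23", "2014-05-30", "2014-06-06"))
id <- c("ABC", "ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ", "XYZ")
value <- c(7, 8, 3, 10, 11, 10, 1, NA, 9, 3, 10)
df <- data.frame(asof_dt, id, value)

# Hypothetical helper: 1 where the current value and the two before it
# are all non-NA and <= threshold, otherwise 0
flag_bad_perf <- function(x, threshold = 10) {
  n <- length(x)
  ok <- !is.na(x) & x <= threshold
  # Shift the flags down by one and two positions, filling with FALSE
  lag1 <- c(FALSE, ok)[seq_len(n)]
  lag2 <- c(FALSE, FALSE, ok)[seq_len(n)]
  as.integer(ok & lag1 & lag2)
}

# Sort by id and date so "previous" means the earlier record within each id
df <- df[order(df$id, df$asof_dt), ]
df$bad_perf <- ave(df$value, df$id, FUN = flag_bad_perf)
print(df)
```

Because NA values and missing history are both mapped to FALSE before the three flags are combined, the "current value is NA", "at least 1 NA" and "not enough data" cases all fall out of the same comparison.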