Lookback with NA handling

Time: 2015-01-23 03:19:12

Tags: r time-series

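I have some data with date, id, and value columns. I want to add a column called "bad_perf" that looks, within each id, at the current record ("today") and the two records before it, and assigns 1 when all three values are less than or equal to 10. If the current value is NA, assign 0. If either of the two previous values is NA, assign 0. If there is not enough data (fewer than two previous records for that id), assign 0.

Here is the data: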

library(lubridate)  # provides mdy()

asof_dt <- mdy("11/14/2014","11/21/2014","11/28/2014","12/5/2014","4/25/2014","5/2/2014","5/9/2014","5/16/2014","5/23/2014","5/30/2014","6/6/2014")
id <- c("ABC","ABC","ABC","ABC","XYZ","XYZ","XYZ","XYZ","XYZ","XYZ","XYZ")
value <- c(7,8,3,10,11,10,1,NA,9,3,10)
df <- data.frame(asof_dt, id, value)


> df
     asof_dt  id value
1  2014-11-14 ABC     7
2  2014-11-21 ABC     8
3  2014-11-28 ABC     3
4  2014-12-05 ABC    10
5  2014-04-25 XYZ    11
6  2014-05-02 XYZ    10
7  2014-05-09 XYZ     1
8  2014-05-16 XYZ    NA
9  2014-05-23 XYZ     9
10 2014-05-30 XYZ     3
11 2014-06-06 XYZ    10

Here is the result I want. I added a comment next to each expected value, which I hope makes things clearer.

asof_dt     id   value  bad_perf  Comment
11/14/2014  ABC  7      0         Assigned 0; not enough data
11/21/2014  ABC  8      0         Assigned 0; not enough data
11/28/2014  ABC  3      1         Assigned 1; this record and the previous 2 records are less than or equal to
12/5/2014   ABC  10     1         Assigned 1; this record and the previous 2 records are less than or equal to
4/25/2014   XYZ  11     0         Assigned 0; not enough data
5/2/2014    XYZ  10     0         Assigned 0; not enough data
5/9/2014    XYZ  1      0         Assigned 0; previous 2 records are not less than or equal to 10
5/16/2014   XYZ  NA     0         Assigned 0; current value is NA
5/23/2014   XYZ  9      0         Assigned 0; at least 1 NA
5/30/2014   XYZ  3      0         Assigned 0; at least 1 NA
6/6/2014    XYZ  10     1         Assigned 1; this record and the previous 2 records are less than or equal to

Unfortunately, I don't know how to get started. Right now I am doing this step in Excel!

Thanks a lot!

1 answer:

Answer 0 (score: 2)

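You could try a base R approach: use embed() to create the "lag" columns after splitting the "value" column by "id". Then check whether all three elements in each row are less than or equal to 10 (rowSums(...) == 3), and unlist() the per-id results into a single 0/1 vector.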

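# lapply() always returns a list, even when every id has the same number of rows
df$bad_perf <- unlist(lapply(split(df$value, df$id), function(x) {
  # rows of x1: (value at t-1, value at t-2), NA where there is no earlier record
  x1 <- embed(c(rep(NA, 2), x), 2)
  # drop = FALSE keeps x1 a matrix when an id has a single record
  as.numeric(rowSums(cbind(x, x1[-nrow(x1), , drop = FALSE]) <= 10, na.rm = TRUE) == 3)
}), use.names = FALSE)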
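To make the embed() step easier to follow, here is a small sketch on the "ABC" values alone. The variable names (padded, lags, window, flag) are only for illustration; embed(), cbind() and rowSums() are base R.

```r
# Values for id "ABC" from the question
x <- c(7, 8, 3, 10)

# Pad with two NAs so the first records have missing lags
padded <- c(rep(NA, 2), x)

# embed(, 2) returns one row per consecutive pair: (later value, earlier value)
lags <- embed(padded, 2)

# Drop the last row so that row t holds (x[t-1], x[t-2])
lags <- lags[-nrow(lags), , drop = FALSE]

# Columns: current value, lag 1, lag 2
window <- cbind(x, lags)
print(window)

# A row is flagged only if all three comparisons are TRUE. Comparisons with NA
# are dropped by na.rm = TRUE, so any NA leaves the count below 3.
flag <- as.numeric(rowSums(window <= 10, na.rm = TRUE) == 3)
print(flag)
# [1] 0 0 1 1
```

The first two records get 0 because their lag columns are NA, which is how the "not enough data" case falls out of the same comparison.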

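Alternatively, you can use the devel version of data.table, which introduces the shift() function for getting "lag" columns, and then apply rowSums as in the previous solution.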

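library(data.table)  # data.table_1.9.5
df1 <- copy(df)
# setDT() converts df to a data.table by reference;
# shift(value, n = 0:2L) returns the value and its first two lags within each id
df1$bad_perf <- setDT(df)[, shift(value, n = 0:2L), id][,
                 (rowSums(.SD <= 10, na.rm = TRUE) == 3) + 0L, .SDcols = 2:4][]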

Or use dplyr, which can generate the lag columns.

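library(dplyr)

df1 <- df %>%
  group_by(id) %>%
  mutate(value1 = lag(value), value2 = lag(value, 2L)) %>%
  ungroup()

# select the value and lag columns by name rather than by position
df$bad_perf <- (rowSums(df1[c("value", "value1", "value2")] <= 10, na.rm = TRUE) == 3) + 0
df$bad_perf
#[1] 0 0 1 1 0 0 0 0 0 0 1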
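As a package-free cross-check of the output above, the same rule can also be sketched with ave(), which applies a function within each id. The helper name flag3 is hypothetical, and the data is retyped from the question with as.Date() instead of mdy() so that nothing beyond base R is needed.

```r
# Data from the question (dates typed as ISO strings to avoid lubridate)
df <- data.frame(
  asof_dt = as.Date(c("2014-11-14", "2014-11-21", "2014-11-28", "2014-12-05",
                      "2014-04-25", "2014-05-02", "2014-05-09", "2014-05-16",
                      "2014-05-23", "2014-05-30", "2014-06-06")),
  id = c(rep("ABC", 4), rep("XYZ", 7)),
  value = c(7, 8, 3, 10, 11, 10, 1, NA, 9, 3, 10)
)

# Hypothetical helper: 1 when the current value and both lags are <= 10
flag3 <- function(x) {
  n <- length(x)
  lag1 <- c(NA, x)[seq_len(n)]      # previous record
  lag2 <- c(NA, NA, x)[seq_len(n)]  # record before that
  ok <- x <= 10 & lag1 <= 10 & lag2 <= 10
  as.numeric(!is.na(ok) & ok)       # NA (missing value or too little data) becomes 0
}

df$bad_perf <- ave(df$value, df$id, FUN = flag3)
df$bad_perf
# [1] 0 0 1 1 0 0 0 0 0 0 1
```

One difference from split() followed by unlist(): ave() writes the results back in the original row order, so this version does not rely on the rows being sorted by id.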