R: how to check whether a value existed in the previous period (rolling)

Asked: 2017-01-13 03:47:51

Tags: r

Here is my dataset:

structure(list(Date = structure(c(14609, 14609, 14609, 14609, 14699, 14699, 14699, 14699, 14790, 14790, 14790, 14790), class = "Date"), 
ID = structure(c(5L, 4L, 6L, 10L, 9L, 3L, 10L, 8L, 7L, 1L, 
10L, 2L), .Label = c("B00NYQ2", "B03J9L7", "B05DZD1", "B06HC42", 
"B09V3X7", "B09YCC8", "X6114659", "X6478816", "X6556701", 
"X6812555"), class = "factor"), Name = structure(c(10L, 4L, 
9L, 8L, 7L, 3L, 8L, 6L, 2L, 5L, 8L, 1L), .Label = c("AIRA", 
"BOUS", "CSCS", "EVF", "GTB", "JER", "MGB", "MPR", "NVB", 
"TTNP"), class = "factor"), Score = c(55.075, 54.5, 53.325, 
52.175, 70.275, 69.825, 60.15, 60.025, 56.175, 52.65, 52.175, 
52.125), Score.rank = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L)), .Names = c("Date", "ID", "Name", "Score", "Score.rank"), row.names = c(1L, 2L, 3L, 4L, 71L, 72L, 73L, 74L, 156L, 157L, 158L, 159L), class = "data.frame")

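I am trying to find out which IDs come in and which drop out when we move into a new period.

What I mean is: I want to check whether an ID existed in the previous period, where a period is given by "Date".

If it existed in the previous period (Date), nothing should be returned.

If it did not exist in the previous period, it should return "IN".

I also want to flag that if the ID does not exist in the next period, it should return "OUT". In other words, this period's OUTs should equal the next period's INs.

My expected data frame should look like this: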

Date        ID         Name     Score   Score.rank  THIS PERIOD NEXT PERIOD
31/12/2009  B09V3X7    TTNP     55.075  1                        OUT
31/12/2009  B06HC42    EVF      54.5    2                        OUT
31/12/2009  B09YCC8    NVB      53.325  3                        OUT
31/12/2009  X6812555    MPR     52.175  4       
31/3/2010   X6556701    MGB     70.275  1              IN   
31/3/2010   B05DZD1     CSCS    69.825  2              IN        OUT
31/3/2010   X6812555    MPR     60.15   3       
31/3/2010   X6478816    JER     60.025  4              IN        OUT
30/6/2010   X6114659    BOUS    56.175  1              IN   
30/6/2010   B00NYQ2     GTB     52.65   2              IN   
30/6/2010   X6812555    MPR     52.175  3       
30/6/2010   B03J9L7     AIRA    52.125  4              IN   

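Can someone point me in the right direction on how to do this? Thanks in advance.

1 Answer:

Answer 0: (score: 1)

Unfortunately, your description and your example do not match. Going by your description, it seems you want to flag the entry and exit of each ID.

This can be achieved with: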

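library(dplyr)
dft %>%
  group_by(ID) %>%
  # "IN" on the first date each ID appears; NA otherwise
  mutate(This_period = if_else(Date == min(Date), "IN", NA_character_)) %>%
  # "OUT" on the last date each ID appears; NA otherwise
  mutate(Next_period = if_else(Date == max(Date), "OUT", NA_character_))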

which returns:

#Source: local data frame [12 x 7]
#Groups: ID [10]
#
#         Date       ID   Name  Score Score.rank This_period Next_period
#       <date>   <fctr> <fctr>  <dbl>      <int>       <chr>       <chr>
#1  2009-12-31  B09V3X7   TTNP 55.075          1          IN         OUT
#2  2009-12-31  B06HC42    EVF 54.500          2          IN         OUT
#3  2009-12-31  B09YCC8    NVB 53.325          3          IN         OUT
#4  2009-12-31 X6812555    MPR 52.175          4          IN        <NA>
#5  2010-03-31 X6556701    MGB 70.275          1          IN         OUT
#6  2010-03-31  B05DZD1   CSCS 69.825          2          IN         OUT
#7  2010-03-31 X6812555    MPR 60.150          3        <NA>        <NA>
#8  2010-03-31 X6478816    JER 60.025          4          IN         OUT
#9  2010-06-30 X6114659   BOUS 56.175          1          IN         OUT
#10 2010-06-30  B00NYQ2    GTB 52.650          2          IN         OUT
#11 2010-06-30 X6812555    MPR 52.175          3        <NA>         OUT
#12 2010-06-30  B03J9L7   AIRA 52.125          4          IN         OUT

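However, your example suggests that you want to exclude min(Date) from the This_period check and max(Date) from the Next_period check. Is that right? If so, how are Score.rank and Date related? Please clarify.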
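If the intent is the literal description in the question (compare each row with the period immediately before and after it), a minimal base R sketch could look like the following. It is not the dplyr approach above. It assumes that the first period gets no "IN" and the last period gets no "OUT", since there is nothing to compare against. The toy `dft` holds only the Date and ID columns from the question, and `periods`, `idx`, `ids_by_period`, `in_prev` and `in_next` are illustrative helper names.

```r
# Toy data mirroring the question: four IDs per period, three periods.
dft <- data.frame(
  Date = as.Date(rep(c("2009-12-31", "2010-03-31", "2010-06-30"), each = 4)),
  ID = c("B09V3X7", "B06HC42", "B09YCC8", "X6812555",
         "X6556701", "B05DZD1", "X6812555", "X6478816",
         "X6114659", "B00NYQ2", "X6812555", "B03J9L7"),
  stringsAsFactors = FALSE
)

periods <- sort(unique(dft$Date))   # distinct periods in chronological order
idx <- match(dft$Date, periods)     # position of each row's period (1, 2, 3)

# IDs present in each period, as a list ordered by period position
ids_by_period <- split(dft$ID, idx)

# Was the ID present in the immediately preceding / following period?
in_prev <- mapply(function(id, i) i > 1 && id %in% ids_by_period[[i - 1]],
                  dft$ID, idx, USE.NAMES = FALSE)
in_next <- mapply(function(id, i) i < length(periods) && id %in% ids_by_period[[i + 1]],
                  dft$ID, idx, USE.NAMES = FALSE)

# Assumption: no "IN" in the first period and no "OUT" in the last period
dft$This_period <- ifelse(idx > 1 & !in_prev, "IN", NA_character_)
dft$Next_period <- ifelse(idx < length(periods) & !in_next, "OUT", NA_character_)

print(dft)
```

Under this reading, every ID in the middle period except X6812555 is flagged both "IN" and "OUT", which differs from the expected table in the question for X6556701. That is the mismatch between description and example mentioned above.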