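How to count previous occurrences of an ID grouped by dates?

Time: 2017-08-21 14:29:34

Tags: r dplyr

I want to count, for each value, how many times it has occurred so far, using library(dplyr).

Sample data: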

dates <- as.Date(c("2011-01-13",
                   "2011-01-14",
                   "2011-01-15",
                   "2011-01-16",
                   "2011-01-17",
                   "2011-01-13",
                   "2011-01-14",
                   "2011-01-15",
                   "2011-01-16",
                   "2011-01-17",
                   "2011-01-13",
                   "2011-01-14",
                   "2011-01-15",
                   "2011-01-16",
                   "2011-01-17",
                   "2011-01-17",
                   "2011-01-17",
                   "2011-01-18",
                   "2011-01-18"))

ID <- c("1","2","3","3","1","5","6","5","7","8","1","2","11","2","12","5","5","1","4")
# put together
data <- data.frame(dates, ID)
data

        dates     ID
    1  2011-01-13  1
    2  2011-01-14  2
    3  2011-01-15  3
    4  2011-01-16  3
    5  2011-01-17  1
    6  2011-01-13  5
    7  2011-01-14  6
    8  2011-01-15  5
    9  2011-01-16  7
    10 2011-01-17  8
    11 2011-01-13  1
    12 2011-01-14  2
    13 2011-01-15 11
    14 2011-01-16  2
    15 2011-01-17 12
    16 2011-01-17  5
    17 2011-01-17  5
    18 2011-01-18  1
    19 2011-01-18  4

I would like to build a data set that looks like this:

          dates    ID       prev_occurrence
    1  2011-01-13  1             1
    2  2011-01-14  2             1
    3  2011-01-15  3             1
    4  2011-01-16  3             2
    5  2011-01-17  1             2
    6  2011-01-13  5             1
    7  2011-01-14  6             1
    8  2011-01-15  5             2
    9  2011-01-16  7             1
    10 2011-01-17  8             1
    11 2011-01-13  1             3
    12 2011-01-14  2             2
    13 2011-01-15 11             1
    14 2011-01-16  2             3
    15 2011-01-17 12             1
    16 2011-01-17  5             3
    17 2011-01-17  5             4
    18 2011-01-18  1             4
    19 2011-01-18  4             1

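That is, each time an ID has already occurred earlier in the data, its count goes up by 1.

So far I have tried to solve this with duplicated(). However, the output does not look promising: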

library(dplyr)

data_dups <- data %>% 
  group_by(dates) %>% 
  mutate(dups = duplicated(ID)) %>%
  filter(dups == 'TRUE') %>% 
  summarise(occurence = n())

             dates occurence
            <date>     <int>
      1 2011-01-13         1
      2 2011-01-14         1
      3 2011-01-17         1

2 Answers:

Answer 0: (score: 2)

In dplyr, you can use row_number() to count the occurrences within a group.

library(tidyverse)
data %>% 
  arrange(dates) %>% 
  group_by(ID) %>% 
  mutate(occurrence = row_number())

# A tibble: 19 x 3
# Groups:   ID [10]
#          dates     ID occurrence
#         <date> <fctr>      <int>
#  1 2011-01-13      1          1
#  2 2011-01-14      2          1
#  3 2011-01-15      3          1
#  4 2011-01-16      3          2
#  5 2011-01-17      1          2
#  6 2011-01-13      5          1
#  7 2011-01-14      6          1
#  8 2011-01-15      5          2
#  9 2011-01-16      7          1
# 10 2011-01-17      8          1
# 11 2011-01-13      1          3
# 12 2011-01-14      2          2
# 13 2011-01-15     11          1
# 14 2011-01-16      2          3
# 15 2011-01-17     12          1
# 16 2011-01-17      5          3
# 17 2011-01-17      5          4
# 18 2011-01-18      1          4
# 19 2011-01-18      4          1

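Note that row_number() numbers the rows in their current order within each group, so this solution relies on how the data is sorted. arrange(dates) was added so that occurrences are counted chronologically. Be aware that arrange(dates) also reorders the rows by date, whereas the output above is listed in the original row order, which is the order the expected result in the question uses.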
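
For comparison, the same running count can be sketched in base R without dplyr. This is an alternative technique, not the method used in the answer: ave() combined with seq_along() numbers the rows of each ID in their current row order. The `expected` vector is copied from the desired output in the question, and the variable names are only illustrative.

```r
# Base R sketch: running count of each ID in row order (no packages needed).
ID <- c("1","2","3","3","1","5","6","5","7","8","1","2","11","2","12","5","5","1","4")

# ave() splits the row indices by ID and applies seq_along() to each group,
# so every row receives its position among the rows sharing the same ID.
occurrence <- ave(seq_along(ID), ID, FUN = seq_along)

# Desired column, copied from the question.
expected <- c(1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 3, 2, 1, 3, 1, 3, 4, 4, 1)

# Compare the computed counts with the desired ones.
all(occurrence == expected)
```

As with row_number(), the count follows the current row order, so sort the data first if a chronological count is wanted.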

Answer 1: (score: 0)

Use dplyr::row_number().

Try this:
data %>% group_by(dates) %>% mutate(occurrence = row_number())
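
The grouping variable decides what row_number() counts, so it may help to compare both groupings against the desired column. The sketch below assumes the sample data from the question; the dates are rebuilt compactly as offsets from 2011-01-13, and `expected`, `by_date` and `by_id` are illustrative names, not part of either answer.

```r
library(dplyr)

# Sample data from the question (dates rebuilt as day offsets from 2011-01-13).
dates <- as.Date("2011-01-13") + c(rep(0:4, 3), 4, 4, 5, 5)
ID <- c("1","2","3","3","1","5","6","5","7","8","1","2","11","2","12","5","5","1","4")
data <- data.frame(dates, ID)

# Desired column, copied from the question.
expected <- c(1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 3, 2, 1, 3, 1, 3, 4, 4, 1)

# Grouping by dates numbers the rows within each day.
by_date <- data %>% group_by(dates) %>% mutate(occurrence = row_number()) %>% ungroup()

# Grouping by ID numbers the rows within each ID, in row order.
by_id <- data %>% group_by(ID) %>% mutate(occurrence = row_number()) %>% ungroup()

all(by_date$occurrence == expected)  # FALSE: counts restart for every date
all(by_id$occurrence == expected)    # TRUE: matches the desired column
```

On this data, only the grouping by ID reproduces the column asked for in the question; grouping by dates gives the position of each row within its day instead.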