I want to use library(dplyr) to count, for each row, how many times its ID value has occurred so far.
Sample data:
dates <- as.Date(as.character(c("2011-01-13",
"2011-01-14",
"2011-01-15",
"2011-01-16",
"2011-01-17",
"2011-01-13",
"2011-01-14",
"2011-01-15",
"2011-01-16",
"2011-01-17",
"2011-01-13",
"2011-01-14",
"2011-01-15",
"2011-01-16",
"2011-01-17",
"2011-01-17",
"2011-01-17",
"2011-01-18",
"2011-01-18")))
ID <- c("1","2","3","3","1","5","6","5","7","8","1","2","11","2","12","5","5","1","4")
# put together
data <- data.frame(dates,ID)
data
dates ID
1 2011-01-13 1
2 2011-01-14 2
3 2011-01-15 3
4 2011-01-16 3
5 2011-01-17 1
6 2011-01-13 5
7 2011-01-14 6
8 2011-01-15 5
9 2011-01-16 7
10 2011-01-17 8
11 2011-01-13 1
12 2011-01-14 2
13 2011-01-15 11
14 2011-01-16 2
15 2011-01-17 12
16 2011-01-17 5
17 2011-01-17 5
18 2011-01-18 1
19 2011-01-18 4
I would like to build a dataset that looks like this:
dates ID prev_occurrence
1 2011-01-13 1 1
2 2011-01-14 2 1
3 2011-01-15 3 1
4 2011-01-16 3 2
5 2011-01-17 1 2
6 2011-01-13 5 1
7 2011-01-14 6 1
8 2011-01-15 5 2
9 2011-01-16 7 1
10 2011-01-17 8 1
11 2011-01-13 1 3
12 2011-01-14 2 2
13 2011-01-15 11 1
14 2011-01-16 2 3
15 2011-01-17 12 1
16 2011-01-17 5 3
17 2011-01-17 5 4
18 2011-01-18 1 4
19 2011-01-18 4 1
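That is, each time an ID has already occurred in an earlier row, its count increases by 1.
So far I have tried to solve this with duplicated(). However, the output does not look promising: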
library(dplyr)
data_dups <- data %>%
group_by(dates) %>%
mutate(dups = duplicated(ID)) %>%
filter(dups) %>%
summarise(occurrence = n())
dates occurrence
<date> <int>
1 2011-01-13 1
2 2011-01-14 1
3 2011-01-17 1
Answer 0 (score: 2)
In dplyr, you can use row_number() to count occurrences within a group.
library(tidyverse)
data %>%
arrange(dates) %>%
group_by(ID) %>%
mutate(occurrence = row_number())
# A tibble: 19 x 3
# Groups: ID [10]
# dates ID occurrence
# <date> <fctr> <int>
# 1 2011-01-13 1 1
# 2 2011-01-14 2 1
# 3 2011-01-15 3 1
# 4 2011-01-16 3 2
# 5 2011-01-17 1 2
# 6 2011-01-13 5 1
# 7 2011-01-14 6 1
# 8 2011-01-15 5 2
# 9 2011-01-16 7 1
# 10 2011-01-17 8 1
# 11 2011-01-13 1 3
# 12 2011-01-14 2 2
# 13 2011-01-15 11 1
# 14 2011-01-16 2 3
# 15 2011-01-17 12 1
# 16 2011-01-17 5 3
# 17 2011-01-17 5 4
# 18 2011-01-18 1 4
# 19 2011-01-18 4 1
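Note that row_number() numbers rows in their current order, so the result depends on how the data are sorted. arrange(dates) is added so that occurrences are counted chronologically. Be aware that arrange(dates) also reorders the rows by date; the output above is shown in the original row order, which is what you get when arrange(dates) is omitted.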
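To make the effect of row order concrete, here is a minimal sketch on a small hypothetical data set (not the question's data) where the row order and the date order disagree. The helper column row_id is an assumption introduced only for this illustration; it stores the original position so the rows can be put back after sorting.

```r
library(dplyr)

# Small hypothetical data set: row order differs from date order
df <- data.frame(
  dates = as.Date(c("2011-01-15", "2011-01-13", "2011-01-14", "2011-01-13")),
  ID    = c("1", "1", "2", "2"),
  stringsAsFactors = FALSE
)

# Count occurrences in the original row order (no sorting)
by_row <- df %>%
  group_by(ID) %>%
  mutate(occurrence = row_number()) %>%
  ungroup()

# Count occurrences chronologically, then restore the original row order
by_date <- df %>%
  mutate(row_id = row_number()) %>%   # remember each row's original position
  arrange(dates) %>%                  # sort so earlier dates are counted first
  group_by(ID) %>%
  mutate(occurrence = row_number()) %>%
  ungroup() %>%
  arrange(row_id) %>%                 # put the rows back where they were
  select(-row_id)

by_row$occurrence   # 1 2 1 2
by_date$occurrence  # 2 1 2 1
```

Which variant is appropriate depends on whether "previous" means "earlier in the table" or "earlier in time".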
Answer 1 (score: 0)
Use dplyr::row_number():
data %>% group_by(dates) %>% mutate(occurrence = row_number())
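The grouping variable determines what row_number() counts: grouping by dates numbers the rows within each date, whereas grouping by ID numbers the occurrences of each ID. The sketch below contrasts the two on a small hypothetical data set (not the question's data), so it is easy to check which one matches the desired result.

```r
library(dplyr)

# Small hypothetical data set
df <- data.frame(
  dates = as.Date(c("2011-01-13", "2011-01-13", "2011-01-14", "2011-01-14")),
  ID    = c("1", "5", "1", "1"),
  stringsAsFactors = FALSE
)

# Numbering within each date: restarts at 1 for every new date
per_date <- df %>%
  group_by(dates) %>%
  mutate(occurrence = row_number()) %>%
  ungroup()

# Numbering within each ID: running count of how often the ID has appeared
per_id <- df %>%
  group_by(ID) %>%
  mutate(occurrence = row_number()) %>%
  ungroup()

per_date$occurrence  # 1 2 1 2
per_id$occurrence    # 1 1 2 3
```

For the prev_occurrence column described in the question, the per-ID grouping appears to be the one that reproduces the expected values.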