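In my data, several rows share the same id but have different id2 values and dates. I want to keep the rows that are the latest and have the largest id2.
For example, in the dataset below, rows 13 to 15 are the latest, and their id2 is the largest among the rows with the same date.
More than one person can meet this condition, which is why I want to keep all of rows 13 to 15 rather than just one of them.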
library(dplyr)  # provides tibble() and the %>% pipe used below

id <- c("id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590",
"id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590",
"id7590")
id2 <- c("n0960999", "n0960999", "n0960999", "n0961001", "n0961001", "n0961001", "n0961002", "n0961002",
"n0961002", "n0961003", "n0961003", "n0961003", "n0961004", "n0961004", "n0961004", "n0961183",
"n0961183", "n0961183", "n0961184", "n0961184", "n0961184")
date <- c("1980-06-24", "1980-06-24", "1980-06-24", "1980-06-25", "1980-06-25", "1980-06-25", "1980-06-25",
"1980-06-25", "1980-06-25", "1980-06-25", "1980-06-25", "1980-06-25", "1980-06-25", "1980-06-25",
"1980-06-25", "1980-09-24", "1980-09-24", "1980-09-24", "1980-09-24", "1980-09-24", "1980-09-24")
people <- c("14029", "3078", "7333", "14029", "7333", "3078", "7333", "14029", "3078", "7333", "14029", "3078",
"3078", "14029", "7333", "7333", "14029", "3078", "14029", "3078", "7333")
tibble(id=id, id2=id2, people=people, date=date)
   id     id2      people date
 1 id7590 n0960999 14029  1980-06-24
 2 id7590 n0960999 3078   1980-06-24
 3 id7590 n0960999 7333   1980-06-24
 4 id7590 n0961001 14029  1980-06-25
 5 id7590 n0961001 7333   1980-06-25
 6 id7590 n0961001 3078   1980-06-25
 7 id7590 n0961002 7333   1980-06-25
 8 id7590 n0961002 14029  1980-06-25
 9 id7590 n0961002 3078   1980-06-25
10 id7590 n0961003 7333   1980-06-25
11 id7590 n0961003 14029  1980-06-25
12 id7590 n0961003 3078   1980-06-25
13 id7590 n0961004 3078   1980-06-25
14 id7590 n0961004 14029  1980-06-25
15 id7590 n0961004 7333   1980-06-25
16 id7590 n0961183 7333   1980-09-24
17 id7590 n0961183 14029  1980-09-24
18 id7590 n0961183 3078   1980-09-24
19 id7590 n0961184 14029  1980-09-24
20 id7590 n0961184 3078   1980-09-24
21 id7590 n0961184 7333   1980-09-24
I found a similar question for SQL, but I would like to know how to do this with dplyr.
Answer 0 (score: 0)
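You can solve this with group_by and top_n: group the rows by id and date, then keep the rows with the largest id2 in each group. top_n keeps ties, so every person sharing that id2 is retained: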
mydf <- tibble(id = id, id2 = id2, people = people, date = date)

mydf %>%
  group_by(id, date) %>%  # one group per id and date
  top_n(1, id2)           # rows with the largest id2 in each group, ties included
# A tibble: 9 x 4
# Groups:   id, date [3]
#   id     id2      people date
#   <chr>  <chr>    <chr>  <chr>
# 1 id7590 n0960999 14029  1980-06-24
# 2 id7590 n0960999 3078   1980-06-24
# 3 id7590 n0960999 7333   1980-06-24
# 4 id7590 n0961004 3078   1980-06-25
# 5 id7590 n0961004 14029  1980-06-25
# 6 id7590 n0961004 7333   1980-06-25
# 7 id7590 n0961184 14029  1980-09-24
# 8 id7590 n0961184 3078   1980-09-24
# 9 id7590 n0961184 7333   1980-09-24
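As a side note, top_n() has been superseded by slice_max() since dplyr 1.0.0, and a plain filter() against the group maximum is another way to express the same idea. The sketch below is only an illustration: it rebuilds the question's data compactly with rep(), and the names by_date and latest_only are made up for this example. It assumes that id2 and date can be compared as strings, which holds here because id2 has a fixed-width format and the dates are ISO formatted. The second pipeline is a variation, in case only the single most recent date per id is wanted rather than one result per date.

```r
library(dplyr)

# Same data as in the question, built compactly with rep().
mydf <- tibble(
  id = "id7590",
  id2 = rep(c("n0960999", "n0961001", "n0961002", "n0961003",
              "n0961004", "n0961183", "n0961184"), each = 3),
  people = c("14029", "3078", "7333", "14029", "7333", "3078", "7333",
             "14029", "3078", "7333", "14029", "3078", "3078", "14029",
             "7333", "7333", "14029", "3078", "14029", "3078", "7333"),
  date = rep(c("1980-06-24", "1980-06-25", "1980-09-24"), times = c(3, 12, 6))
)

# Per (id, date): keep every row whose id2 equals the group maximum.
# Comparing against max() keeps ties, so all people sharing that id2 survive.
by_date <- mydf %>%
  group_by(id, date) %>%
  filter(id2 == max(id2)) %>%
  ungroup()

# Variation (assumption about the intent): only the most recent date per id,
# then the largest id2 within that date.
latest_only <- mydf %>%
  group_by(id) %>%
  filter(date == max(date)) %>%
  filter(id2 == max(id2)) %>%
  ungroup()

print(by_date)
print(latest_only)
```

If the dates were stored in a format that does not sort correctly as text, converting them first with as.Date() would be the safer route.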