dplyr:如果多个行在列中具有相同的值,请按条件选择行

时间:2018-11-07 12:12:28

标签: r dplyr

在我的数据中,有几行具有相同的id但具有不同的id2和日期。我想使用最新的并且具有最大的id2的行。

例如,在下面的数据集中,第13至15行是最新的,并且id2在具有相同日期的行中最大。

可能有多个人可以满足此条件,这就是为什么我要使用13到15的行,而不是其中之一。

id <- c("id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590",
"id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590", "id7590",
"id7590")
id2 <- c("n0960999", "n0960999", "n0960999", "n0961001", "n0961001", "n0961001", "n0961002", "n0961002",
"n0961002", "n0961003", "n0961003", "n0961003", "n0961004", "n0961004", "n0961004", "n0961183",
"n0961183", "n0961183", "n0961184", "n0961184", "n0961184")
date <- c("1980-06-24", "1980-06-24", "1980-06-24", "1980-06-25", "1980-06-25", "1980-06-25", "1980-06-25",
"1980-06-25", "1980-06-25", "1980-06-25", "1980-06-25", "1980-06-25", "1980-06-25", "1980-06-25",
"1980-06-25", "1980-09-24", "1980-09-24", "1980-09-24", "1980-09-24", "1980-09-24", "1980-09-24")
people <- c("14029", "3078", "7333", "14029", "7333", "3078", "7333", "14029", "3078", "7333", "14029", "3078",
"3078", "14029", "7333", "7333", "14029", "3078", "14029", "3078", "7333")
tibble(id=id, id2=id2, people=people, date=date)
       id      id2 people       date
1  id7590 n0960999  14029 1980-06-24
2  id7590 n0960999   3078 1980-06-24
3  id7590 n0960999   7333 1980-06-24
4  id7590 n0961001  14029 1980-06-25
5  id7590 n0961001   7333 1980-06-25
6  id7590 n0961001   3078 1980-06-25
7  id7590 n0961002   7333 1980-06-25
8  id7590 n0961002  14029 1980-06-25
9  id7590 n0961002   3078 1980-06-25
10 id7590 n0961003   7333 1980-06-25
11 id7590 n0961003  14029 1980-06-25
12 id7590 n0961003   3078 1980-06-25
13 id7590 n0961004   3078 1980-06-25
14 id7590 n0961004  14029 1980-06-25
15 id7590 n0961004   7333 1980-06-25
16 id7590 n0961183   7333 1980-09-24
17 id7590 n0961183  14029 1980-09-24
18 id7590 n0961183   3078 1980-09-24
19 id7590 n0961184  14029 1980-09-24
20 id7590 n0961184   3078 1980-09-24
21 id7590 n0961184   7333 1980-09-24

我发现a similar question用于SQL,但是我想知道如何使用dplyr

1 个答案:

答案 0 :(得分:0)

您可以使用group_bytop_n来解决此问题:

mydf <- tibble(id = id, id2 = id2, people = people, date = date) 

mydf %>% 
  group_by(id, date) %>% 
  top_n(1, id2)

# A tibble: 9 x 4
# Groups:   id, date [3]
#   id     id2      people date      
#   <chr>  <chr>    <chr>  <chr>     
# 1 id7590 n0960999 14029  1980-06-24
# 2 id7590 n0960999 3078   1980-06-24
# 3 id7590 n0960999 7333   1980-06-24
# 4 id7590 n0961004 3078   1980-06-25
# 5 id7590 n0961004 14029  1980-06-25
# 6 id7590 n0961004 7333   1980-06-25
# 7 id7590 n0961184 14029  1980-09-24
# 8 id7590 n0961184 3078   1980-09-24
# 9 id7590 n0961184 7333   1980-09-24