Situation
I have a data frame df:
df <- structure(list(person = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L), .Label = c("pA", "pB", "pC"), class = "factor"), date = structure(c(16071,
16102, 16130, 16161, 16071, 16102, 16130, 16071, 16102), class = "Date")), .Names = c("person",
"date"), row.names = c(NA, -9L), class = "data.frame")
> df
person date
1 pA 2014-01-01
2 pA 2014-02-01
3 pA 2014-03-01
4 pA 2014-04-01
5 pB 2014-01-01
6 pB 2014-02-01
7 pB 2014-03-01
8 pC 2014-01-01
9 pC 2014-02-01
Question
How can I select the last 2 (or n) entries for each person, ordered by date, so that I get a resulting data frame df1:
> df1
person date
1 pA 2014-03-01
2 pA 2014-04-01
3 pB 2014-02-01
4 pB 2014-03-01
5 pC 2014-01-01
6 pC 2014-02-01
I tried a combination of:
library(dplyr)
df1 <- df %>%
group_by(person) %>%
select(tail(df, 2))
but had no luck.
Answer 0 (score: 6)
You can try slice:
library(dplyr)
df %>%
group_by(person) %>%
arrange(date, person) %>%
slice((n()-1):n())
# person date
#1 pA 2014-03-01
#2 pA 2014-04-01
#3 pB 2014-02-01
#4 pB 2014-03-01
#5 pC 2014-01-01
#6 pC 2014-02-01
Or, in place of the last step:
do(tail(., 2))
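The question also asks about the general case of n entries per person. A minimal sketch of one way to parameterise the approach above, assuming dplyr 1.0.0 or later (where slice_tail() is available); the helper name last_n_per_person is hypothetical and not part of the original answer:

```r
library(dplyr)

# Same data as in the question, rebuilt so the sketch is self-contained.
df <- data.frame(
  person = factor(c("pA", "pA", "pA", "pA", "pB", "pB", "pB", "pC", "pC")),
  date = as.Date(c("2014-01-01", "2014-02-01", "2014-03-01", "2014-04-01",
                   "2014-01-01", "2014-02-01", "2014-03-01",
                   "2014-01-01", "2014-02-01"))
)

# Hypothetical helper: keep the last `n` rows per person, by date.
last_n_per_person <- function(data, n = 2) {
  data %>%
    arrange(person, date) %>%  # sort so the latest dates come last in each group
    group_by(person) %>%
    slice_tail(n = n) %>%      # keep the final n rows of each group
    ungroup()
}

df1 <- last_n_per_person(df, n = 2)
```

Groups with fewer than n rows are returned whole, so asking for n = 3 would keep both rows of pC.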
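Answer 1 (score: 5)
Using data.table:
library(data.table)
# convert df to a data.table by reference, order by person, then take the last 2 rows per person
setDT(df)[order(person), tail(.SD, 2L), by=person]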
# person date
# 1: pA 2014-03-01
# 2: pA 2014-04-01
# 3: pB 2014-02-01
# 4: pB 2014-03-01
# 5: pC 2014-01-01
# 6: pC 2014-02-01
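We order by person, then group by person, and select the last two rows from the subset of data (.SD) for each group. This relies on the dates already being sorted within each person, as they are in df; otherwise, order by both person and date.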
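The order, group, and tail steps described above can also be mirrored in base R without any extra packages. This is only a sketch of the same idea, not the answer's method; the variable names sorted and pieces are illustrative, and here the rows are explicitly sorted by date as well as person:

```r
# Same data as in the question, rebuilt so the sketch is self-contained.
df <- data.frame(
  person = factor(c("pA", "pA", "pA", "pA", "pB", "pB", "pB", "pC", "pC")),
  date = as.Date(c("2014-01-01", "2014-02-01", "2014-03-01", "2014-04-01",
                   "2014-01-01", "2014-02-01", "2014-03-01",
                   "2014-01-01", "2014-02-01"))
)

n <- 2  # number of rows to keep per person

# Order by person and date so the latest entries come last within each person.
sorted <- df[order(df$person, df$date), ]

# Split into one data frame per person and keep the last n rows of each.
pieces <- lapply(split(sorted, sorted$person), tail, n = n)

# Stack the pieces back together and drop the per-group row names.
df1 <- do.call(rbind, pieces)
rownames(df1) <- NULL
```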
Answer 2 (score: 3)
Since you are ordering the data by person and date (i.e. you want the 2 latest dates per person), you can also use top_n() in dplyr:
df %>% group_by(person) %>% top_n(2, date)
#Source: local data frame [6 x 2]
#Groups: person
#
# person date
#1 pA 2014-03-01
#2 pA 2014-04-01
#3 pB 2014-02-01
#4 pB 2014-03-01
#5 pC 2014-01-01
#6 pC 2014-02-01
Or, if you have already ordered it, you can arrange it the other way before using slice:
df %>% arrange(person, desc(date)) %>% group_by(person) %>% slice(1:2)
#Source: local data frame [6 x 2]
#Groups: person
#
# person date
#1 pA 2014-04-01
#2 pA 2014-03-01
#3 pB 2014-03-01
#4 pB 2014-02-01
#5 pC 2014-02-01
#6 pC 2014-01-01
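In dplyr 1.0.0 and later, top_n() is superseded by slice_max(). A minimal sketch of the equivalent call, assuming such a version is installed; the result name latest is illustrative:

```r
library(dplyr)

# Same data as in the question, rebuilt so the sketch is self-contained.
df <- data.frame(
  person = factor(c("pA", "pA", "pA", "pA", "pB", "pB", "pB", "pC", "pC")),
  date = as.Date(c("2014-01-01", "2014-02-01", "2014-03-01", "2014-04-01",
                   "2014-01-01", "2014-02-01", "2014-03-01",
                   "2014-01-01", "2014-02-01"))
)

# Keep the 2 rows with the largest date within each person.
latest <- df %>%
  group_by(person) %>%
  slice_max(date, n = 2) %>%
  ungroup()
```

Like top_n(), slice_max() keeps ties by default, so a group can return more than 2 rows if several rows share the same date.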
For benchmarks on a similar question, see here.