Selecting the top x values within each group using data.table

Time: 2018-03-05 19:25:12

Tags: r data.table

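How can I select the top x values for each group in a data.table?

For example, I want to take the two highest values (Val) for each group (Date). So for this dataset: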

Date    Name    Val
01/01/2010  A   3
01/01/2010  B   2
01/01/2010  C   1
02/01/2010  A   4
02/01/2010  B   2
02/01/2010  C   3
02/01/2010  D   1

the code should return:

Date    Name    Val
01/01/2010  A   3
01/01/2010  B   2
02/01/2010  A   4
02/01/2010  C   3

1 answer:

Answer 0: (score: 1)

df <- read.table(text = "Date    Name    Val
01/01/2010  A   3
01/01/2010  B   2
01/01/2010  C   1
02/01/2010  A   4
02/01/2010  B   2
02/01/2010  C   3
02/01/2010  D   1",
  header = TRUE, stringsAsFactors = FALSE)

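library(data.table)  # provides setDT() and the := operator

setDT(df)  # convert the data.frame to a data.table by reference
# Highest value in each Date group
df[, max_val := max(Val), by = Date]
# Second-highest value in each group. sort() returns the values themselves;
# order() returns row positions, which matched the values here only by coincidence.
df[, max_sec := sort(Val, decreasing = TRUE)[2], by = Date]
# Keep rows equal to either of the two highest values, then drop the helper columns
df <- df[Val == max_val | Val == max_sec, ]
df[, c("max_val", "max_sec") := NULL]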

         Date Name Val
1: 01/01/2010    A   3
2: 01/01/2010    B   2
3: 02/01/2010    A   4
4: 02/01/2010    C   3
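
The approach above is tied to exactly two values per group. As a rough sketch of how it might generalize to any x, one common data.table idiom is to sort by Val in descending order and keep the first n rows of each group with head(.SD, n). The names dt, n and top_rows below are illustrative and not part of the original answer.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(
  Date = c(rep("01/01/2010", 3), rep("02/01/2010", 4)),
  Name = c("A", "B", "C", "A", "B", "C", "D"),
  Val  = c(3, 2, 1, 4, 2, 3, 1)
)

n <- 2  # number of top rows to keep per group (assumed parameter)

# Sort all rows by Val descending, then keep the first n rows of each Date group
top_rows <- dt[order(-Val), head(.SD, n), by = Date]

# Groups come back in order of first appearance after sorting,
# so re-sort by Date (and Val descending) for display
setorder(top_rows, Date, -Val)
print(top_rows)
```

Note that head() keeps exactly n rows per group, so ties at the cutoff are broken by row order, whereas the filter-based answer above keeps every row that equals one of the two highest values.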