Subset a data frame by repeated ids and create a column giving the order within each id

Time: 2018-08-08 05:18:59

Tags: r dataframe

Consider the following sample data:

df <-
  structure(
    list(
      id = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
      A = c(20L, 12L, 13L, 8L, 11L, 21L, 17L),
      B = c(1L, 1L, 0L, 0L, 1L, 0L, 0L)
    ),
    .Names = c("id", "A", "B"),
    class = "data.frame",
    row.names = c(NA,-7L)
  )

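Each id (stored in column 1) has a varying number of entries in columns A and B. In the sample data, there are four observations with id = 1. I am looking for a way to subset this data in R so that each id has at most 3 entries, and then to create another column (labeled C) that gives the order of the rows within each id. The expected output looks like this: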

df <-
  structure(
    list(
      id = c(1L, 1L, 1L, 2L, 2L, 3L),
      A = c(20L, 12L, 13L, 11L, 21L, 17L),
      B = c(1L, 1L, 0L, 1L, 0L, 0L),
      C = c(1L, 2L, 3L, 1L, 2L, 1L)
    ),
    .Names = c("id", "A", "B","C"),
    class = "data.frame",
    row.names = c(NA,-6L)
  )

Any help is much appreciated.

2 Answers:

Answer 0: (score: 1)

Like this?

library(data.table)
dt <- as.data.table(df)
dt[, C := seq(.N), by = id]
dt <- dt[C <= 3,]
dt
#    id  A B C
# 1:  1 20 1 1
# 2:  1 12 1 2
# 3:  1 13 0 3
# 4:  2 11 1 1
# 5:  2 21 0 2
# 6:  3 17 0 1
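
For comparison, the same two steps (number the rows within each id, then keep those numbered 3 or lower) can be sketched in base R without any packages. This is only an illustration, and it assumes the rows are already in the desired order within each id; the name `res` is introduced here and does not appear in the question.

```r
# Sample data from the question
df <- data.frame(
  id = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
  A  = c(20L, 12L, 13L, 8L, 11L, 21L, 17L),
  B  = c(1L, 1L, 0L, 0L, 1L, 0L, 0L)
)

# Running position of each row within its id group
df$C <- ave(seq_along(df$id), df$id, FUN = seq_along)

# Keep at most the first three rows per id and reset the row names
res <- df[df$C <= 3L, ]
rownames(res) <- NULL
res
```

As in the data.table version, C is computed before filtering, so the rows that are kept retain their original within-id positions.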

Answer 1: (score: 1)

Here is an option with dplyr that keeps the top 3 values of A within each id (based on @Ronak Shah's comment).

library(dplyr)
df %>%
        group_by(id) %>%
        top_n(n = 3, wt = A) %>% # top 3 values based on A
        mutate(C = rank(id, ties.method = "first")) # C consists of the order of each id
# A tibble: 6 x 4
# Groups:   id [3]
#      id     A     B     C
#   <int> <int> <int> <int>
# 1     1    20     1     1
# 2     1    12     1     2
# 3     1    13     0     3
# 4     2    11     1     1
# 5     2    21     0     2
# 6     3    17     0     1
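
Note that top_n() selects rows by the value of A and can return more than 3 rows per id when there are ties. On the sample data this happens to match the expected output because the fourth row for id = 1 has the smallest A. If the intent is instead to keep the first three rows of each id in their existing order, a minimal dplyr sketch might look like the following; the name `res` is introduced here for illustration only.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  id = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
  A  = c(20L, 12L, 13L, 8L, 11L, 21L, 17L),
  B  = c(1L, 1L, 0L, 0L, 1L, 0L, 0L)
)

res <- df %>%
  group_by(id) %>%
  mutate(C = row_number()) %>%  # position of each row within its id
  filter(C <= 3) %>%            # keep at most the first three rows per id
  ungroup()

res
```

Using row_number() for C also avoids ranking the constant id column, which only yields 1, 2, 3, ... because of ties.method = "first".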