Question

I have a large dataset in R that is organized as multiple records from individual cases, nested within groups. Here is a toy example:

d = data.frame(group = rep(c('control','patient'), each = 5), case = c('a', 'a', 'b', 'c', 'c', 'd','d','d','e','e'))

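If group_by(group, case) has been applied in a dplyr chain, how can I create a column that numbers each row according to the order of its case within the group? For example, in the third column below, case 'a' is the first case in the control group and case 'c' is the third, but the numbering resets to 1 for case 'd', the first case in the patient group.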

group    case  number
control  a     1
control  a     1
control  b     2
control  c     3
control  c     3
patient  d     1
patient  d     1
patient  d     1
patient  e     2
patient  e     2

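I can see how this could be done by counting cases with a 'for' loop, but I would like to know whether there is a way to achieve it within a standard dplyr-style chain of operations.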

Answer 1

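library(dplyr)
# factor() is needed on R >= 4.0, where data.frame() keeps strings as character
group_by(d, group) %>% 
   mutate(number = droplevels(factor(case)) %>% as.numeric)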
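
The approach above numbers cases by the sorted order of the factor levels. A minimal sketch of a variant, assuming the numbering should instead follow the order in which each case first appears within its group: `match(case, unique(case))` returns the position of each row's case among the group's distinct cases. The data frame `d` is rebuilt from the question so the block is self-contained, and the result name `res` is only illustrative.

```r
library(dplyr)

# Toy data from the question
d <- data.frame(
  group = rep(c("control", "patient"), each = 5),
  case  = c("a", "a", "b", "c", "c", "d", "d", "d", "e", "e")
)

res <- d %>%
  group_by(group) %>%
  # position of each row's case among the distinct cases of its group
  mutate(number = match(case, unique(case))) %>%
  ungroup()

res
```

Because `match()` works on character vectors as well as factors, this variant does not depend on how `data.frame()` stored the `case` column.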

Answer 2

We can use data.table

library(data.table)
setDT(d)[, numbers := as.numeric(factor(case, levels = unique(case))), group]

Answer 3

One solution is:

library(dplyr)
library(tibble)

want<-left_join(d,
                d %>%
                  distinct(case) %>%
                  rownames_to_column(var="number") ,
                by="case")

# added later: number the cases within each group, so the count restarts for 'patient'
want2<-left_join(d,
                 bind_rows(
                   d %>%
                     filter(group=="control") %>%
                     distinct(case) %>%
                     rownames_to_column(var="number"),
                   d %>%
                     filter(group=="patient") %>%
                     distinct(case) %>%
                     rownames_to_column(var="number")),
                   by="case")

# I think this is less readable:
want3<-left_join(d,
                 bind_rows(by(d,d$group,function(x) x %>%
                                distinct(case) %>%
                                rownames_to_column(var="number"))),
                 by="case")
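
The same join idea can be sketched without hard-coding the group names, assuming a lookup table with one row per group/case pair is acceptable. Here `lookup` and `want4` are illustrative names that do not appear in the answer above, and `row_number()` replaces `rownames_to_column()` so that `number` comes out as an integer.

```r
library(dplyr)

# Toy data from the question
d <- data.frame(
  group = rep(c("control", "patient"), each = 5),
  case  = c("a", "a", "b", "c", "c", "d", "d", "d", "e", "e")
)

# One row per (group, case) pair, numbered within each group
lookup <- d %>%
  distinct(group, case) %>%
  group_by(group) %>%
  mutate(number = row_number()) %>%
  ungroup()

# Attach the numbers back onto every record
want4 <- left_join(d, lookup, by = c("group", "case"))

want4
```

Joining on both `group` and `case` also keeps the result correct if the same case label were ever to appear in more than one group.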

Counting levels within a group_by hierarchy in dplyr
