Question

I want to assign a number to each group in a data frame. For example, I have the following data frame:

df = data.frame( from = c('a', 'a', 'b'), dest = c('b', 'c', 'd') )
#> df
#  from dest
#1    a    b
#2    a    c
#3    b    d

I want to group by the values of from and assign a group number to each group. This is the expected result:

result = data.frame( from = c('a', 'a', 'b'), dest = c('b', 'c', 'd'), group_no = c(1,1,2) )
#> result
#  from dest group_no
#1    a    b        1
#2    a    c        1
#3    b    d        2

I can solve this with a for loop, as follows:

library(dplyr)  # provides the %>% pipe used below
groups = df$from %>% unique
i = 0
df$group_no = NA
for ( g in groups ) {
    i = i + 1
    df[ df$from == g, ]$group_no = i
}
#> df
#  from dest group_no
#1    a    b        1
#2    a    c        1
#3    b    d        2

Is there a more elegant, more functional way to solve this without a for loop? Specifically, can it be done with dplyr::group_by?

Answer 1

Use mutate to add a column that is simply the integer codes of from converted to a factor:

df %>% mutate(group_no = as.integer(factor(from)))

#   from dest group_no
# 1    a    b        1
# 2    a    c        1
# 3    b    d        2

...or simply

mutate(df, group_no = as.integer(factor(from)))

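group_by is not needed here unless you are using it for some other purpose. If you want to group by the new column for later use, you can use group_by instead of mutate to add the column.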
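The last point can be sketched as follows. This is a minimal illustration, assuming dplyr is installed; the variable name grouped is only for the example. group_by accepts a named expression, so it can create group_no and group by it in a single step.

```r
library(dplyr)

df <- data.frame(from = c("a", "a", "b"), dest = c("b", "c", "d"))

# group_by() with a named expression adds the column and groups by it
grouped <- df %>% group_by(group_no = as.integer(factor(from)))

grouped$group_no      # the newly added column
group_vars(grouped)   # the variable(s) the result is grouped by
```

The result is a grouped tibble, so later verbs such as summarise would operate per group_no.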

Answer 2

We can use group_indices from dplyr:

library(dplyr)
df %>% 
   mutate(group_no = group_indices_(., .dots="from"))
#     from dest group_no
#1    a    b        1
#2    a    c        1
#3    b    d        2
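The underscore-suffixed verbs such as group_indices_ are deprecated in recent dplyr releases. A possible alternative, assuming dplyr 1.0.0 or later, is cur_group_id() inside a grouped mutate. This is a sketch rather than the answer's original method, and it uses group_by as the question asked.

```r
library(dplyr)

df <- data.frame(from = c("a", "a", "b"), dest = c("b", "c", "d"))

result <- df %>%
  group_by(from) %>%                    # one group per distinct value of from
  mutate(group_no = cur_group_id()) %>% # integer id of the current group
  ungroup()                             # drop the grouping once the id is stored

result
```

Calling ungroup() at the end is optional; it is included here so that later operations on result are not silently applied per group.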

A similar option using data.table is

library(data.table)
setDT(df)[, group_no := .GRP, by = from]
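One difference worth knowing: with by, .GRP numbers groups in order of first appearance, whereas factor() numbers them by sorted level. On the question's data both give the same answer, so the sketch below uses different, made-up data where from is not already sorted. The column names grp_appearance and grp_sorted are hypothetical.

```r
library(data.table)

# Illustrative data: "b" appears before "a"
dt <- data.table(from = c("b", "a", "b"), dest = c("x", "y", "z"))

# .GRP follows the order in which groups first appear
dt[, grp_appearance := .GRP, by = from]

# factor() sorts its levels, so "a" gets code 1 and "b" gets code 2
dt[, grp_sorted := as.integer(factor(from))]

dt
```

Which numbering is preferable depends on whether the group numbers should follow the row order, as in the question's for loop over unique values, or the sorted values.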

Answer 3

You can try transform from the base package:

transform(df,group_no=as.numeric(factor(from)))

#   from dest group_no
# 1    a    b        1
# 2    a    c        1
# 3    b    d        2

If the from column is already a factor, you can drop the factor() call and simply use

transform(df,id=as.numeric(from))
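The shortcut depends on from really being a factor. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so from is often a character column, and as.numeric() on it yields NA. A small base R sketch of the difference follows; the names df_fct, df_chr, codes_fct, codes_chr, and safe are only for illustration.

```r
# from stored as a factor
df_fct <- data.frame(from = factor(c("a", "a", "b")), dest = c("b", "c", "d"))

# from stored as plain character strings
df_chr <- data.frame(from = c("a", "a", "b"), dest = c("b", "c", "d"),
                     stringsAsFactors = FALSE)

# On a factor, as.numeric() returns the underlying level codes
codes_fct <- as.numeric(df_fct$from)

# On character data, as.numeric() cannot parse "a" or "b" and returns NA
# (the coercion warning is suppressed here only to keep the output clean)
codes_chr <- suppressWarnings(as.numeric(df_chr$from))

# Wrapping in factor() first works for both column types
safe <- as.numeric(factor(df_chr$from))
```

Keeping the factor() call is therefore the safer default when the column type is not known in advance.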

How to assign a number to each group of a data frame using dplyr::group_by?

3 answers: