Create group_indices based on multiple columns

Asked: 2017-07-13 11:40:38

Tags: r dplyr

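I want to generate an index that groups observations based on two columns. Observations should end up in the same group when they share a value in at least one of the two columns. I can see how to build groups from observations that match on both columns, but not from observations that match on only one of them.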

For example, with the data frame:

dt <- data.frame(id = 1:10,
                 G1 = c("A","A","B","B","C","C","C","D","E","F"),
                 G2 = c("Z","X","X","Y","W","V","U","s","T","T"))

I would like to obtain the column

1,1,1,1,2,2,2,3,4,4

I tried using group_indices from dplyr, but could not get it to work.

1 answer:

Answer 0 (score: 14)

Use igraph to get the cluster membership, then map it back onto the rows by name:

library(igraph)

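# convert to graph: each (G1, G2) pair is an edge, id is kept as an edge attribute
g <- graph_from_data_frame(dt[, c(2, 3, 1)])
# connected components give one membership id per vertex, named by its value
myGroups <- components(g)$membership

myGroups 
# A B C D E F Z X Y W V U s T 
# 1 1 2 3 4 4 1 1 1 2 2 2 3 4 

# then map on names (as.character guards against G1 being stored as a factor)
dt$group <- myGroups[as.character(dt$G1)]


dt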
#    id G1 G2 group
# 1   1  A  Z     1
# 2   2  A  X     1
# 3   3  B  X     1
# 4   4  B  Y     1
# 5   5  C  W     2
# 6   6  C  V     2
# 7   7  C  U     2
# 8   8  D  s     3
# 9   9  E  T     4
# 10 10  F  T     4
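
The approach above treats every distinct value of G1 and G2 as a vertex and every row as an edge, so rows that are linked through any chain of shared values fall into the same connected component. As a rough sketch, the same idea can be wrapped in a small helper. The function name `group_by_shared_value` is hypothetical and not part of igraph or dplyr. Prefixing each value with its column name is an added assumption, not something the answer does: it keeps a string that happens to appear in both columns from being merged into a single vertex. Missing values are not handled here.

```r
library(igraph)

# Hypothetical helper: returns one group id per row, where rows are linked
# if they share a value in col1 or share a value in col2.
group_by_shared_value <- function(data, col1, col2) {
  # Assumption: prefix values with their column name so that an identical
  # string occurring in both columns is treated as two distinct vertices.
  edges <- data.frame(
    from = paste(col1, data[[col1]], sep = ":"),
    to   = paste(col2, data[[col2]], sep = ":"),
    stringsAsFactors = FALSE
  )
  # Each row becomes an undirected edge between its two (prefixed) values.
  g <- graph_from_data_frame(edges, directed = FALSE)
  # Named vector: vertex name -> connected component id.
  membership <- components(g)$membership
  # Look up each row's component through its first-column vertex.
  unname(membership[edges$from])
}

dt <- data.frame(id = 1:10,
                 G1 = c("A","A","B","B","C","C","C","D","E","F"),
                 G2 = c("Z","X","X","Y","W","V","U","s","T","T"))
dt$group <- group_by_shared_value(dt, "G1", "G2")
```

If values in the two columns are meant to be comparable with each other (so that "A" in G1 and "A" in G2 should link their rows), drop the prefixing and pass the raw columns as the edge list instead.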