Creating groups by matching values across different columns

Time: 2018-06-07 20:05:30

Tags: r group-by

I want to create groups by matching values across columns.

I have the following data table:

now <- c(1, 2, 3, 4, 24, 25, 26, 5, 6, 21, 22, 23)
before <- c(0, 1, 2, 3, 23, 24, 25, 4, 5, 0, 21, 22)
after <- c(2, 3, 4, 5, 25, 26, 0, 6, 0, 22, 23, 24)
df <- data.frame(now, before, after)

This produces the following data:

   now before after
1    1      0     2
2    2      1     3
3    3      2     4
4    4      3     5
5   24     23    25
6   25     24    26
7   26     25     0
8    5      4     6
9    6      5     0
10   21      0    22
11   22     21    23
12   23     22    24

I want to get:

    now before after group
1    1      0     2     A
2    2      1     3     A
3    3      2     4     A
4    4      3     5     A
5    5      4     6     A
6    6      5     0     A
7   21      0    22     B
8   22     21    23     B
9   23     22    24     B
10  24     23    25     B
11  25     24    26     B
12  26     25     0     B

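I would like to get the answer without using a "for" loop, because the real data is very large.

Any help you can provide would be greatly appreciated.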

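1 Answer:

Answer 0: (score: 0)

Here is one approach. Each row starts in its own group; rows whose before or after value matches another row's now value are then merged into the smallest group number among them, and this repeats until nothing changes. It is hard to avoid a for loop here, because this is a fairly tricky algorithm. Objections to for loops are usually about elegance rather than speed, and sometimes they are entirely appropriate.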

df$group <- seq_len(nrow(df)) #assign each row to its own group

converged <- FALSE #indicates convergence (named to avoid masking base::stop)

while(!converged){
  pre <- df$group #group column at start of loop

  for(i in seq_len(nrow(df))){
    matched <- which(df$before==df$now[i] | df$after==df$now[i]) #rows whose before/after equals this row's now
    group <- min(df$group[i], df$group[matched]) #identify smallest group no of matching rows
    df$group[i] <- group #set to smallest group
    df$group[matched] <- group #set to smallest group
  }

  if(identical(df$group, pre)) converged <- TRUE #stop if no change
}

df$group <- LETTERS[match(df$group, sort(unique(df$group)))] #convert groups to letters
#(just use match(...) to keep them as integers - e.g. if you have more than 26 groups)

df <- df[order(df$group, df$now),] #reorder as required

df
   now before after group
1    1      0     2     A
2    2      1     3     A
3    3      2     4     A
4    4      3     5     A
8    5      4     6     A
9    6      5     0     A
10  21      0    22     B
11  22     21    23     B
12  23     22    24     B
5   24     23    25     B
6   25     24    26     B
7   26     25     0     B
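
If the row-by-row loop turns out to be too slow on the real data, one possible alternative is to follow the before links with vectorized "pointer jumping". This is only a minimal sketch under some assumptions that hold for the sample data but may not hold for yours: the now values are unique, a before value of 0 (or any value not found in now) marks the start of a chain, the after column describes the same links as before, and the links form simple chains without cycles. The name `chains` is made up for this illustration.

```r
# Sample data from the question (only `now` and `before` are needed here)
now    <- c(1, 2, 3, 4, 24, 25, 26, 5, 6, 21, 22, 23)
before <- c(0, 1, 2, 3, 23, 24, 25, 4, 5, 0, 21, 22)
chains <- data.frame(now, before)

# Row index of each row's predecessor; NA where `before` matches no `now`
parent <- match(chains$before, chains$now)

# Chain starts point at themselves, every other row points at its predecessor
root <- ifelse(is.na(parent), seq_len(nrow(chains)), parent)

# Pointer jumping: each pass replaces a row's pointer with its pointer's pointer,
# so the distance covered along a chain roughly doubles per pass
repeat {
  nxt <- root[root]
  if (identical(nxt, root)) break # every row now points at its chain start
  root <- nxt
}

# Number the chain starts 1, 2, ... and turn them into letters
chains$group_id <- match(root, sort(unique(root)))
chains$group <- LETTERS[chains$group_id]

chains[order(chains$group, chains$now), ]
```

Each pass is a single vector operation, and the number of passes grows with the logarithm of the longest chain rather than with the number of rows. If the data can contain cycles or inconsistent before/after values, the loop above may not terminate or may group rows differently from the original approach, so it is worth checking those assumptions first.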