Question

I have created a data frame:

data <- data.frame(a=c(1,1,2,2,3,3,4,5), b=c(1,2,2,3,3,4,5,6))

    a   b
    1   1
    1   2
    2   2
    2   3
    3   3
    3   4
    4   5
    5   6

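Now I want to generate a master column c as follows:

The master id links the values (IDs) in columns a and b through their shared intermediate IDs. For example, a = 1 has a corresponding value b = 1, so find every row where column b contains 1 and assign master id 1. Another row with a = 1 has b = 2, so find every row where column b is 2 and assign the same master id. The same applies in the other direction.

I have written the following code, but it does only one rotation: column a to column b and b to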

masterCombine <- function(data, col1 = "a", col2 = "b", masterName = "c") {
  skipList <- NULL
  masterId <- 1
  for (p in 1:nrow(data)) {
    ind <- ind1 <- ind2 <- ind3 <- ind4 <- NULL
    if (!p %in% skipList) {
      ind1 <- which(data[, col1] == data[, col1][p])
      for (ij in ind1) {
        ind2 <- which(data[, col2] == data[, col2][ij])
        for (j in ind2) {
          ind3 <- which(data[, col1] == data[, col1][j])
          ind4 <- append(ind4, ind3)
        }
      }
      ind <- unique(append(ind1, ind4))
      skipList <- append(skipList, ind)
      data[ind, masterName] <- masterId
      masterId <- masterId + 1
    }
  }
  return(data)
}

How can I implement this recursive matching?

Answer 1

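You can do this kind of thing with the igraph package and its clusters() function (named components() in current igraph releases). You just need to make sure that the values in column a are explicitly distinguished from the values in column b first, which the code below does by prefixing each value with its column name.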

library(igraph)
data <- data.frame(a=c(1,1,2,2,3,3,4,5), b=c(1,2,2,3,3,4,5,6))
newdata <- mapply(paste0, names(data), data)
g <- graph.edgelist(newdata)
clusters(g)$membership
#a1 b1 b2 a2 b3 a3 b4 a4 b5 a5 b6 
# 1  1  1  1  1  1  1  2  2  3  3 

cg <- clusters(g)$membership
data$c <- cg[match(newdata[,"a"],names(V(g)))]

#  a b c
#1 1 1 1
#2 1 2 1
#3 2 2 1
#4 2 3 1
#5 3 3 1
#6 3 4 1
#7 4 5 2
#8 5 6 3

For the visually minded, plot(g) gives a graphical representation of the graph.
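If you would rather not depend on igraph, the same grouping can be sketched in base R with a small union-find (disjoint-set) structure. This is only a minimal sketch of the connected-components idea, not what igraph does internally. The names node_a, node_b, parent and find_root are made up for illustration, and the sketch assumes that prefixing the values with "a" and "b" is enough to keep the two columns apart.

```r
# Base-R sketch: label connected components with a simple union-find.
data <- data.frame(a = c(1, 1, 2, 2, 3, 3, 4, 5),
                   b = c(1, 2, 2, 3, 3, 4, 5, 6))

# Prefix the values so that a == 1 and b == 1 are treated as distinct nodes.
node_a <- paste0("a", data$a)
node_b <- paste0("b", data$b)
nodes  <- unique(c(node_a, node_b))

# Integer index of each row's two end points in the node list.
ia <- match(node_a, nodes)
ib <- match(node_b, nodes)

# parent[i] is the index of node i's parent; every node starts as its own root.
parent <- seq_along(nodes)

# Hypothetical helper: follow parent links until a root is reached.
find_root <- function(i) {
  while (parent[[i]] != i) i <- parent[[i]]
  i
}

# Each row is an edge: merge the sets that contain its two end points.
for (k in seq_len(nrow(data))) {
  ra <- find_root(ia[k])
  rb <- find_root(ib[k])
  if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
}

# Rows whose a-node shares a root belong to the same group;
# renumber the roots 1, 2, 3, ... in order of first appearance.
roots  <- vapply(ia, find_root, integer(1))
data$c <- match(roots, unique(roots))
data
```

For the example data this assigns rows 1 to 6 to group 1, row 7 to group 2 and row 8 to group 3, the same grouping as the igraph result above. The sketch skips path compression and union by rank, so for large data the igraph approach is likely the better choice.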
