Using R to determine the number of shared logical values across all possible unique dyadic combinations

Time: 2017-09-21 03:11:25

Tags: r dataframe social-networking

I have a data frame containing groups and logical vectors that indicate whether each group is located in each area.

# Create data frame
Group = c('Group1', 'Group2', 'Group3', 'Group4') 
Area1 = c(TRUE, FALSE, TRUE, FALSE) 
Area2 = c(TRUE, TRUE, FALSE, FALSE) 
Area3 = c(FALSE, TRUE, FALSE, FALSE) 
Area4 = c(FALSE, FALSE, FALSE, TRUE) 
df = data.frame(Group, Area1, Area2, Area3, Area4) 

# Generate unique combinations of Groups
links <- expand.grid(df$Group, df$Group) #generates all possible combination
links$key <- apply(links, 1, function(x)paste(sort(x), collapse='')) 
undirected <- subset(links, !duplicated(links$key)) 
undirected$ID <- seq.int(nrow(undirected))

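For each unique pair of groups, I am trying to determine the number of areas they share. My desired output is the dyad, the number of areas the two groups share, and the names of those areas.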

# Desired Output
Group1Group2  1 Area2
Group1Group3  1 Area1
Group1Group4  0 NA
Group2Group3  0 NA
Group2Group4  2 Area3, Area4
Group3Group4  0 NA

1 Answer:

Answer 0: (score: 0)

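I'm not sure I understand your question. The data structure is confusing. Does the dyad Group2Group4, i.e. {i=2, j=4}, really have Area3 and Area4 in common? I think not.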
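
A quick way to check that on the question's data is sketched below. This is only a minimal sketch: the matrix `m` and the name `shared.2.4` are illustrative and not part of the original code. It rebuilds the four logical vectors as a matrix and keeps the area names where both rows are TRUE.

```r
# Rebuild the question's data as a logical matrix (rows = groups, columns = areas).
# `m` is a hypothetical helper name used only for this check.
m <- cbind(
  Area1 = c(TRUE, FALSE, TRUE, FALSE),
  Area2 = c(TRUE, TRUE, FALSE, FALSE),
  Area3 = c(FALSE, TRUE, FALSE, FALSE),
  Area4 = c(FALSE, FALSE, FALSE, TRUE)
)
rownames(m) <- c("Group1", "Group2", "Group3", "Group4")

# Areas in which both Group2 and Group4 are present
shared.2.4 <- colnames(m)[m["Group2", ] & m["Group4", ]]
print(shared.2.4)
```

With the data as posted, the result is an empty vector: Group2 is in Area2 and Area3, while Group4 is only in Area4.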

I'm not sure igraph is really needed here. However, this could be set up as a bipartite network G(V₁, V₂, E), with areas ∈ V₁ and groups ∈ V₂ kept distinct, and with dyads always running from an area to a group: eⁱʲ ∈ E; i ∈ V₁; j ∈ V₂. You can then get the shared areas by listing the neighborhood of each group node, and the number of areas a group inhabits by computing its degree.

If you really, really want to see this in code, I'll repost when I have time.

In the meantime, I think this is what you are after. I won't win any code golf competitions with this one, but if I understood your question correctly, it gets the job done:

# Make that same data
Group = c('Group1', 'Group2', 'Group3', 'Group4')
Area1 = c(TRUE, FALSE, TRUE, FALSE)
Area2 = c(TRUE, TRUE, FALSE, FALSE)
Area3 = c(FALSE, TRUE, FALSE, FALSE)
Area4 = c(FALSE, FALSE, FALSE, TRUE)
df = data.frame(Group, Area1, Area2, Area3, Area4)

# Take two groups (by row number) and list the areas they have in common
is.shared <- function(i, j){
  # Make a data frame with two rows (one for i and one for j) where the
  # index of each area is multiplied by the boolean that indicates whether
  # the group resides in area x. If so, the cell is x; if not, it is 0.
  area.index <- seq_len(ncol(df) - 1)
  dyad <- as.data.frame(matrix(rep(area.index, 2), nrow=2, byrow=TRUE)) * df[c(i,j), -1]
  # The shared areas are the intersection of the two sets
  intersect(as.numeric(dyad[1,]), as.numeric(dyad[2,]))
}

# Take a vector of area numbers and return a string that lists them.
# c(2,4,0) becomes "Area2, Area4".
list.areas <- function(vector){
  result = c()
  for(area in vector){
    if(area != 0){
      result <- c(result, paste("Area", area, sep=""))
    }
  }
  paste(result, collapse=", ")
}

# Make a data frame of all possible dyadic combinations (two-way).
# Both columns index groups, i.e. rows of df.
dyads <- expand.grid(seq_len(nrow(df)), seq_len(nrow(df)))
names(dyads) <- c("Group i", "Group j")

# Each row contains a dyad - a pair (i and j) of groups.
# Generate a unique dyadic key
dyads$Key <- apply(dyads, 1, function(x) paste(sort(x), collapse='->'))

# For each row of dyads, that is to say, for each pair (i,j), check if
# any areas are shared using is.shared(), and convert the result to a
# string using list.areas()
dyads$Shared_Areas <- sapply(1:nrow(dyads), function(x)
  list.areas(is.shared(dyads[x,1], dyads[x,2])))

# Count the number of shared areas by splitting the string by commas
dyads$Shared_Area_Nums <- sapply(dyads$Shared_Areas, function(x)
  length(strsplit(x, ",")[[1]]))

# Note that it is not as safe to count the result of is.shared() directly.
# The count relies on a 0 being present in the intersection, and no 0 is
# returned if either group resides in every area.
# If we assume that no group resides in all areas, it would also be ok
# to generate the count like this:
dyads$Shared_Areas_Unsafe <- sapply(1:nrow(dyads), function(x)
  length(is.shared(dyads[x,1], dyads[x,2]))) - 1

# Reorder columns
dyads <- dyads[,c("Group i","Group j", "Key", "Shared_Area_Nums", "Shared_Areas_Unsafe", "Shared_Areas")]

If you like, go ahead and clean this up into an undirected structure, and perhaps drop the self-links (I don't suppose you are interested in Group2Group2).

Note that your own code was already halfway there, and parts of it are very neat.
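
For what it's worth, the bipartite idea mentioned earlier in this answer can be roughly sketched in base R, without igraph. This is a sketch under assumptions, not a definitive implementation: the names `inc`, `group.degree`, `pairs`, `shared` and `result` are made up for illustration. The data frame is treated as the incidence matrix of the group/area network, and `combn()` is used so that only unordered pairs of distinct groups are produced.

```r
# Assumption: same toy data as in the question, rebuilt here so that the
# block is self-contained. Rows = groups (V2), columns = areas (V1).
groups <- c("Group1", "Group2", "Group3", "Group4")
inc <- cbind(
  Area1 = c(TRUE, FALSE, TRUE, FALSE),
  Area2 = c(TRUE, TRUE, FALSE, FALSE),
  Area3 = c(FALSE, TRUE, FALSE, FALSE),
  Area4 = c(FALSE, FALSE, FALSE, TRUE)
)
rownames(inc) <- groups

# Degree of each group node = number of areas the group resides in
group.degree <- rowSums(inc)

# All unordered pairs of distinct groups (no self-links such as Group2Group2)
pairs <- t(combn(groups, 2))

# Shared areas = intersection of the two groups' neighborhoods
shared <- lapply(seq_len(nrow(pairs)), function(k) {
  colnames(inc)[inc[pairs[k, 1], ] & inc[pairs[k, 2], ]]
})

# One row per dyad: key, number of shared areas, and their names (NA if none)
result <- data.frame(
  Key = paste0(pairs[, 1], pairs[, 2]),
  Shared_Area_Nums = lengths(shared),
  Shared_Areas = vapply(shared, function(a) {
    if (length(a) > 0) paste(a, collapse = ", ") else NA_character_
  }, character(1)),
  stringsAsFactors = FALSE
)
print(result)
```

On the posted data this gives one shared area for Group1Group2 (Area2) and one for Group1Group3 (Area1), and none for the other four dyads, which is why the Group2Group4 row in the desired output looks off to me.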