I have a data frame with groups and logical vectors that indicate whether each group is located in each area.
# Create data frame
Group = c('Group1', 'Group2', 'Group3', 'Group4')
Area1 = c(TRUE, FALSE, TRUE, FALSE)
Area2 = c(TRUE, TRUE, FALSE, FALSE)
Area3 = c(FALSE, TRUE, FALSE, FALSE)
Area4 = c(FALSE, FALSE, FALSE, TRUE)
df = data.frame(Group, Area1, Area2, Area3, Area4)
# Generate unique combinations of Groups
links <- expand.grid(df$Group, df$Group) # generates all possible combinations
links$key <- apply(links, 1, function(x)paste(sort(x), collapse=''))
undirected <- subset(links, !duplicated(links$key))
undirected$ID <- seq.int(nrow(undirected))
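As an aside, here is a minimal base-R sketch of the same "unique combinations" step. It assumes you only want pairs of two different groups. combn() lists each unordered pair exactly once, so unlike the expand.grid() + duplicated() approach above it never produces self-pairs such as Group1Group1. The names pair_mat, undirected_pairs, g1 and g2 are illustrative only.

```r
# Recreate the question's data (base R only)
df <- data.frame(
  Group = c('Group1', 'Group2', 'Group3', 'Group4'),
  Area1 = c(TRUE, FALSE, TRUE, FALSE),
  Area2 = c(TRUE, TRUE, FALSE, FALSE),
  Area3 = c(FALSE, TRUE, FALSE, FALSE),
  Area4 = c(FALSE, FALSE, FALSE, TRUE)
)

# combn() returns a 2 x choose(n, 2) matrix; each column is one unordered pair,
# so no pair appears twice and no group is paired with itself
pair_mat <- combn(as.character(df$Group), 2)
undirected_pairs <- data.frame(g1 = pair_mat[1, ], g2 = pair_mat[2, ],
                               stringsAsFactors = FALSE)
undirected_pairs$key <- paste0(undirected_pairs$g1, undirected_pairs$g2)
undirected_pairs$key
# "Group1Group2" "Group1Group3" "Group1Group4" "Group2Group3" "Group2Group4" "Group3Group4"
```

With four groups this yields choose(4, 2) = 6 pairs, the same six dyads listed in the desired output.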
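For each unique pair of groups, I am trying to determine the number of areas they share. My desired output is the dyad, the number of areas it shares, and the names of those areas.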
# Desired Output
Group1Group2 1 Area2
Group1Group3 1 Area1
Group1Group4 0 NA
Group2Group3 0 NA
Group2Group4 2 Area3, Area4
Group3Group4 0
Answer 0 (score: 0)
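I'm not sure I understand your question; the data structure is confusing. Does the dyad {i=2, j=4}, titled Group2Group4, really share Area3 and Area4? I would think not: in df, Group2 is in Area2 and Area3, while Group4 is only in Area4.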
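The claim is easy to check against df itself. Below is a minimal base-R sketch. The helper shared_areas is hypothetical and exists only for this check. It returns the names of the areas where both rows are TRUE.

```r
df <- data.frame(
  Group = c('Group1', 'Group2', 'Group3', 'Group4'),
  Area1 = c(TRUE, FALSE, TRUE, FALSE),
  Area2 = c(TRUE, TRUE, FALSE, FALSE),
  Area3 = c(FALSE, TRUE, FALSE, FALSE),
  Area4 = c(FALSE, FALSE, FALSE, TRUE)
)

# Hypothetical helper: names of the areas in which both group i and group j reside
shared_areas <- function(i, j) {
  area_names <- names(df)[-1]
  in_i <- unlist(df[i, -1])  # logical vector, one entry per area
  in_j <- unlist(df[j, -1])
  area_names[in_i & in_j]
}

shared_areas(2, 4)  # character(0): Group2 and Group4 share no area
shared_areas(1, 2)  # "Area2"
```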
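I'm not sure igraph is really needed here. However, this can be set up as a bipartite network, e.g. $G(V_1, V_2, E)$, which distinguishes areas $\in V_1$ from groups $\in V_2$ and in which every dyad runs from an area to a group: $e^{ij} \in E;\ i \in V_1;\ j \in V_2$. Listing the neighborhood of a group's node then gives the areas that group is in, and its degree gives how many there are. The areas two groups share are the intersection of their neighborhoods.
If you really want to see this in code, I'll post it when I have time.
In the meantime, I think this is what you're after. It won't win any code golf contests, but if I understand your question correctly, it gets the job done.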
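If you like, go ahead and clean the result up into an undirected structure, perhaps also removing self-links (I don't imagine you are interested in Group2Group2).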
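Here is a minimal, self-contained sketch of that clean-up, assuming base R only. It builds all ordered pairs of row numbers and keeps only the rows where the first index is smaller than the second. That drops both the mirrored duplicates and self-links such as Group2Group2. It then lists the shared areas for each remaining pair. The column names i, j, Dyad, Shared_Area_Nums and Shared_Areas are illustrative choices. NA marks pairs that share no area, as in the question's desired output.

```r
df <- data.frame(
  Group = c('Group1', 'Group2', 'Group3', 'Group4'),
  Area1 = c(TRUE, FALSE, TRUE, FALSE),
  Area2 = c(TRUE, TRUE, FALSE, FALSE),
  Area3 = c(FALSE, TRUE, FALSE, FALSE),
  Area4 = c(FALSE, FALSE, FALSE, TRUE)
)

area_names <- names(df)[-1]
incidence  <- as.matrix(df[, -1])  # logical matrix: one row per group, one column per area

# All ordered pairs (i, j), then keep i < j: one row per unordered pair, no self-links
dyads <- expand.grid(i = seq_len(nrow(df)), j = seq_len(nrow(df)))
dyads <- dyads[dyads$i < dyads$j, ]
dyads <- dyads[order(dyads$i, dyads$j), ]
rownames(dyads) <- NULL

# Label each dyad and list the areas where both groups are TRUE
dyads$Dyad <- paste0(df$Group[dyads$i], df$Group[dyads$j])
shared <- lapply(seq_len(nrow(dyads)), function(r)
  area_names[incidence[dyads$i[r], ] & incidence[dyads$j[r], ]])
dyads$Shared_Area_Nums <- lengths(shared)
dyads$Shared_Areas <- vapply(shared, function(a)
  if (length(a) == 0) NA_character_ else paste(a, collapse = ", "),
  character(1))

dyads[, c("Dyad", "Shared_Area_Nums", "Shared_Areas")]
```

Filtering on i < j is a simple design choice. Because the indices are compared directly, no string key is needed to detect mirrored pairs.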
Note that your own code was already halfway there.
# Recreate the same data
Group = c('Group1', 'Group2', 'Group3', 'Group4')
Area1 = c(TRUE, FALSE, TRUE, FALSE)
Area2 = c(TRUE, TRUE, FALSE, FALSE)
Area3 = c(FALSE, TRUE, FALSE, FALSE)
Area4 = c(FALSE, FALSE, FALSE, TRUE)
df = data.frame(Group, Area1, Area2, Area3, Area4)
# Take two groups (by row number) and list the areas they have in common
is.shared <- function(i, j){
  # Make a data frame with two rows (one for i and one for j) in which the
  # index of each area is multiplied by the logical indicating whether the
  # group resides in that area: the entry is x if it does, 0 if it does not.
  area.idx <- seq_len(ncol(df) - 1)
  dyad <- as.data.frame(matrix(rep(area.idx, 2), nrow=2, byrow=TRUE)) * df[c(i,j), -1]
  # The shared areas are the intersection of the two sets
  intersect(as.numeric(dyad[1,]), as.numeric(dyad[2,]))
}
# Take a vector of area numbers and return a string that lists them.
# c(2,4,0) becomes "Area2, Area4".
list.areas <- function(vector){
  result <- c()
  for(area in vector){
    if(area != 0){
      result <- c(result, paste("Area", area, sep=""))
    }
  }
  paste(result, collapse=", ")
}
# Make a data frame of all ordered pairs (i, j) of groups (by row number)
dyads <- expand.grid(1:nrow(df), 1:nrow(df))
names(dyads) <- c("Group i", "Group j")
# Each row contains a dyad - a pair (i, and j) of groups.
# Generate a unique dyadic key
dyads$Key <- apply(dyads, 1, function(x) paste(sort(x), collapse='->'))
# For each row of dyads, that is to say, for each pair (i,j), check if
# any areas are shared using is.shared(), and convert the result to a
# string using list.areas()
dyads$Shared_Areas <- sapply(1:nrow(dyads), function(x)
  list.areas(is.shared(dyads[x,1], dyads[x,2]))
)
# Count the number of shared areas by splitting the string by commas
dyads$Shared_Area_Nums <- sapply(dyads$Shared_Areas, function(x)
  length(strsplit(x, ",")[[1]])
)
# Note that it's not as safe to count the result of is.shared() directly.
# If either group resides in ALL areas, its row contains no 0, so no 0 ends
# up in the intersection and subtracting 1 undercounts. If we assume that no
# group resides in all areas, it would also be OK to count the shared areas
# like this:
dyads$Shared_Areas_Unsafe <- sapply(1:nrow(dyads), function(x)
  length(is.shared(dyads[x,1], dyads[x,2]))
) - 1
# Reorder the columns
dyads <- dyads[, c("Group i", "Group j", "Key", "Shared_Area_Nums",
                   "Shared_Areas_Unsafe", "Shared_Areas")]
Pretty neat.
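The bipartite-network formulation described at the start of this answer can also be sketched without igraph. The sketch assumes the logical columns of df serve as the incidence matrix between groups and areas. Under that assumption, a group's neighborhood is the set of areas it is linked to, and its degree is the size of that set. The areas two groups share are the intersection of their neighborhoods. Multiplying the incidence matrix by its transpose gives the one-mode projection onto groups, which counts the shared areas for every pair at once. All variable names here are illustrative. This is a base-R sketch, not the answerer's promised code.

```r
df <- data.frame(
  Group = c('Group1', 'Group2', 'Group3', 'Group4'),
  Area1 = c(TRUE, FALSE, TRUE, FALSE),
  Area2 = c(TRUE, TRUE, FALSE, FALSE),
  Area3 = c(FALSE, TRUE, FALSE, FALSE),
  Area4 = c(FALSE, FALSE, FALSE, TRUE)
)

# Incidence matrix of the bipartite graph: rows = group nodes, columns = area nodes
inc <- as.matrix(df[, -1])
rownames(inc) <- as.character(df$Group)

# Neighborhood of each group node: the area nodes it is linked to
neighborhood <- lapply(rownames(inc), function(g) colnames(inc)[inc[g, ]])
names(neighborhood) <- rownames(inc)

# Degree of each group node: the number of areas the group resides in
degree <- rowSums(inc)

# Areas shared by two groups: the intersection of their neighborhoods
intersect(neighborhood[["Group1"]], neighborhood[["Group2"]])  # "Area2"

# One-mode projection onto the groups: entry [g, h] counts the areas g and h share
shared_counts <- (inc * 1) %*% t(inc * 1)
shared_counts
```

The diagonal of shared_counts repeats each group's degree. The entries above the diagonal hold the counts for the six unordered dyads.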