Count shared row values for each pair of columns to build a network graph

Time: 2016-04-27 20:21:23

Tags: r igraph

I have data like this:

dat <- data.frame(
  music = c("classical", "jazz", "baroque", "electronic", "ambient"),
  john = c(1,1,0,1,1),
  jeff = c(1,0,0,1,0),
  jane = c(0,1,1,0,0)
)

       music john jeff jane
1  classical    1    1    0
2       jazz    1    0    1
3    baroque    0    0    1
4 electronic    1    1    0
5    ambient    1    0    0

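and I want to plot the overlap between the people in the columns: how many times do two people both have a 1 in the same row? If I could get the data into this data.frame: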

result <- data.frame(person1 = c("john", "john", "jeff"), person2 = c("jeff", "jane", "jane"), overlap = c(2, 1, 0))

  person1 person2 overlap
1    john    jeff       2
2    john    jane       1
3    jeff    jane       0

I could create the graph I have in mind:

library(igraph)
g <- graph_from_data_frame(result, directed = FALSE)
plot(g, edge.width = result$overlap * 3)

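But I am struggling to transform the data so that it counts the row-wise overlap between each pair of columns. How can I do this?

3 answers:

Answer 0: (score: 4)

A possibly simpler approach is to build the graph's adjacency matrix by taking the cross product. You can then read it directly into igraph.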

library(igraph)

# Take the crossproduct: assumes unique music types in each row
# otherwise aggregate terms
m <- crossprod(as.matrix(dat[-1]))

# You could remove the diagonal terms here
# although it is useful to see the sum for each individual
# You can also remove it in igraph, as below
# diag(m) <- 0

# Create graph
# The weights are stored in E(g)$weight
g <- graph_from_adjacency_matrix(m, mode="undirected", weighted = TRUE)

# Remove edge loops
g <- simplify(g)
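
If the edge-list data.frame from the question is still wanted, the same cross product can be reshaped into it. The following is a minimal base-R sketch, assuming the `dat` from the question; the names `idx` and `result` are only illustrative.

```r
# Rebuild the example data from the question
dat <- data.frame(
  music = c("classical", "jazz", "baroque", "electronic", "ambient"),
  john = c(1, 1, 0, 1, 1),
  jeff = c(1, 0, 0, 1, 0),
  jane = c(0, 1, 1, 0, 0)
)

# Co-occurrence counts: entry [i, j] is the number of rows where both columns are 1
m <- crossprod(as.matrix(dat[-1]))

# Keep each unordered pair once (upper triangle, diagonal excluded)
idx <- which(upper.tri(m), arr.ind = TRUE)

# One row per pair, in the layout the question asks for
result <- data.frame(
  person1 = rownames(m)[idx[, "row"]],
  person2 = colnames(m)[idx[, "col"]],
  overlap = m[idx]
)
print(result)
```

A data.frame of this shape can be passed to `graph_from_data_frame(result, directed = FALSE)`, with `result$overlap` scaled as the edge width, as in the question.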

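Answer 1: (score: 2)

Maybe you would like to try different similarity/distance measures, such as Russel/Rao, Jaccard, etc. What I mean is: a 0 paired with a 0 can also be interpreted as similarity. Anyway, here is another approach: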

library(proxy)
m <- (1-as.matrix(dist( t(dat[, -1]), method = "Russel")))*nrow(dat)
m[lower.tri(m, T)] <- NA
(res <- setNames(reshape2::melt(m, na.rm=T), c("p1", "p2", "ol")))
#     p1   p2 ol
# 4 john jeff  2
# 7 john jane  1
# 8 jeff jane  0
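
For comparison, a Jaccard similarity can be sketched in base R from the same counts. This is not the proxy package's implementation, only the usual definition (rows where both are 1 divided by rows where at least one is 1), and it assumes the `dat` from the question; pairs where neither person has any 1 would give `NaN`.

```r
# Rebuild the example data from the question
dat <- data.frame(
  music = c("classical", "jazz", "baroque", "electronic", "ambient"),
  john = c(1, 1, 0, 1, 1),
  jeff = c(1, 0, 0, 1, 0),
  jane = c(0, 1, 1, 0, 0)
)

x <- as.matrix(dat[-1])

both <- crossprod(x)                 # rows where both people have a 1
n <- colSums(x)                      # rows where each person has a 1
either <- outer(n, n, "+") - both    # rows where at least one of the two has a 1

# Jaccard similarity for every pair (the diagonal is 1 by construction)
jaccard <- both / either
print(round(jaccard, 2))
```

Unlike the raw overlap count, this measure ignores rows where both people have a 0 and normalises by how many items each pair touches in total.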

Answer 2: (score: 1)

The following works for your example:

# build name matrix
nameMat <- t(combn(names(dat[,-1]), 2))
# pre-allocate count vector
overLap <- integer(nrow(nameMat))

# loop through name combos
for(i in 1:nrow(nameMat)) {
  overLap[i] <- sum(rowSums(dat[, nameMat[i,]]) == 2)
}
# construct data.frame
df <- data.frame("person1"=nameMat[,1], "person2"=nameMat[,2], "overLap"=overLap)

If you don't like for loops, you can use sapply to get the overlap counts:

overLap <- sapply(1:(nrow(nameMat)), 
                  function(i) sum(rowSums(dat[, nameMat[i,]]) == 2))

Following @user20650's suggestion, you can also use combn to compute the overlap:

overLap <- combn(dat[-1], 2, FUN=function(i) sum(rowSums(i)==2))

A longer way of computing the overlap is:

overLap <- sapply(1:(nrow(nameMat)), function(i) sum(rowSums(dat[, nameMat[i,]] == c(1,1)) == 2))

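This longer version has one advantage: it generalizes to cases where the measurements are on a Likert-type scale (indicating strength of affinity). On a 5-point scale, c(1,1) can be changed to c(3,3) to check for indifference, or to c(5,5). Be aware that comparing a data.frame with a vector recycles the vector down each column, so a mixed target such as c(1,5) is not matched column by column; in that case each column has to be compared with its own value. If the interest is in diametrically opposed opinions, e.g. c(1,5), both orderings of each pair are needed, so nameMat has to be manipulated and duplicated: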

newNameMat <- rbind(nameMat, cbind(nameMat[,2], nameMat[,1]))

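and the calculation is then performed on this matrix. It would not be too hard to wrap these operations into a function that counts arbitrary combinations of Likert-scale values.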
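
One possible shape for such a function is sketched below. The helper `count_pair` and the `likert` ratings are hypothetical and made up for illustration; they are not part of the original data. Each column is compared with its own target value, and both orderings of a pair are summed, which avoids duplicating the name matrix.

```r
# Hypothetical helper: count rows where column `a` equals `va` and column `b` equals `vb`
count_pair <- function(dat, a, b, va, vb) {
  sum(dat[[a]] == va & dat[[b]] == vb)
}

# Made-up 5-point ratings, for illustration only
likert <- data.frame(
  john = c(1, 5, 3, 1, 5),
  jeff = c(5, 5, 3, 5, 1),
  jane = c(1, 1, 3, 5, 5)
)

# All unordered pairs of column names, one pair per row
pairs <- t(combn(names(likert), 2))

# Opposite extremes in either direction: (1, 5) or (5, 1)
opposed <- apply(pairs, 1, function(p) {
  count_pair(likert, p[1], p[2], 1, 5) + count_pair(likert, p[1], p[2], 5, 1)
})

res <- data.frame(person1 = pairs[, 1], person2 = pairs[, 2], opposed = opposed)
print(res)
```

Calling `count_pair(likert, "john", "jeff", 3, 3)` in the same way would count rows where both people are indifferent.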