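I have a data.frame that describes a bipartite graph with one very large (millions of nodes) and one fairly small (hundreds of nodes) independent set.
I want to get the bipartite projection of the graph onto the smaller independent set, but without first creating the large bipartite graph, and especially without the huge bipartite projection onto the larger independent set. The reasons for this restriction are an igraph segfault and RAM limitations (I only have 8GB of RAM).
E.g., given
data.frame(beg=c("a","a","b","b","c","c"),
           end=c("1","2","1","2","1","2"),
           weight=1:6)
I want the data frame
data.frame(beg=c("a","a","b"),
           end=c("b","c","c"),
           weight=c(1+3+2+4,1+5+2+6,3+5+4+6))
where the weights of the edges are added up.
(In this example, abc is the "smaller" set and 12 is the "larger" one.)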
Answer 0 (score: 2)
This seems to do what I need (the key is to use data.table for a fast join):
> library(igraph)
> library(data.table)
data.table 1.8.8 For help type: help("data.table")
> f <- data.table(beg=c("a","a","b","b","c","c"),
                  end=c("1","2","1","2","1","2"),
                  count=1:6, key="end")
> f
   beg end count
1:   a   1     1
2:   b   1     3
3:   c   1     5
4:   a   2     2
5:   b   2     4
6:   c   2     6
> m <- f[f,allow.cartesian=TRUE]
> m
    end beg count beg.1 count.1
 1:   1   a     1     a       1
 2:   1   b     3     a       1
 3:   1   c     5     a       1
 4:   1   a     1     b       3
 5:   1   b     3     b       3
 6:   1   c     5     b       3
 7:   1   a     1     c       5
 8:   1   b     3     c       5
 9:   1   c     5     c       5
10:   2   a     2     a       2
11:   2   b     4     a       2
12:   2   c     6     a       2
13:   2   a     2     b       4
14:   2   b     4     b       4
15:   2   c     6     b       4
16:   2   a     2     c       6
17:   2   b     4     c       6
18:   2   c     6     c       6
> v <- m$beg == m$beg.1
> m$end <- NULL
> m$weight <- (m$count + m$count.1)/2
> m$count <- NULL
> m$count.1 <- NULL
> m
    beg beg.1 weight
 1:   a     a      1
 2:   b     a      2
 3:   c     a      3
 4:   a     b      2
 5:   b     b      3
 6:   c     b      4
 7:   a     c      3
 8:   b     c      4
 9:   c     c      5
10:   a     a      2
11:   b     a      3
12:   c     a      4
13:   a     b      3
14:   b     b      4
15:   c     b      5
16:   a     c      4
17:   b     c      5
18:   c     c      6
> ve <- data.table(vertex=m$beg[v], weight=m$weight[v], key="vertex")
> ve <- ve[, list(count = .N, weight = sum(weight)), by = "vertex"]
> ve
   vertex count weight
1:      a     2      3
2:      b     2      7
3:      c     2     11
> g1 <- graph.data.frame(m[!v,], vertices=ve, directed=FALSE)
> g1 <- simplify(g1, edge.attr.comb="sum")
> V(g1)$weight
[1] 3 7 11
> E(g1)$weight
[1] 10 14 18
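If only the projected edge list from the question is needed, the same join idea can be sketched without building an igraph object at all. This is a minimal sketch, not the code from the session above: it assumes a reasonably recent data.table, uses merge() (which suffixes clashing columns with .x and .y) instead of f[f], and the names m and proj are just illustrative. It sums both edge weights directly rather than halving them and letting simplify() add up the duplicates.

```r
library(data.table)

f <- data.table(beg   = c("a", "a", "b", "b", "c", "c"),
                end   = c("1", "2", "1", "2", "1", "2"),
                count = 1:6)

# Self-join on the large set: one row per (beg.x, beg.y) pair sharing an `end`.
m <- merge(f, f, by = "end", allow.cartesian = TRUE)

# Keep each unordered pair once (beg.x < beg.y also drops the self-pairs),
# then add up both edge weights over all shared `end` nodes.
proj <- m[beg.x < beg.y,
          .(weight = sum(count.x + count.y)),
          by = .(beg = beg.x, end = beg.y)]
setorder(proj, beg, end)
print(proj)
```

The joined table has one row per ordered pair of small-set nodes per shared large-set node, so whether it fits in RAM depends on how many neighbours each large-set node has; with millions of `end` nodes it may be worth checking on a subset first.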
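Answer 1 (score: 0)
So here is how I would do it (assuming your edges are in df, and the "small" set is in the beg column of the edges).
For each pair of nodes in the small set, I would use the following:
do.pair = function(x,y) {
  # nodes of the large set that are adjacent to both x and y
  tmp = intersect(df$end[df$beg==x],df$end[df$beg==y])
  # add up the weights of the edges from x or y to those shared nodes
  res = sum(df$weight[(df$beg %in% c(x,y)) & (df$end %in% tmp)])
  return(res)
}
Now I create the list of pairs in whichever way you like best (you could use expand.grid or outer) and then use the relevant apply function with the function above. Here I just do a simple nested loop, which is not very efficient but is easy to read.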
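g.small = unique(df$beg)
n = length(g.small)
res = list()
cnt = 0
# visit every unordered pair (i < j) of nodes in the small set
for (i in seq_len(n-1)) {
  for (j in (i+1):n) {
    cnt = cnt+1
    # one-row data.frame per pair, so that rbind returns a data.frame
    res[[cnt]] = data.frame(beg=g.small[i], end=g.small[j],
                            weight=do.pair(g.small[i],g.small[j]))
  }
}
do.call(rbind,res)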
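The nested loop can also be written with an apply-style call, as hinted above. The following is only a sketch: it uses combn() to enumerate the pairs (instead of expand.grid or outer, so that each unordered pair appears exactly once) and mapply() to call do.pair on each of them. The sample df is the one from the question, and the names pairs and proj are illustrative.

```r
df <- data.frame(beg = c("a", "a", "b", "b", "c", "c"),
                 end = c("1", "2", "1", "2", "1", "2"),
                 weight = 1:6, stringsAsFactors = FALSE)

do.pair <- function(x, y) {
  # large-set nodes adjacent to both x and y
  common <- intersect(df$end[df$beg == x], df$end[df$beg == y])
  # total weight of the edges from x or y to those shared nodes
  sum(df$weight[df$beg %in% c(x, y) & df$end %in% common])
}

# 2 x choose(n, 2) matrix: each column is one unordered pair of small-set nodes
pairs <- combn(unique(df$beg), 2)

proj <- data.frame(beg = pairs[1, ],
                   end = pairs[2, ],
                   weight = mapply(do.pair, pairs[1, ], pairs[2, ],
                                   USE.NAMES = FALSE),
                   stringsAsFactors = FALSE)
print(proj)
```

With a few hundred nodes in the small set this is tens of thousands of do.pair calls, each of which scans the whole of df, so it is easy to read but probably slow on millions of edges.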