PageRank: NetworkX vs. iGraph

Date: 2017-02-10 13:29:25

Tags: python igraph networkx pagerank

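We are trying to port a Python application to .NET / Windows. The original application uses the NetworkX implementation of PageRank.

When we run the original code on the dataset below we get one set of results, but when we run what we believe is the same dataset through iGraph's pagerank we get a different set.

Can anyone look at the data below and tell us what might be causing the discrepancy?

Starting graph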

From , To, Weight
------------------------------------------
[1, 2, 1.237635735532509]
[1, 3, 1.3176784432060453]
[2, 5, 0.1]
[2, 7, 1.6545276334003642]
[3, 0, 0.4013877113318902]
[3, 5, 0.9056698458264134]
[3, 7, 3.4462871026284194]
[4, 5, 0.9693717489378296]
[4, 7, 1.3176784432060453]
[5, 7, 1.6053605156578263]
[7, 2, 0.8068528194400547]
[7, 3, 0.9771288098085582]
[7, 4, 4.317678443206045]
[7, 5, 2.0108256237659905]

Results of running PageRank with NetworkX

0: 0.030658861877660655
1: 0.025151437717922904
2: 0.06899335192504014
3: 0.0767301059609998
4: 0.20435115331218195
5: 0.19799952556413375
7: 0.39611556364206074

Running PageRank with iGraph

from igraph import Graph

h = Graph()
h.add_vertices([0,1,2,3,4,5,6,7])
h.add_edge(1, 2, weight = 1.237635735532509)
h.add_edge(1, 3, weight = 1.3176784432060453)
h.add_edge(2, 5, weight = 0.1)
h.add_edge(2, 7, weight = 1.6545276334003642)
h.add_edge(3, 0, weight = 0.4013877113318902)
h.add_edge(3, 5, weight = 0.9056698458264134)
h.add_edge(3, 7, weight = 3.4462871026284194)
h.add_edge(4, 5, weight = 0.9693717489378296)
h.add_edge(4, 7, weight = 1.3176784432060453)
h.add_edge(5, 7, weight = 1.6053605156578263)
h.add_edge(7, 2, weight = 0.8068528194400547)
h.add_edge(7, 3, weight = 0.9771288098085582)
h.add_edge(7, 4, weight = 4.317678443206045)                   
h.add_edge(7, 5, weight = 2.0108256237659905)

z = h.pagerank()

...returns

0.08263947646845539 
0.11209944263156851 
0.13863513488523824 
0.2088786898834253 
0.0909928717668216 
0.15533634946784883 
0.04713009918827309 
0.16428793570836897

h.pagerank(None,True,.85,'weight',None,'prpack',1000,.001) returns:

0.06306529189761995
0.1213272777521786
0.12419698504275958
0.21601479253860403
0.0845752983652644
0.10892203451714054
0.05260867410276095
0.22928964578367186

h.pagerank(None,True,.85,'weight',None,'power',1000,.001) returns:

0.05046861007484653
0.08032641955693953
0.1387381559084609
0.18249744338552665
0.10389267832310527
0.16623355776440546
0.019058750577540366
0.2587843844091753

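Thank you very much for any guidance.

1 answer:

Answer 0 (score: 2)

Several things contribute to the difference in PageRank scores. First, your NetworkX graph has no node 6 (an isolated node), whereas your igraph graph does. Second, make sure the igraph graph is directed; Graph() creates an undirected graph by default. Once both are fixed, the PageRank scores are nearly identical (to about the 6th decimal place).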

import igraph as ig
import networkx as nx
G=nx.DiGraph()
G.add_nodes_from([0,1,2,3,4,5,6,7]) #Add node 6
G.add_edge(1, 2,weight= 1.237635735532509)
G.add_edge(1, 3,weight= 1.3176784432060453)
G.add_edge(2, 5,weight= 0.1)
G.add_edge(2, 7,weight= 1.6545276334003642)
G.add_edge(3, 0,weight= 0.4013877113318902)
G.add_edge(3, 5,weight= 0.9056698458264134)
G.add_edge(3, 7,weight= 3.4462871026284194)
G.add_edge(4, 5,weight= 0.9693717489378296)
G.add_edge(4, 7,weight= 1.3176784432060453)
G.add_edge(5, 7,weight= 1.6053605156578263)
G.add_edge(7, 2,weight= 0.8068528194400547)
G.add_edge(7, 3,weight= 0.9771288098085582)
G.add_edge(7, 4,weight= 4.317678443206045)
G.add_edge(7, 5,weight= 2.0108256237659905)

h = ig.Graph(directed = True)  #Ensure the graph is directed
h.add_vertices([0,1,2,3,4,5,6,7])
h.add_edge(1, 2, weight = 1.237635735532509)
h.add_edge(1, 3, weight = 1.3176784432060453)
h.add_edge(2, 5, weight = 0.1)
h.add_edge(2, 7, weight = 1.6545276334003642)
h.add_edge(3, 0, weight = 0.4013877113318902)
h.add_edge(3, 5, weight = 0.9056698458264134)
h.add_edge(3, 7, weight = 3.4462871026284194)
h.add_edge(4, 5, weight = 0.9693717489378296)
h.add_edge(4, 7, weight = 1.3176784432060453)
h.add_edge(5, 7, weight = 1.6053605156578263)
h.add_edge(7, 2, weight = 0.8068528194400547)
h.add_edge(7, 3, weight = 0.9771288098085582)
h.add_edge(7, 4, weight = 4.317678443206045)                   
h.add_edge(7, 5, weight = 2.0108256237659905)

Now check the PageRank scores:

>>> h.pagerank(None,True,.85,'weight',None,'arpack')
[0.02990667328959136,
 0.02453435976169968,
 0.06730062757414129,
 0.07484756185358077,
 0.199337429914656,
 0.19314195829041825,
 0.02453435976169968,
 0.38639702955421296]
>>> nx.pagerank(G,alpha=0.85,weight = 'weight')
{0: 0.029906698992551148,
 1: 0.02453435614919296,
 2: 0.06730055444151634,
 3: 0.07484747242070261,
 4: 0.19933699472630276,
 5: 0.19314246522466136,
 6: 0.02453435614919296, #Here is node 6, missing from your example
 7: 0.3863971018958797}
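
To put a number on "nearly identical", here is a small sketch that compares the two outputs above node by node. The scores are copied verbatim from the session; the helper name max_abs_diff is only illustrative and belongs to neither library.

```python
# Scores copied from the igraph (arpack) output above, indexed by vertex id.
igraph_scores = [
    0.02990667328959136,
    0.02453435976169968,
    0.06730062757414129,
    0.07484756185358077,
    0.199337429914656,
    0.19314195829041825,
    0.02453435976169968,
    0.38639702955421296,
]

# Scores copied from the nx.pagerank output above, keyed by node.
networkx_scores = {
    0: 0.029906698992551148,
    1: 0.02453435614919296,
    2: 0.06730055444151634,
    3: 0.07484747242070261,
    4: 0.19933699472630276,
    5: 0.19314246522466136,
    6: 0.02453435614919296,
    7: 0.3863971018958797,
}


def max_abs_diff(a, b):
    """Largest absolute per-node difference between two score sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


diff = max_abs_diff(igraph_scores, [networkx_scores[v] for v in range(8)])
# For these values the largest difference is below 1e-6.
print(f"largest per-node difference: {diff:.2e}")
```

Note that nodes 1 and 6 receive exactly the same score in both libraries: neither has any incoming edge, so each gets only the teleport and dangling-node share.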

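One thing that remains a mystery to me: the NetworkX documentation says it uses the power method, yet igraph's power method gives different results. Using arpack or prpack gives roughly similar results.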
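
One way to probe this without either library is to run a plain power iteration by hand and see which scores it settles on. The sketch below is not the code of NetworkX or igraph; pagerank_power and its parameters are illustrative names. It assumes the usual conventions: uniform teleportation, and the rank of dangling nodes (no outgoing edges) spread evenly over all nodes.

```python
def pagerank_power(nodes, edges, damping=0.85, tol=1e-10, max_iter=1000,
                   directed=True):
    """Weighted PageRank by power iteration (illustrative sketch only)."""
    nodes = list(nodes)
    n = len(nodes)
    index = {v: i for i, v in enumerate(nodes)}
    # out[i] maps a target index to the total weight of edges i -> target.
    out = [dict() for _ in range(n)]
    for u, v, w in edges:
        i, j = index[u], index[v]
        out[i][j] = out[i].get(j, 0.0) + w
        if not directed:
            # Treat every edge as usable in both directions.
            out[j][i] = out[j].get(i, 0.0) + w
    strength = [sum(d.values()) for d in out]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by dangling nodes is redistributed uniformly.
        dangling = sum(rank[i] for i in range(n) if strength[i] == 0.0)
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for i in range(n):
            for j, w in out[i].items():
                new[j] += damping * rank[i] * w / strength[i]
        err = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if err < tol:
            break
    return dict(zip(nodes, rank))


# Edge list from the question: (from, to, weight).
edges = [
    (1, 2, 1.237635735532509), (1, 3, 1.3176784432060453),
    (2, 5, 0.1), (2, 7, 1.6545276334003642),
    (3, 0, 0.4013877113318902), (3, 5, 0.9056698458264134),
    (3, 7, 3.4462871026284194), (4, 5, 0.9693717489378296),
    (4, 7, 1.3176784432060453), (5, 7, 1.6053605156578263),
    (7, 2, 0.8068528194400547), (7, 3, 0.9771288098085582),
    (7, 4, 4.317678443206045), (7, 5, 2.0108256237659905),
]

with_node_6 = pagerank_power(range(8), edges)                    # as in the answer
without_node_6 = pagerank_power([0, 1, 2, 3, 4, 5, 7], edges)    # as in the question
undirected = pagerank_power(range(8), edges, directed=False)     # direction ignored

for label, scores in [("8 nodes, directed", with_node_6),
                      ("7 nodes, directed", without_node_6),
                      ("8 nodes, undirected", undirected)]:
    print(label, {v: round(s, 6) for v, s in scores.items()})
```

Under these assumptions the two directed runs should land close to the NetworkX numbers shown in the answer and in the question respectively, while the undirected run gives a visibly different ranking. The question's igraph calls pass 1000 and .001 as the iteration count and eps; whether that looser tolerance accounts for the gap seen with igraph's power method is not something this sketch establishes.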