How do I run personalized PageRank with PySpark?

Time: 2017-07-05 15:39:36

Tags: python pyspark pagerank

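I want to set node 1 as the starting (source) node in order to compute personalized PageRank. How do I add a starting node to the code below?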

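import networkx as nx
from operator import add
from pyspark import SparkContext

# Reuse the active SparkContext (e.g. in the pyspark shell) or create one.
sc = SparkContext.getOrCreate()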

link_data = {
    0: [1, 2],
    1: [2, 6],
    2: [1, 0],
    3: [1, 0],
    4: [1],
    5: [0, 1],
    6: [0, 7],
    7: [0, 1, 2, 3, 9],
    8: [5, 9],
    9: [7]
}
link_graph = nx.DiGraph(link_data)

ranks = sc.range(len(link_data)).map(lambda x : (x, 1.))
links = sc.parallelize(link_data.items()).cache()
links.join(ranks).collect()

def computeContribs(node_urls_rank):
    _, (urls, rank) = node_urls_rank
    nb_urls = len(urls)
    for url in urls:
        yield url, rank / nb_urls

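for iteration in range(10):
    # Each node sends rank / out-degree to every node it links to.
    contribs = links.join(ranks).flatMap(computeContribs)
    # Keep nodes that received no contribution, with a contribution of 0.0.
    contribs = links.fullOuterJoin(contribs).mapValues(lambda x : x[1] or 0.0)
    ranks = contribs.reduceByKey(add)
    # Damping factor 0.85; every node receives the same 0.15 teleport term.
    ranks = ranks.mapValues(lambda rank: rank * 0.85 + 0.15)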

for (link, rank) in sorted(ranks.collect()):
    print("%s has rank: %s." % (link, rank / len(link_data)))

Can anyone help me? Where should I set the sourceId so that the results are personalized for that vertex? Thanks!
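For reference, here is a minimal sketch of what "personalizing" the iteration usually means: the 0.15 teleport term goes only to the source node instead of to every node, and all initial rank starts on the source. It is written in plain Python (no Spark) so the arithmetic can be checked locally. The function name `personalized_pagerank` and its parameters are made up for illustration, and it assumes every node appears as a key in `link_data` with at least one out-link, which holds for the graph in the question. Ranks are kept summing to 1, so no final division by the node count is needed. In the RDD loop, the analogous change would presumably be to replace the `mapValues` step with a `map` over `(node, rank)` pairs that adds the teleport term only when the node equals the source.

```python
# Same graph as in the question.
link_data = {
    0: [1, 2],
    1: [2, 6],
    2: [1, 0],
    3: [1, 0],
    4: [1],
    5: [0, 1],
    6: [0, 7],
    7: [0, 1, 2, 3, 9],
    8: [5, 9],
    9: [7],
}


def personalized_pagerank(link_data, source, damping=0.85, iterations=10):
    """Hypothetical helper: power iteration with teleport only to `source`.

    Assumes every link target is also a key of `link_data`.
    """
    # Start with all rank on the source node.
    ranks = {node: (1.0 if node == source else 0.0) for node in link_data}
    for _ in range(iterations):
        # Each node splits its current rank evenly among its out-links.
        contribs = {node: 0.0 for node in link_data}
        for node, targets in link_data.items():
            if not targets:
                continue  # a node without out-links would leak rank here
            share = ranks[node] / len(targets)
            for target in targets:
                contribs[target] += share
        # The (1 - damping) teleport term is given to the source node only.
        ranks = {
            node: damping * contribs[node]
            + ((1.0 - damping) if node == source else 0.0)
            for node in link_data
        }
    return ranks


ranks = personalized_pagerank(link_data, source=1)
for node, rank in sorted(ranks.items()):
    print("%s has rank: %s." % (node, rank))
```

With this scheme, nodes that cannot be reached from the source (here nodes 4 and 8, which have no inbound links) end up with rank 0, which is the expected behaviour of a personalized ranking.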

0 answers:

No answers