Question

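I am using Ubuntu 12.04. Suppose we have loaded a set of nodes and edges into Neo4j. How can we run an algorithm such as PageRank on it? Are there any packages for this?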

Answer 1

I have been searching the web for the same reason.

As far as I can tell, there are a few ways to do it:

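1) Write your own Neo4j plugin in Java:

A few algorithms are available by default (http://docs.neo4j.org/chunked/stable/rest-api-graph-algos.html), but for anything more advanced it is probably best to write it yourself. Reference: http://docs.neo4j.org/chunked/stable/server-plugins.html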

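2) Use the Gremlin plugin together with Blueprints / Furnace.

This is what I am trying right now (more precisely, for community detection algorithms). Try https://github.com/tinkerpop/gremlin/wiki and https://github.com/tinkerpop/furnace. It should work through the Gremlin plugin in the Neo4j server, or through some API. I am a Python fan, so I am trying Bulbs or pyblueprints. I can't tell you yet which one is better.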

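3) Query the graph and load it into a known framework

There are thousands of ways to do this with Python, C, R... I would suggest, for example, networkx (Python) or igraph (R). For example: https://networkx.github.io/documentation/latest/reference/generated/networkx.algorithms.link_analysis.pagerank_alg.pagerank.html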
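As a minimal sketch of option 3, assuming the edges have already been pulled out of Neo4j as (source, target) pairs, networkx can compute PageRank in a few lines. The edge list, the LINK relationship type and the name property below are illustrative assumptions, not part of the original setup, and recent networkx releases need SciPy installed for pagerank:

```python
import networkx as nx

# Hypothetical edge list; in practice these pairs would come from a query such as
#   MATCH (s)-[:LINK]->(t) RETURN s.name, t.name
# (the LINK type and the name property are assumptions about the data model).
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]

# Build a directed graph, since PageRank depends on link direction.
graph = nx.DiGraph()
graph.add_edges_from(edges)

# alpha is the damping factor; 0.85 is also the networkx default.
scores = nx.pagerank(graph, alpha=0.85)

# Print the pages from highest to lowest score.
for page, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}\t{score:.4f}")
```

The scores returned by networkx are normalized to sum to 1, so they can be written back to the nodes as a property if the ranking is needed inside the database.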

Hope it helps.

Answer 2

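Take a look at the GraphAware Framework, specifically its timer-driven runtime modules.

Basically, the framework lets you write global graph algorithms (such as PageRank) and have them computed continuously on the graph database. The computation slows down while your database is busy with normal transaction processing and speeds up during quiet periods.

We are working on our own PageRank Module; it is still a work in progress, but it may be helpful to you.

Disclaimer: I am one of the framework's authors.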

Answer 3

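A fancy approach is to rely on a web-scale fast processing platform that can split a large task into parallel subtasks. The most effective way to achieve this is to use a graph-parallel processing engine such as Spark GraphX.

Kenny Bastani's Mazerunner is a powerful tool that extends Neo4j with parallel graph processing. Subgraphs can be exported from Neo4j to Spark, processed in a distributed fashion, and then re-imported into the original graph database.

Take a look: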

http://www.kennybastani.com/2015/01/categorical-pagerank-neo4j-spark.html?m=1

Answer 4

The other answers, from 2014, are no longer up to date.

There is now a ready-made PageRank implementation.

CALL algo.pageRank.stream('Site', 'links', {iterations:20, dampingFactor:0.85})
YIELD nodeId, score
RETURN algo.getNodeById(nodeId).name AS page,score
ORDER BY score DESC

'Site' must be the label of your sites/pages, and 'links' must be the relationship type between sites (a site linking to another site).

Answer 5

You can implement an iterative algorithm in Cypher, but you have to run it several times to get a correct result:

/* we have to go through all nodes */
match (node)
with
  collect(distinct node) as pages
unwind pages as dest
  /* let's find all source citations for a given node */
  match (source)-[:LINK]->(dest)
  with
    collect(distinct source) as sources,
    dest as dest
    unwind sources as src
      /* we have to know how many relationships the source node has */
      match (src)-[r:LINK]->()
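      /* group by src and dest explicitly before dividing, instead of relying on implicit grouping */
      with
        src as src,
        dest as dest,
        count(r) as outDegree
      /* now we have all information to update the destination node with the new pagerank */
      with
        sum(src.pageRank / outDegree) as p,
        dest as dest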
      set dest.pageRank = 0.15 + 0.85 * p;

To know when to stop, you can check the top few values after each iteration; once they stop changing much, you can stop iterating:

MATCH (source) RETURN source.pageRank ORDER BY source.pageRank DESC LIMIT 25;

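For me, on an interconnected graph of about 10,000 nodes, I had to run it 50 times before the iteration converged. Each run took about half a minute, so it is quite slow; if you need better performance, you may want to look at a plugin-based solution.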
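To make the update rule and the stopping check concrete, here is a rough Python sketch of the same iteration outside the database. It assumes every node starts with a pageRank of 1.0, and it stops when the largest change across all nodes drops below a tolerance, which is a stricter stand-in for eyeballing the top 25 values. The function names, the tolerance and the example graph are made up for illustration:

```python
def pagerank_step(out_links, ranks, damping=0.85):
    """One pass of rank(dest) = (1 - damping) + damping * sum(rank(src) / out_degree(src))."""
    incoming = {node: 0.0 for node in ranks}
    for src, targets in out_links.items():
        if not targets:
            continue
        # Each source splits its current rank evenly over its outgoing links.
        share = ranks[src] / len(targets)
        for dest in targets:
            incoming[dest] += share
    # Nodes without incoming links end up at (1 - damping) here; the Cypher
    # query above leaves such nodes untouched because its MATCH finds no rows.
    return {node: (1.0 - damping) + damping * incoming[node] for node in ranks}


def pagerank(out_links, tolerance=1e-6, max_iterations=200):
    """Repeat the update until no rank changes by more than `tolerance`."""
    nodes = set(out_links) | {t for targets in out_links.values() for t in targets}
    ranks = {node: 1.0 for node in nodes}  # assumed initial value
    for iteration in range(1, max_iterations + 1):
        new_ranks = pagerank_step(out_links, ranks)
        change = max(abs(new_ranks[n] - ranks[n]) for n in nodes)
        ranks = new_ranks
        if change < tolerance:
            break
    return ranks, iteration


# Hypothetical four-page graph: a -> b -> c -> a, plus d -> a.
example = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
ranks, iterations = pagerank(example)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
print("iterations:", iterations)
```

With this unnormalized form of the rule the ranks sum to the number of nodes rather than to 1, as long as every node has at least one outgoing link.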

How do I run PageRank in Neo4j?
