How to normalize PageRank scores

Asked: 2018-10-30 22:56:50

Tags: neo4j cypher pagerank

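I am running PageRank on a set of nodes, where each node has a year property. How can I compute the average of all PageRank scores per value of the year property? That is, if there are 100 nodes with 20 distinct year values in total, I want to compute 20 average PageRank values.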

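Then, for each node, I want to compute a scaled score based on the difference between that paper's PageRank score and the average PageRank score for its year (where the year's average is taken over all nodes that have the same value of the year property).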

The code I use to run PageRank is:

CALL algo.pageRank.stream(
  'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) as id',
  'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', iterations:20, write:false, concurrency:20})
YIELD node, score
WITH *,
  node.title AS title,
  node.year AS year,
  score AS page_rank
ORDER BY page_rank DESC
LIMIT 10000
RETURN title, year, page_rank;

How can I change this code to return the scaled scores?

Thanks a lot for your help!

1 Answer:

Answer 0: (score: 1)

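This query should return a scaled_score (as an absolute value) for each title/year combination. The lower the scaled score, the closer the title's page_rank is to the average for that year: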

CALL algo.pageRank.stream(
  'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) as id',
  'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', iterations:20, write:false, concurrency:20})
YIELD node, score
WITH 
  node.title AS title,
  node.year AS year, 
  score AS page_rank
ORDER BY page_rank DESC
LIMIT 10000
WITH year, COLLECT({title: title, page_rank: page_rank}) AS data, AVG(page_rank) AS avg_page_rank
UNWIND data AS d
RETURN year, d.title AS title, ABS(d.page_rank-avg_page_rank)/avg_page_rank AS scaled_score;

You may also want to order the results (for example, by year and scaled_score).
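To see the grouping step in isolation, here is a minimal sketch that runs the same COLLECT/AVG/UNWIND pattern on three made-up rows instead of the PageRank output, with the suggested ordering added. The sample titles, years, and page_rank values are hypothetical; they only stand in for the rows that the PageRank call would yield.

```cypher
// Hypothetical sample rows standing in for the PageRank results
UNWIND [
  {title: 'A', year: 2010, page_rank: 1.0},
  {title: 'B', year: 2010, page_rank: 3.0},
  {title: 'C', year: 2011, page_rank: 2.0}
] AS row
// year is the grouping key; COLLECT and AVG aggregate per year
WITH row.year AS year,
     COLLECT({title: row.title, page_rank: row.page_rank}) AS data,
     AVG(row.page_rank) AS avg_page_rank
// Expand back to one row per paper, keeping the year's average alongside it
UNWIND data AS d
RETURN year,
       d.title AS title,
       ABS(d.page_rank - avg_page_rank) / avg_page_rank AS scaled_score
ORDER BY year, scaled_score;
```

With these rows the 2010 average is 2.0, so A and B both get a scaled_score of 0.5, and C, the only 2011 paper, gets 0.0. Note that in the full query the LIMIT 10000 is applied before the aggregation, so each year's average covers only the papers that made it into the top 10000.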