How to normalize PageRank scores in Cypher

Time: 2018-07-24 19:54:12

Tags: neo4j cypher data-science neo4j-apoc

I have a lot of papers in Neo4j that cite each other.

The data looks like this:

{"title": "TitleWave", "year": 2010, "references": ["002", "003"], "id": "001"}
{"title": "Title002", "year": 2005, "references": ["003", "004"], "id": "002"}
{"title": "RealTitle", "year": 2000, "references": ["004", "001"], "id": "003"}
{"title": "Title004", "year": 2014, "references": ["001", "002"], "id": "004"}

I created the nodes and relationships like this:

CALL apoc.load.json('file.txt') YIELD value AS q
MERGE (p:Paper {id: q.id})
ON CREATE SET
  p.title = q.title,
  p.year = q.year,  // needed: the PageRank query below filters on p.year
  p.refs = q.references
WITH p
UNWIND p.refs AS ref
MATCH (p2:Paper {id: ref})
MERGE (p)-[:CITES]->(p2);
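If some CITES relationships seem to be missing after the import, one way to rule out ordering effects (a row referencing a paper that may not have been created yet within the same statement) is to split the import into two statements and run them one after the other. This is a sketch assuming the same file.txt and properties as above; it also stores year, which the PageRank query below filters on.

```cypher
// Statement 1: create every paper first, including the year used for filtering later
CALL apoc.load.json('file.txt') YIELD value AS q
MERGE (p:Paper {id: q.id})
ON CREATE SET p.title = q.title, p.year = q.year, p.refs = q.references;

// Statement 2: once all papers exist, link each paper to the papers it references
MATCH (p:Paper)
UNWIND p.refs AS ref
MATCH (p2:Paper {id: ref})
MERGE (p)-[:CITES]->(p2);
```

Because MERGE is used for both nodes and relationships, re-running either statement should not create duplicates.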

I want to run algo.pageRank.stream to get the PageRank scores and then normalize them, for use on a large dataset. Can I do this efficiently in a single query?

This runs the PageRank algorithm, but does not normalize the scores:

CALL algo.pageRank.stream(
'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) as id',
'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher', iterations:20, write:false, concurrency:20})
YIELD node, score
WITH *,
node.title AS title,
score AS page_rank,
log(score) AS impact
ORDER BY impact DESC
LIMIT 100
RETURN title, page_rank, impact;

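Is there a good way to normalize all of these impact values within the query? For example, one way to normalize is to divide each value by the maximum.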
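Written out, with s_i the PageRank score of paper i and Cypher's log() being the natural logarithm, dividing by the maximum means the following. The min-max form is shown only as an alternative for comparison: dividing by the maximum keeps the ranking order only when the largest log score is positive. If every score is below 1, all logs are negative and the division reverses the order, whereas min-max scaling maps the values into [0, 1] regardless of sign (as long as not all scores are equal).

```latex
\text{impact}_i = \ln s_i, \qquad
\text{impact\_norm}_i = \frac{\ln s_i}{\max_j \ln s_j}, \qquad
\text{impact\_minmax}_i = \frac{\ln s_i - \min_j \ln s_j}{\max_j \ln s_j - \min_j \ln s_j}
```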

However, when I try this:

CALL algo.pageRank.stream(
'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) as id',
'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher', iterations:20, write:false, concurrency:20})
YIELD node, score
WITH *,
node.title AS title, 
score AS page_rank,
log(score) AS impact,
max(log(score)) as max_val,
impact / max_val as impact_norm
ORDER BY impact_norm DESC
LIMIT 100
RETURN title, page_rank, impact_norm;

I get an error:

Variable `impact` not defined (line 18, column 1 (offset: 539))
"impact / max_val as impact_norm"

Any suggestions would be appreciated!
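A possible approach, offered as an untested sketch: the error occurs because an alias created in a WITH clause (impact) cannot be referenced by another expression in that same WITH. There is also a second problem: max() is an aggregate, so putting it in the same WITH as per-row values groups by those values, and each row's "maximum" would just be its own log score. Computing the per-row values first, then collecting all rows together with the overall maximum, and finally unwinding them avoids both issues. The sketch keeps the PageRank call and the YIELD node, score columns exactly as in the question.

```cypher
CALL algo.pageRank.stream(
  'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) AS id',
  'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) AS source, id(p2) AS target',
  {graph: 'cypher', iterations: 20, write: false, concurrency: 20})
YIELD node, score
// Step 1: per-row values in their own WITH, so later clauses can reference them
WITH node.title AS title, score AS page_rank, log(score) AS impact
// Step 2: aggregate over all rows at once: keep every row and the overall maximum
WITH collect({title: title, page_rank: page_rank, impact: impact}) AS rows,
     max(impact) AS max_val
// Step 3: back to one row per paper, each divided by the shared maximum
UNWIND rows AS row
RETURN row.title AS title,
       row.page_rank AS page_rank,
       row.impact / max_val AS impact_norm
ORDER BY impact_norm DESC
LIMIT 100;
```

The collect() step holds one map per scored paper in memory, which is the price of having the global maximum available on every row. If max_val is zero or negative, the division misbehaves or reverses the order, so checking the sign of the largest log score first is worthwhile.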
