I have a lot of papers in Neo4j that cite one another.
The data looks like this:
{"title": "TitleWave", "year": 2010, "references": ["002", "003"], "id": "001"}
{"title": "Title002", "year": 2005, "references": ["003", "004"], "id": "002"}
{"title": "RealTitle", "year": 2000, "references": ["004", "001"], "id": "003"}
{"title": "Title004", "year": 2014, "references": ["001", "002"], "id": "004"}
I created the relationships as follows:
CALL apoc.load.json('file.txt') YIELD value AS q
MERGE (p:Paper {id:q.id}) ON CREATE SET
p.title=q.title,
p.year=q.year,
p.refs = q.references
WITH p
MATCH (p) UNWIND p.refs AS ref
MATCH (p2:Paper {id: ref})
MERGE (p)-[:CITES]->(p2);
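I want to run the algo.pageRank.stream function to get a set of PageRank scores and then normalize them, and this needs to work on a large dataset. Can I do this efficiently in a single query?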
This runs the PageRank algorithm but does not normalize the scores:
CALL algo.pageRank.stream(
'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) as id',
'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher', iterations:20, write:false, concurrency:20})
YIELD node, score
WITH *,
node.title AS title,
score AS page_rank,
log(score) AS impact
ORDER BY impact DESC
LIMIT 100
RETURN title, page_rank, impact;
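Is there a good way to normalize all of these impact values within the query? For example, one way to normalize is to divide each value by the maximum.
However, when I try this: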
CALL algo.pageRank.stream(
'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) as id',
'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher', iterations:20, write:false, concurrency:20})
YIELD node, score
WITH *,
node.title AS title,
score AS page_rank,
log(score) AS impact,
max(log(score)) as max_val,
impact / max_val as impact_norm
ORDER BY impact_norm DESC
LIMIT 100
RETURN title, page_rank, impact_norm;
I get an error:
Variable `impact` not defined (line 18, column 1 (offset: 539))
"impact / max_val as impact_norm"
Any suggestions would be greatly appreciated!
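A possible reading of the error, offered as a sketch rather than a confirmed fix: in Cypher, an alias introduced in a WITH clause is only visible after that clause, so `impact` and `max_val` cannot be referenced inside the same WITH that defines them. In addition, combining `WITH *` with `max(...)` turns every non-aggregated column into a grouping key, so the maximum would be computed per row rather than over the whole result. One way around both issues is to compute the per-row values in one WITH, aggregate in a second WITH while collecting the rows, and then UNWIND them again. The map keys and the `rows` and `row` names below are illustrative choices, not part of the original query.

```cypher
CALL algo.pageRank.stream(
'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) as id',
'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher', iterations:20, write:false, concurrency:20})
YIELD node, score
// Stage 1: per-row values; these aliases become usable from the next clause on
WITH node.title AS title,
score AS page_rank,
log(score) AS impact
// Stage 2: no grouping keys remain, so max() runs over all rows;
// collect() keeps the individual rows so they can be restored afterwards
WITH collect({title: title, page_rank: page_rank, impact: impact}) AS rows,
max(impact) AS max_val
// Stage 3: back to one row per paper, each carrying the global maximum
UNWIND rows AS row
RETURN row.title AS title,
row.page_rank AS page_rank,
row.impact / max_val AS impact_norm
ORDER BY impact_norm DESC
LIMIT 100;
```

Two caveats with this sketch. `collect()` holds every row in memory at once, which may matter on a large dataset. Also, `log(score)` is zero or negative whenever `score` is at most 1, so if the largest log value is not positive, dividing by it does not give a meaningful 0-to-1 scale; dividing the raw `score` by `max(score)` may be a safer variant in that case.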