I am using a Neo4j procedure to create relationships on bulk data.
Initially I inserted all the data with LOAD CSV:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///XXXX.csv" AS row
....
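The data set is large [10M], but the load completed successfully.
My problem is that I want to create relationships between many of these nodes,
but the following query fails with an exception [OutMemoryException]: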
MATCH(n1:x{REMARKS :"LATEST"}) MATCH(n2:x{REMARKS :"LATEST"}) WHERE n1.DIST_ID=n2.ENROLLER_ID CREATE (n1)-[:ENROLLER]->(n2) ;
I have already created indexes and constraints.
Any ideas? Please help.
Answer 0: (score: 1)
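The problem is that your query runs in a single transaction, which is what causes the [OutMemoryException]. That is hard to avoid in plain Cypher because, at the moment, periodic commit is only available with LOAD CSV. So one option is to re-read the CSV after the first load and create the relationships row by row: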
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///XXXX.csv" AS row
MATCH (n1:x{REMARKS :"LATEST", DIST_ID: row.DIST_ID})
WITH n1
MATCH(n2:x{REMARKS :"LATEST"}) WHERE n1.DIST_ID=n2.ENROLLER_ID
CREATE (n1)-[:ENROLLER]->(n2) ;
Or try this trick with periodic committing from the APOC library:
call apoc.periodic.commit("
MATCH (n2:x {REMARKS:'LATEST'}) WHERE exists(n2.ENROLLER_ID)
WITH n2 LIMIT {perCommit}
OPTIONAL MATCH (n1:x {REMARKS:'LATEST'}) WHERE n1.DIST_ID = n2.ENROLLER_ID
WITH n2, collect(n1) as n1s
FOREACH(n1 in n1s|
CREATE (n1)-[:ENROLLER]->(n2)
)
REMOVE n2.ENROLLER_ID
RETURN count(n2)",
{perCommit: 1000}
)
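P.S. The ENROLLER_ID property is used here as a flag for selecting the nodes that still need processing; it is removed once a node has been handled, so each batch picks up only unprocessed nodes. Of course, you can use a different flag that is set during processing instead.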
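If the ENROLLER_ID values need to be kept, the "different flag" idea could look roughly like the sketch below. It assumes the same Neo4j 3.x-era syntax as the query above ({perCommit} parameters and exists(); on Neo4j 4+ the parameter would be written $perCommit). The property name ENROLLER_LINKED is a hypothetical marker chosen for illustration and is not part of the original data model.

```cypher
// Sketch only: ENROLLER_LINKED is a hypothetical marker property.
// Each run processes up to perCommit unmarked nodes, then marks them,
// so the statement returns 0 (and the loop stops) once none are left.
CALL apoc.periodic.commit("
  MATCH (n2:x {REMARKS:'LATEST'})
  WHERE exists(n2.ENROLLER_ID) AND NOT exists(n2.ENROLLER_LINKED)
  WITH n2 LIMIT {perCommit}
  OPTIONAL MATCH (n1:x {REMARKS:'LATEST'}) WHERE n1.DIST_ID = n2.ENROLLER_ID
  WITH n2, collect(n1) AS n1s
  FOREACH (n1 IN n1s |
    CREATE (n1)-[:ENROLLER]->(n2)
  )
  SET n2.ENROLLER_LINKED = true
  RETURN count(n2)",
  {perCommit: 1000}
)
```

The marker can be removed in a later batched pass once all relationships exist.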
Or, more precisely, with apoc.periodic.iterate:
CALL apoc.periodic.iterate("
MATCH (n1:x {REMARKS:'LATEST'})
MATCH (n2:x {REMARKS:'LATEST'}) WHERE n1.DIST_ID = n2.ENROLLER_ID
RETURN n1,n2
","
WITH {n1} as n1, {n2} as n2
MERGE (n1)-[:ENROLLER]->(n2)
", {batchSize:10000, parallel:true}
)
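One thing to watch with parallel:true: creating a relationship locks both end nodes, so batches that touch the same nodes at the same time may contend for locks or deadlock. If that happens, a more conservative variant is to run the batches sequentially and look at what the procedure reports. The following is a sketch under the same Neo4j 3.x-era syntax assumptions as above, not a tuned configuration:

```cypher
// Sketch: same pairing query, but batches run one after another.
CALL apoc.periodic.iterate("
  MATCH (n1:x {REMARKS:'LATEST'})
  MATCH (n2:x {REMARKS:'LATEST'}) WHERE n1.DIST_ID = n2.ENROLLER_ID
  RETURN n1, n2
","
  WITH {n1} AS n1, {n2} AS n2
  MERGE (n1)-[:ENROLLER]->(n2)
", {batchSize:10000, parallel:false})
// Surface batch counts and any per-batch errors
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
```

Because the inner statement uses MERGE, re-running it after a partial failure should not create duplicate relationships.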