Neo4j bulk data - creating relationships [OutOfMemory Exception]

Time: 2017-01-28 05:49:39

Tags: java neo4j

I am using a Neo4j procedure to create relationships on bulk data.

Initially, I used LOAD CSV to insert all the data:

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///XXXX.csv" AS row 
....

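The data set is large [10M], but the load completed successfully.

My problem is that I now want to create relationships between many of these nodes.

But I get an exception [OutOfMemory] when I execute this query: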

MATCH (n1:x {REMARKS: "LATEST"})
MATCH (n2:x {REMARKS: "LATEST"})
WHERE n1.DIST_ID = n2.ENROLLER_ID
CREATE (n1)-[:ENROLLER]->(n2);

I have already created indexes and constraints.

Any ideas? Please help.

1 Answer:

Answer 0: (score: 1)

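The problem is that your query runs in a single transaction, so every relationship it creates is held in memory until the commit, and that causes the [OutOfMemory] exception. It is hard to avoid directly because, at the moment, periodic commit is available only for LOAD CSV. So you can re-read the CSV after the first load: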

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///XXXX.csv" AS row 
MATCH (n1:x{REMARKS :"LATEST", DIST_ID: row.DIST_ID})
WITH n1
MATCH(n2:x{REMARKS :"LATEST"}) WHERE n1.DIST_ID=n2.ENROLLER_ID 
CREATE (n1)-[:ENROLLER]->(n2) ;

Or try this trick with periodic committing from the APOC library:

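// The statement is re-run, each time in its own transaction, until it returns 0.
// {perCommit} is the legacy parameter syntax; on Neo4j 4.0+ write $perCommit,
// and on 5.x replace exists(n2.ENROLLER_ID) with n2.ENROLLER_ID IS NOT NULL.
CALL apoc.periodic.commit("
    MATCH (n2:x {REMARKS:'LATEST'}) WHERE exists(n2.ENROLLER_ID)
    WITH n2 LIMIT {perCommit}
    OPTIONAL MATCH (n1:x {REMARKS:'LATEST'}) WHERE n1.DIST_ID = n2.ENROLLER_ID
    WITH n2, collect(n1) as n1s
    FOREACH(n1 in n1s|
       CREATE (n1)-[:ENROLLER]->(n2)
    )
    REMOVE n2.ENROLLER_ID
    RETURN count(n2)", 
    {perCommit: 1000}
)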

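P.S. The ENROLLER_ID property is used here as a flag that selects the nodes still to be processed, which is why the query removes it from each node once that node has been handled. Of course, you can use a different flag that is set during processing instead.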
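The flag-and-repeat idea can be illustrated outside the database. The following is only a minimal in-memory sketch in Java, not APOC's implementation: the `PeriodicCommitSketch` class, its `Node` type, and the `runBatch` and `linkAll` methods are hypothetical names introduced for illustration, and a "transaction" is simulated by one pass of a loop. It assumes that each batch handles at most `perCommit` flagged nodes, clears their flag, and that the whole thing stops once a batch reports 0 processed nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PeriodicCommitSketch {

    /** Hypothetical in-memory stand-in for a node labelled x. */
    static final class Node {
        final String distId;
        String enrollerId; // doubles as the "still to process" flag; null once handled

        Node(String distId, String enrollerId) {
            this.distId = distId;
            this.enrollerId = enrollerId;
        }
    }

    /** One simulated transaction: handle at most perCommit flagged nodes and report how many. */
    static int runBatch(List<Node> nodes, Map<String, List<Node>> byDistId,
                        int perCommit, List<String> relationships) {
        int processed = 0;
        for (Node n2 : nodes) {
            if (processed == perCommit) {
                break; // LIMIT reached for this batch
            }
            if (n2.enrollerId == null) {
                continue; // flag already cleared (or never set)
            }
            // Index-style lookup of the enroller(s); an empty result mirrors OPTIONAL MATCH.
            for (Node n1 : byDistId.getOrDefault(n2.enrollerId, Collections.<Node>emptyList())) {
                relationships.add(n1.distId + "-[:ENROLLER]->" + n2.distId);
            }
            n2.enrollerId = null; // corresponds to REMOVE n2.ENROLLER_ID
            processed++;
        }
        return processed;
    }

    /** Repeats the batch until it reports 0 processed nodes; returns all created relationships. */
    static List<String> linkAll(List<Node> nodes, int perCommit) {
        Map<String, List<Node>> byDistId = new HashMap<>();
        for (Node n : nodes) {
            byDistId.computeIfAbsent(n.distId, k -> new ArrayList<>()).add(n);
        }
        List<String> relationships = new ArrayList<>();
        while (runBatch(nodes, byDistId, perCommit, relationships) > 0) {
            // each pass stands in for one committed transaction
        }
        return relationships;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node("A", null), new Node("B", "A"),
                new Node("C", "A"), new Node("D", "B"));
        System.out.println(linkAll(nodes, 2));
    }
}
```

Because the flag is cleared even when no enroller is found, every flagged node is visited exactly once and the loop is guaranteed to terminate. The cost is that the original ENROLLER_ID values are gone afterwards, which is the trade-off the P.S. above refers to.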

Or, more precisely, with apoc.periodic.iterate:

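// The first statement produces the (n1, n2) pairs; the second runs once per pair,
// committed in batches of batchSize.
// parallel:true may make batches contend for locks on shared nodes;
// consider parallel:false if deadlocks appear.
CALL apoc.periodic.iterate("
    MATCH (n1:x {REMARKS:'LATEST'})
    MATCH (n2:x {REMARKS:'LATEST'}) WHERE n1.DIST_ID = n2.ENROLLER_ID
    RETURN n1,n2
  ","
    WITH {n1} as n1, {n2} as n2 
    MERGE (n1)-[:ENROLLER]->(n2)
  ", {batchSize:10000, parallel:true}
)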
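
As a rough illustration of what the batching buys you, here is a small Java sketch of the same split: one step yields the pairs, another applies them batch by batch. It is an in-memory analogy under stated assumptions, not the APOC code. `PeriodicIterateSketch`, `partition`, and `mergeInBatches` are hypothetical names, a `Set` stands in for the idempotent behaviour of MERGE, and the `parallel` option is not modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PeriodicIterateSketch {

    /** Splits the driving rows into consecutive batches of at most batchSize items. */
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(i, end)));
        }
        return batches;
    }

    /**
     * Applies each batch as its own unit of work. Each pair is {n1 id, n2 id};
     * the Set ignores duplicates, loosely mimicking MERGE.
     */
    static Set<String> mergeInBatches(List<String[]> pairs, int batchSize) {
        Set<String> relationships = new LinkedHashSet<>();
        for (List<String[]> batch : partition(pairs, batchSize)) {
            for (String[] pair : batch) {
                relationships.add(pair[0] + "-[:ENROLLER]->" + pair[1]);
            }
            // a real database would commit here, releasing the memory held by this batch
        }
        return relationships;
    }
}
```

Only one batch of pending changes exists at any time, so memory use is bounded by `batchSize` rather than by the total number of pairs. Using MERGE rather than CREATE also means that re-running the job after a partial failure should not duplicate relationships.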