Why does the Neo4j CSV loader stop making progress when loading a large number of records?

Date: 2017-11-29 15:15:38

Tags: neo4j

I have a CSV file with the following columns:

Child_Object_ID;
Child_Object_Name;
Child_Object_Type;
Parent_Object_ID;
Parent_Object_Name;
Parent_Object_Type

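As the names suggest, the node described by (Child_Object_ID, Child_Object_Name, Child_Object_Type) is a child of the node described by (Parent_Object_ID, Parent_Object_Name, Parent_Object_Type). These parent nodes can themselves be children of other parent nodes.

This CSV file contains 1.1 million records. The problem I am facing is that after about 100K records have been loaded, I see no further progress. The load process keeps running, but no additional nodes or relationships are created.

I am using the following Cypher queries to load the data into Neo4j on Windows: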

CREATE INDEX ON :Object(Object_ID)
CREATE INDEX ON :Object(Object_ID, Object_Name, Object_Type)
CREATE INDEX ON :Object(Object_Type)

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///file1.csv" AS csvLine
MERGE  (object1:Object {Object_ID:csvLine.CHILD_OBJECT_ID, Object_Name:csvLine.CHILD_OBJECT_NAME, Object_Type:csvLine.CHILD_OBJECT_TYPE})
MERGE  (object2:Object {Object_ID:csvLine.PARENT_OBJECT_ID, Object_Name:csvLine.PARENT_OBJECT_NAME, Object_Type:csvLine.PARENT_OBJECT_TYPE})
MERGE (object1)-[:Child_Of]->(object2)

2 answers:

Answer 0: (score: 0)

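The problem with your load query is that its plan contains an Eager operation, which prevents USING PERIODIC COMMIT from batching. (You should see a warning for this query in the query input box; check the warning message.)

Without batching, your import is likely to run into memory problems.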
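One way to confirm the diagnosis is to inspect the plan before running the import. The sketch below prefixes a shortened version of the original query with EXPLAIN, which shows the plan without executing it. The USING PERIODIC COMMIT hint and the extra properties are left out for brevity; the node variables and the file name are taken from the question.

```cypher
// EXPLAIN returns the execution plan without running the query.
// If an operator named "Eager" appears in the plan, the query
// will not be processed in periodic-commit batches.
EXPLAIN
LOAD CSV WITH HEADERS FROM "file:///file1.csv" AS csvLine
MERGE (object1:Object {Object_ID: csvLine.CHILD_OBJECT_ID})
MERGE (object2:Object {Object_ID: csvLine.PARENT_OBJECT_ID})
MERGE (object1)-[:Child_Of]->(object2)
```

Running the same check on a query that merges through a single node variable should show whether the Eager operator has disappeared.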

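To avoid the Eager operation, try an import query that MERGEs all nodes using only a single variable. Once that is done, run a second load that uses MATCH for both the child and the parent node (these will match the existing nodes) and then MERGEs the relationship.

Here is an article (older, but still applicable) on avoiding Eager operations.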

Answer 1: (score: 0)

Here are the updated Cypher queries (rewritten based on InverseFalcon's suggestion):

// Pass 1: create the child nodes. Run each of the three statements separately.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///file1.csv" AS csvLine
MERGE (object1:Object {Object_ID:csvLine.CHILD_OBJECT_ID, Object_Name:COALESCE(csvLine.CHILD_OBJECT_NAME, 'NA'), Object_Type:csvLine.CHILD_OBJECT_TYPE, Folder:COALESCE(csvLine.CHILD_FOLDER, 'NA')})


// Pass 2: create the parent nodes.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///file1.csv" AS csvLine
MERGE (object2:Object {Object_ID:csvLine.PARENT_OBJECT_ID, Object_Name:COALESCE(csvLine.PARENT_OBJECT_NAME, 'NA'), Object_Type:csvLine.PARENT_OBJECT_TYPE, Folder:COALESCE(csvLine.PARENT_FOLDER, 'NA')})


// Pass 3: match the existing nodes by ID and create the relationships.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///file1.csv" AS csvLine
MATCH (a:Object {Object_ID:csvLine.CHILD_OBJECT_ID})
MATCH (b:Object {Object_ID:csvLine.PARENT_OBJECT_ID})
MERGE (a)-[:CHILD_OF]->(b)
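
A possible refinement, sketched here under the assumption that Object_ID alone identifies an object (the post does not state this): the relationship query matches on Object_ID only, whereas the node queries MERGE on every property. If the same ID ever appears with a different name, type, or folder, the multi-property MERGE creates a separate node for each combination, and the later MATCH returns all of them. Merging on the ID and setting the remaining properties with ON CREATE SET avoids that. The variable name child is illustrative.

```cypher
// Variant of the child-node pass: merge on the identifying property only.
// Assumes Object_ID is unique per object (not confirmed by the original post).
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///file1.csv" AS csvLine
MERGE (child:Object {Object_ID: csvLine.CHILD_OBJECT_ID})
ON CREATE SET
  child.Object_Name = COALESCE(csvLine.CHILD_OBJECT_NAME, 'NA'),
  child.Object_Type = csvLine.CHILD_OBJECT_TYPE,
  child.Folder = COALESCE(csvLine.CHILD_FOLDER, 'NA')
```

The parent-node pass could follow the same pattern with the PARENT_ columns. With this shape, the lookup in each MERGE only needs the single-property index on :Object(Object_ID).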