I have a CSV file with the following columns -
Child_Object_ID;
Child_Object_Name;
Child_Object_Type;
Parent_Object_ID;
Parent_Object_Name;
Parent_Object_Type
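As the names suggest, the node described by (Child_Object_ID, Child_Object_Name and Child_Object_Type) is a child of the node described by (Parent_Object_ID, Parent_Object_Name and Parent_Object_Type). These parent nodes can in turn be children of other parent nodes.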
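This CSV file contains 1.1 million records. The problem I am facing is that after about 100K records have been loaded, I no longer see any increase in the loaded data. The load process keeps running, but no further nodes or relationships are being created.
I am using the following Cypher queries to load the data into Neo4j (Windows version) -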
CREATE INDEX ON :Object(Object_ID)
CREATE INDEX ON :Object(Object_ID, Object_Name, Object_Type)
CREATE INDEX ON :Object(Object_Type)
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///file1.csv" AS csvLine
MERGE (object1:Object {Object_ID:csvLine.CHILD_OBJECT_ID, Object_Name:csvLine.CHILD_OBJECT_NAME, Object_Type:csvLine.CHILD_OBJECT_TYPE})
MERGE (object2:Object {Object_ID:csvLine.PARENT_OBJECT_ID, Object_Name:csvLine.PARENT_OBJECT_NAME, Object_Type:csvLine.PARENT_OBJECT_TYPE})
MERGE (object1)-[:Child_Of]->(object2)
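Answer 0 (score: 0)
The problem with your load query is that its plan contains an Eager operation, which prevents PERIODIC COMMIT from batching. You should see a warning for this query in the query input box; check the warning message.
Without batching, your import is likely to run into memory problems.
To avoid the Eager operation, try an import query that only MERGEs the nodes, using a single variable. Once that is done, run a second load that MATCHes both the child and the parent node (these will match the nodes that now exist) and then MERGEs the relationship.
Here is an article (older, but still applicable) on avoiding Eager operations.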
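To check whether a given load query is affected, one option is to prefix it with EXPLAIN, which returns the plan without executing the query. The sketch below is a cut-down version of the query from the question (only the ID property is kept, for brevity). A query with two node MERGEs on the same label followed by a relationship MERGE would be expected to show an Eager operator in its plan.

```cypher
// EXPLAIN compiles the query and returns its plan without running it.
// Look for an "Eager" operator in the resulting plan.
EXPLAIN
LOAD CSV WITH HEADERS FROM "file:///file1.csv" AS csvLine
MERGE (object1:Object {Object_ID: csvLine.CHILD_OBJECT_ID})
MERGE (object2:Object {Object_ID: csvLine.PARENT_OBJECT_ID})
MERGE (object1)-[:Child_Of]->(object2)
```

If the plan for a rewritten query no longer contains Eager, PERIODIC COMMIT should be able to batch it.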
Answer 1 (score: 0)
Here are the updated Cypher queries (rewritten based on InverseFalcon's suggestion):
// Query 1 (run separately): create the child nodes, one variable only
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///file1.csv" AS csvLine
MERGE (object1:Object {Object_ID:csvLine.CHILD_OBJECT_ID, Object_Name:COALESCE(csvLine.CHILD_OBJECT_NAME, 'NA'), Object_Type:csvLine.CHILD_OBJECT_TYPE, Folder:COALESCE(csvLine.CHILD_FOLDER, 'NA')})
// Query 2 (run separately): create the parent nodes, one variable only
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///file1.csv" AS csvLine
MERGE (object2:Object {Object_ID:csvLine.PARENT_OBJECT_ID, Object_Name:COALESCE(csvLine.PARENT_OBJECT_NAME, 'NA'), Object_Type:csvLine.PARENT_OBJECT_TYPE, Folder:COALESCE(csvLine.PARENT_FOLDER, 'NA')})
// Query 3 (run separately): match the existing nodes by ID and create the relationship
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///file1.csv" AS csvLine
MATCH (a:Object {Object_ID:csvLine.CHILD_OBJECT_ID})
MATCH (b:Object {Object_ID:csvLine.PARENT_OBJECT_ID})
MERGE (a)-[:CHILD_OF]->(b)
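One thing to watch in the node-loading queries: MERGE matches on the whole property map, so an object that appears with a different name or folder in two rows would end up as two nodes. A possible variant is sketched below. It assumes that Object_ID alone identifies an object, which the question does not confirm. It merges on the ID only and fills in the other properties when the node is first created. This uses the same Neo4j 3.x-era syntax as the rest of the thread.

```cypher
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///file1.csv" AS csvLine
// Merge on the key only, so the lookup can use the index on :Object(Object_ID)
MERGE (o:Object {Object_ID: csvLine.CHILD_OBJECT_ID})
// Set the remaining properties only when the node is newly created
ON CREATE SET o.Object_Name = COALESCE(csvLine.CHILD_OBJECT_NAME, 'NA'),
              o.Object_Type = csvLine.CHILD_OBJECT_TYPE
```

The same pattern would apply to the parent columns in a second, separate query.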