Creating relationships in Neo4j takes too long

Asked: 2017-09-18 19:51:40

Tags: neo4j cypher

I have a CSV file with the following header:

jobid,prev_jobid,next_jobid,change_number,change_datetime,change_description,username,email,job_submittime,job_starttime,job_endtime,job_status
"27555","27552","0","134180","2017-09-07 17:39:06","Testing a new methodology",john,john@myco.com,"2017-09-07 17:39:09","2017-09-07 18:11:10","success"
"27552","27549","27555","134178","2017-09-07 17:32:06","bug fix",emma,emma@myco.co,"2017-09-07 17:29:09","2017-09-07 17:11:10","success"
..
..

I loaded the CSV and created three types of nodes:

LOAD CSV WITH HEADERS FROM "file:///wbdqueue.csv" AS bud
CREATE (j:job{id:bud.jobid,pid:bud.prev_jobid,nid:bud.next_jobid,
add_time:bud.job_submittime,start_time:bud.job_starttime,end_time:bud.job_endtime,status:bud.job_status})
CREATE (c:cl{clnum:bud.change_number,time:bud.change_datetime,desc:bud.change_description})
CREATE (u:user{user:bud.username,email:bud.email})

Then I tried to create relationships like this:

LOAD CSV WITH HEADERS FROM "file:///wbdqueue.csv" AS node
MATCH (c:cl),(u:user) WHERE c.clnum=node.change_number AND u.user=node.username
CREATE (u)-[:SUBMITTED]->(c)

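First, the browser shows a warning that the query builds a cartesian product between all disconnected patterns, which may take a lot of memory and time, and suggests adding an OPTIONAL MATCH. Second, I let it run for a long time (more than 3 days), but it did not create any relationships.

What is wrong with my query?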

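What I ultimately want to achieve is something like this (if you run the following in your Neo4j console, you should get the sample graph I have also attached; I reduced the properties in this visual example to keep it simple):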

CREATE 
  (`0` :user ) ,
  (`1` :job ) ,
  (`2` :change_number ) ,
  (`3` :user ) ,
  (`4` :change_number ) ,
  (`5` :job ) ,
  (`0`)-[:`SUBMITTED`]->(`2`),
  (`2`)-[:`CAUSED`]->(`1`),
  (`3`)-[:`SUBMITTED`]->(`4`),
  (`2`)-[:`NEXT_CHECKIN`]->(`4`),
  (`4`)-[:`CAUSED`]->(`5`),
  (`1`)-[:`NEXT_JOB`]->(`5`),
  (`5`)-[:`PREVIOUS_JOB`]->(`1`)

[Image: cypher_example_from_arrowsTool_screenshot]

Thanks

1 Answer:

Answer 0: (score: 2)

The warning about the cartesian product appears because you are matching multiple disconnected patterns (MATCH (c:cl),(u:user)).
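A likely contributor to the run time is that, without indexes, each CSV row has to scan every :cl and every :user node to evaluate the WHERE clause. The statements below are a minimal sketch of adding indexes on the two lookup properties. It assumes no such indexes exist yet (the question does not mention any) and uses the Neo4j 3.x syntax that was current when this was asked.

```cypher
// Assumption: no indexes exist yet on these properties.
// Run each statement on its own, before loading the relationships.
// Neo4j 3.x syntax; newer versions use the form
//   CREATE INDEX FOR (c:cl) ON (c.clnum)
CREATE INDEX ON :cl(clnum);
CREATE INDEX ON :user(user);
```

With these in place, the planner can look up each node by property value instead of combining two full label scans for every row.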

I believe you can use MERGE instead of MATCH & WHERE followed by CREATE. Also, add USING PERIODIC COMMIT to reduce the memory overhead of the transaction state:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///wbdqueue.csv" AS node
MERGE (:cl {clnum : node.change_number})-[:SUBMITTED]->(:user {user : node.username})
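One caveat worth keeping in mind: MERGE on a full pattern creates the whole pattern, nodes included, whenever no complete match exists, so the query above can add new :cl and :user nodes rather than link the ones already loaded. It also points the relationship from the change to the user, the reverse of the question's (u)-[:SUBMITTED]->(c). A possible alternative, offered as a sketch rather than a tested fix, is to match each node on its own and merge only the relationship. It assumes indexes on :cl(clnum) and :user(user) exist.

```cypher
// Sketch only; assumes indexes on :cl(clnum) and :user(user).
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///wbdqueue.csv" AS node
// Look up each existing node separately by its key property.
MATCH (c:cl {clnum: node.change_number})
MATCH (u:user {user: node.username})
// Merge only the relationship, so no new nodes are created.
MERGE (u)-[:SUBMITTED]->(c)
```

Because the import in the question uses CREATE for every row, a user who appears in several rows exists as several :user nodes, and this query links every one of them to the change. Using MERGE for the nodes at import time would avoid those duplicates.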