Neo4j Cypher query runs very slowly

Time: 2016-06-03 13:16:39

Tags: neo4j cypher

I am using Neo4j to store application data. The image below depicts the graph structure.

graph

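Each circle is a node and each arrow depicts a relationship; the relationship types are as labeled in the image. The image also shows whether the relationship between nodes is many-to-many, one-to-many, or one-to-one.

I want to retrieve the following from the graph: a list of all positions of a company, where each position has a list of users (candidates) and each user has a list of feedback, as shown below.

position --->
    candidate1
        interview round name (Telephonic)
            question1 answer1 and answer given by user1
            question1 answer1 and answer given by user2
        interview round name (HR Round)
            question1 answer1 and answer given by user1
            question1 answer1 and answer given by user2
    candidate2
        interview round name (Telephonic)
            question1 answer1 and answer given by user1
            question1 answer1 and answer given by user2
        interview round name (HR Round)
            question1 answer1 and answer given by user1
            question1 answer1 and answer given by user2
    ...

Many candidates will not have any interview rounds; for those candidates the list of rounds should be empty.

Below is the query I use to retrieve the data I need.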

MATCH (comp:Company {dId: "155dyv1wgT"})<-[:`POSITION_COMPANY`]-(pos: Position {status: 'OPEN'})-[:`POSITION_WORKFLOW`]->(:WorkFlow)-[:`WORKFLOW_CANDIDATE-WORKFLOW`]->(cw : CandidateWorkFlow)-[:`CANDIDATE-WORKFLOW_COMPANY-CANDIDATE`]->(cc : CompanyCandidate)

where ((not (has(cc.isSpam) or has(cc.isTrash))) OR (cc.isSpam=false and cc.isTrash=false)) and pos.positionType IN ['PUBLIC','DISCRETE'] with distinct comp, {dId: pos.dId, title: pos.title} as pos, cw, cc

OPTIONAL MATCH (cw)-[:`CANDIDATE_WORKFLOW_INTERVIEW`]->(inwrkflw: InterviewWorkFlow)-[:`INTERVIEW_ROUND`]->(intrnd: InterviewRound)-[:`INTERVIEW_ROUND_FEEDBACK`]->(ffform: FeedbackForm)-[:`FEEDBACK_QUESTION`]-(ffq: Question) 

OPTIONAL MATCH (inwrkflw)-[:`INTERVIEW_WORKFLOW_FEEDBACK`]-(ff:Feedback)

OPTIONAL MATCH (iwr : User)-[:`FEEDBACK_BY`]->(ff)-[:`FEEDBACK_ANSWER`]->(answer:Answer)-[:`QUESTION_ANSWER`]->(ffq) 

with collect({answer : answer.value, rating: answer.rating, question : ffq.qText, givenBy : iwr.fullName, type: ffq.questionType, givenOn: answer.lastModifiedDate}) as rnds, cc, pos, intrnd

with filter(rnd IN rnds WHERE rnd.type = 'COMMENTS') as comments, filter(rnd IN rnds WHERE rnd.type = 'LINEAR_GENERIC') as ratings, cc, pos, intrnd

with distinct collect({roundName: intrnd.name, ratings: ratings, comments: comments}) as rounds, cc, pos

return collect({cc: cc, rounds: rounds}) as data, pos.dId as posId, pos.title as posTitle

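dId is unique on every node.

The problem with this query is that it runs fine on a small dataset, say 1000 candidates across 10 positions, but on a large dataset it takes a very long time to return results. I waited 5 minutes in the Neo4j console and still got no response.

The application will not stop at 1000 candidates. The number of candidates can reach 100,000, and at most 1 million per company.

I have tried various ways to optimize this query but could not get a response.

The response SLA should be within 20 seconds.

My questions are:

  1. How can I optimize this query to get the result I want?
  2. What is wrong with the current query?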

1 Answer:

Answer 0 (score: 0)

First, I upgraded your database to 2.3.3.

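Your model consists of two subgraphs that describe:

  1. the meta-information of the hiring process
  2. the concrete feedback/answers for a single candidate

For your company, both are quite large:

  1. 235600 paths from the company to questions via the interview process
  2. 83937 paths from the company to questions via the concrete answers

Each of these takes about 1 second to compute.

If you simply query them together, those numbers multiply and you end up with roughly 20 billion paths (235600 × 83937 ≈ 1.98 × 10^10).

That takes forever to compute.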
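For reference, path counts like the two above could be obtained with queries along the following lines. This is only a sketch assembled from the patterns in the question's query: the answer does not show the queries behind its numbers, so the exact filters (for example `status: 'OPEN'`), the labels on each hop, and the aliases `metaPaths` and `answerPaths` are assumptions. Run each statement separately.

```cypher
// Subgraph 1 (assumed shape): company -> positions -> interview rounds -> form questions
MATCH (:Company {dId: "155dyv1wgT"})<-[:`POSITION_COMPANY`]-(:Position {status: 'OPEN'})
      -[:`POSITION_WORKFLOW`]->(:WorkFlow)
      -[:`WORKFLOW_CANDIDATE-WORKFLOW`]->(:CandidateWorkFlow)
      -[:`CANDIDATE_WORKFLOW_INTERVIEW`]->(:InterviewWorkFlow)
      -[:`INTERVIEW_ROUND`]->(:InterviewRound)
      -[:`INTERVIEW_ROUND_FEEDBACK`]->(:FeedbackForm)
      -[:`FEEDBACK_QUESTION`]-(:Question)
RETURN count(*) AS metaPaths;

// Subgraph 2 (assumed shape): company -> positions -> feedback -> answers -> questions
MATCH (:Company {dId: "155dyv1wgT"})<-[:`POSITION_COMPANY`]-(:Position {status: 'OPEN'})
      -[:`POSITION_WORKFLOW`]->(:WorkFlow)
      -[:`WORKFLOW_CANDIDATE-WORKFLOW`]->(:CandidateWorkFlow)
      -[:`CANDIDATE_WORKFLOW_INTERVIEW`]->(:InterviewWorkFlow)
      -[:`INTERVIEW_WORKFLOW_FEEDBACK`]-(:Feedback)
      -[:`FEEDBACK_ANSWER`]->(:Answer)
      -[:`QUESTION_ANSWER`]->(:Question)
RETURN count(*) AS answerPaths;
```

If both counts are large, any query that matches the two patterns without tying them together early has to consider the product of the two.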

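My solution was to query one subgraph first, set it aside (in an aggregation), and then query the second subgraph.

As a first step, I tied the two together by matching the answer to the question (which turns it into an ExpandInto operation):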

      WHERE (answer)-[:QUESTION_ANSWER]->(ffq)

That made the query finish in about 15 seconds.
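A reduced sketch of that first step, cut down to the two subgraphs so the effect can be inspected in isolation. It is not the answerer's intermediate query, which is not shown; the starting label `InterviewWorkFlow`, the shortened patterns, and the alias `matchedAnswers` are assumptions for illustration. Because both `answer` and `ffq` are already bound when the `WHERE` pattern is evaluated, the planner only has to check for a relationship between two known nodes instead of expanding to new ones. Prefixing the query with `PROFILE` shows which operators were actually used.

```cypher
PROFILE
// Questions reachable through the interview rounds of each workflow
MATCH (inwrkflw:InterviewWorkFlow)-[:`INTERVIEW_ROUND`]->(:InterviewRound)
      -[:`INTERVIEW_ROUND_FEEDBACK`]->(:FeedbackForm)
      -[:`FEEDBACK_QUESTION`]-(ffq:Question)
// Answers given in feedback on the same workflow
MATCH (inwrkflw)-[:`INTERVIEW_WORKFLOW_FEEDBACK`]-(ff:Feedback)
      -[:`FEEDBACK_ANSWER`]->(answer:Answer)
// Keep only answer/question pairs that are actually connected
WHERE (answer)-[:`QUESTION_ANSWER`]->(ffq)
RETURN count(*) AS matchedAnswers;
```

Comparing the row counts per operator in the profile output before and after adding the `WHERE` line is a quick way to confirm where the intermediate results shrink.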

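Then I expanded the concrete (answer) subgraph one step further to the question and joined the two subgraphs on the question itself (ffq2 = ffq).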

This brought the total execution time down to 1.6 seconds.

Here is the final query:

      MATCH (comp:Company {dId: "155dyv1wgT"})<-[:`POSITION_COMPANY`]-(pos: Position {status: 'OPEN'})-[:`POSITION_WORKFLOW`]->(:WorkFlow)-[:`WORKFLOW_CANDIDATE-WORKFLOW`]->(cw : CandidateWorkFlow)-[:`CANDIDATE-WORKFLOW_COMPANY-CANDIDATE`]->(cc : CompanyCandidate)
      
      where ((not (has(cc.isSpam) or has(cc.isTrash))) OR (cc.isSpam=false and cc.isTrash=false)) and pos.positionType IN ['PUBLIC','DISCRETE']
      
      with distinct comp, {dId: pos.dId, title: pos.title} as pos, cw, cc
      
      MATCH (cw)-[:`CANDIDATE_WORKFLOW_INTERVIEW`]->(inwrkflw)
      
      MATCH (inwrkflw)-[:`INTERVIEW_ROUND`]->(intrnd)-[:`INTERVIEW_ROUND_FEEDBACK`]->(ffform)-[:`FEEDBACK_QUESTION`]-(ffq) 
      
      WITH comp,pos, cw, cc,inwrkflw, collect({round:intrnd,form:ffform,question:ffq}) as workflow_questions
      
      MATCH (inwrkflw)-[:`INTERVIEW_WORKFLOW_FEEDBACK`]-(ff:Feedback)
      MATCH (iwr : User)-[:`FEEDBACK_BY`]->(ff)-[:`FEEDBACK_ANSWER`]->(answer:Answer)-[:`QUESTION_ANSWER`]->(ffq2) 
      
      UNWIND workflow_questions as wq
      
      WITH comp,pos, cw, cc,inwrkflw, iwr,ff,answer, wq.round as intrnd, wq.form as ffform, wq.question as ffq
      
      WHERE ffq2 = ffq
      
      with collect({answer : answer.value, rating: answer.rating, question : ffq.qText, givenBy : iwr.fullName, type: ffq.questionType, givenOn: answer.lastModifiedDate}) as rnds, cc, pos, intrnd
      
      with filter(rnd IN rnds WHERE rnd.type = 'COMMENTS') as comments, filter(rnd IN rnds WHERE rnd.type = 'LINEAR_GENERIC') as ratings, cc, pos, intrnd
      
      with collect({roundName: intrnd.name, ratings: ratings, comments: comments}) as rounds, cc, pos
      
      return collect({cc: cc, rounds: rounds}) as data, pos.dId as posId, pos.title as posTitle;