I'm using Neo4j to store application data. The image below shows the graph structure.
Each circle is a node and each arrow is a relationship, labeled with its relationship type. The diagram also shows whether each relationship between nodes is many-to-many, one-to-many, or one-to-one.
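I want to retrieve data from the graph: a list of all positions of a company, where each position has a list of candidates and each candidate has a list of feedback, as shown below: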
position
    candidate1
        interview round name (Telephonic)
            question1: answer1, given by user1
            question1: answer1, given by user2
        interview round name (HR Round)
            question1: answer1, given by user1
            question1: answer1, given by user2
    candidate2
        interview round name (Telephonic)
            question1: answer1, given by user1
            question1: answer1, given by user2
        interview round name (HR Round)
            question1: answer1, given by user1
            question1: answer1, given by user2
    ...
Many candidates will not have been interviewed yet; for those candidates the interview data should simply be empty.
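To make the target shape concrete, here is a small Python sketch that folds flat result rows into the nested position → candidate → round structure above. The row layout and values are hypothetical; the one assumption that matters is that a candidate without interviews arrives as a single row with None in the interview columns (what an OPTIONAL MATCH produces) and is kept with an empty set of rounds.

```python
# Hypothetical flat rows: (position, candidate, round, question, answer, interviewer).
# A candidate with no interviews arrives as one row with None in the
# interview columns, which is what an OPTIONAL MATCH produces.
ROWS = [
    ("position", "candidate1", "Telephonic", "question1", "answer1", "user1"),
    ("position", "candidate1", "Telephonic", "question1", "answer1", "user2"),
    ("position", "candidate1", "HR Round", "question1", "answer1", "user1"),
    ("position", "candidate1", "HR Round", "question1", "answer1", "user2"),
    ("position", "candidate2", None, None, None, None),
]


def build_tree(rows):
    """Fold flat rows into {position: {candidate: {round name: [answers]}}}."""
    tree = {}
    for position, candidate, round_name, question, answer, given_by in rows:
        # Register the candidate even if it has no interview rounds yet.
        rounds = tree.setdefault(position, {}).setdefault(candidate, {})
        if round_name is None:
            continue  # not interviewed: keep an empty round map
        rounds.setdefault(round_name, []).append(
            {"question": question, "answer": answer, "givenBy": given_by}
        )
    return tree


tree = build_tree(ROWS)
print(tree["position"]["candidate2"])  # {}
```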
Here is the query I use to retrieve the data I need:
MATCH (comp:Company {dId: "155dyv1wgT"})<-[:`POSITION_COMPANY`]-(pos:Position {status: 'OPEN'})
      -[:`POSITION_WORKFLOW`]->(:WorkFlow)-[:`WORKFLOW_CANDIDATE-WORKFLOW`]->(cw:CandidateWorkFlow)
      -[:`CANDIDATE-WORKFLOW_COMPANY-CANDIDATE`]->(cc:CompanyCandidate)
WHERE ((NOT (has(cc.isSpam) OR has(cc.isTrash))) OR (cc.isSpam = false AND cc.isTrash = false))
  AND pos.positionType IN ['PUBLIC', 'DISCRETE']
WITH DISTINCT comp, {dId: pos.dId, title: pos.title} AS pos, cw, cc
OPTIONAL MATCH (cw)-[:`CANDIDATE_WORKFLOW_INTERVIEW`]->(inwrkflw:InterviewWorkFlow)-[:`INTERVIEW_ROUND`]->(intrnd:InterviewRound)
      -[:`INTERVIEW_ROUND_FEEDBACK`]->(ffform:FeedbackForm)-[:`FEEDBACK_QUESTION`]-(ffq:Question)
OPTIONAL MATCH (inwrkflw)-[:`INTERVIEW_WORKFLOW_FEEDBACK`]-(ff:Feedback)
OPTIONAL MATCH (iwr:User)-[:`FEEDBACK_BY`]->(ff)-[:`FEEDBACK_ANSWER`]->(answer:Answer)-[:`QUESTION_ANSWER`]->(ffq)
WITH collect({answer: answer.value, rating: answer.rating, question: ffq.qText, givenBy: iwr.fullName, type: ffq.questionType, givenOn: answer.lastModifiedDate}) AS rnds, cc, pos, intrnd
WITH filter(rnd IN rnds WHERE rnd.type = 'COMMENTS') AS comments, filter(rnd IN rnds WHERE rnd.type = 'LINEAR_GENERIC') AS ratings, cc, pos, intrnd
WITH DISTINCT collect({roundName: intrnd.name, ratings: ratings, comments: comments}) AS rounds, cc, pos
RETURN collect({cc: cc, rounds: rounds}) AS data, pos.dId AS posId, pos.title AS posTitle
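The tail of the query (collect the answers per round, split them with two filter() calls, then collect the rounds per candidate) can be mirrored in plain Python to make the grouping explicit. This is a sketch only; the input dictionaries are hypothetical stand-ins for the maps built by the first collect().

```python
def split_round(answers):
    """Mirror of the two filter() calls: split one round's answers by question type."""
    comments = [a for a in answers if a["type"] == "COMMENTS"]
    ratings = [a for a in answers if a["type"] == "LINEAR_GENERIC"]
    # Answers of any other type end up in neither list, as in the query.
    return {"ratings": ratings, "comments": comments}


def rounds_for_candidate(answers_by_round):
    """Mirror of collect({roundName, ratings, comments}) for one candidate.

    answers_by_round maps a round name to the answer maps collected for it
    (the `rnds` list in the query).
    """
    return [
        {"roundName": name, **split_round(answers)}
        for name, answers in answers_by_round.items()
    ]


example = {
    "Telephonic": [
        {"question": "question1", "type": "COMMENTS", "answer": "answer1", "givenBy": "user1"},
        {"question": "question2", "type": "LINEAR_GENERIC", "rating": 4, "givenBy": "user2"},
    ],
    "HR Round": [],
}
rounds = rounds_for_candidate(example)
```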
dId is unique for every node.
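The problem with this query is that it runs fine on a small dataset, say 1,000 candidates across 10 positions, but on a large dataset it takes a very long time to return results. I waited 5 minutes in the Neo4j console and still got no response.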
The application will not stay at 1,000 candidates: the number of candidates can reach 100,000, and up to 1 million per company.
I have tried various ways to optimize this query, but I still cannot get a response.
The response SLA is 20 seconds.
My question is:
Answer (score: 0)
First, I upgraded your database to Neo4j 2.3.3.
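Your model consists of two subgraphs, and for your company both of them are large. Computing either one on its own takes about 1 second.
If you simply query them together, their sizes multiply and you end up with about 20 billion paths.
That takes forever to compute.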
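The multiplication can be seen in a toy Python model. The numbers are illustrative, not the answer's measurements: if every question row is paired with every answer row before filtering, the work is the product of the two sizes even though only a small fraction of the pairs match.

```python
from itertools import product


def make_data(n_workflows, questions_per_workflow):
    # Hypothetical toy data: every question is answered exactly once.
    questions = [
        {"workflow": w, "id": f"q{w}-{i}"}
        for w in range(n_workflows)
        for i in range(questions_per_workflow)
    ]
    answers = [{"workflow": q["workflow"], "question_id": q["id"]} for q in questions]
    return questions, answers


def naive_pairs(questions, answers):
    """Pair every question with every answer, then filter.

    Work is len(questions) * len(answers), regardless of how few pairs match.
    """
    examined = 0
    matches = 0
    for q, a in product(questions, answers):
        examined += 1
        if a["question_id"] == q["id"]:
            matches += 1
    return matches, examined


questions, answers = make_data(10, 5)
print(naive_pairs(questions, answers))  # (50, 2500)
```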
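My solution is to query one subgraph first, set it aside (in an aggregation), and then query the second subgraph.
First, I combined the two by matching each answer to its question, which the planner turns into an ExpandInto operation: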
WHERE (answer)-[:QUESTION_ANSWER]->(ffq)
This made the query finish in about 15 seconds.
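As a rough mental model (a toy, not Neo4j's implementation): an Expand(All) step starts from one bound node and enumerates its relationships to discover the other end, while Expand(Into) is used when both ends are already bound and only has to check whether a relationship connects them. The node names below are hypothetical.

```python
# Toy adjacency list: answer node -> question nodes it points to via QUESTION_ANSWER.
ADJ = {
    "answer1": {"question1"},
    "answer2": {"question2"},
}


def expand_all(adj, start):
    """Only `start` is bound: enumerate every neighbour to discover the other end."""
    return sorted(adj.get(start, set()))


def expand_into(adj, start, end):
    """Both ends already bound: just test whether the relationship start -> end exists."""
    return end in adj.get(start, set())


print(expand_into(ADJ, "answer1", "question1"))  # True
```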
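Then I expanded the concrete (answer) subgraph by one step to the question, and joined the two subgraphs on the question node itself (ffq = ffq2).
This brought the total execution time down to 1.6 seconds.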
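A toy Python model of the aggregate-then-join idea, using a hypothetical data shape: questions are first collected per workflow (like collect(...) AS workflow_questions), and each answer is then compared only with its own workflow's questions (like UNWIND followed by WHERE ffq2 = ffq). The work becomes the sum of per-workflow products instead of the product of the totals. It models row counts only, not Neo4j's planner.

```python
from collections import defaultdict


def make_data(n_workflows, questions_per_workflow):
    # Hypothetical toy data: every question is answered exactly once.
    questions = [
        {"workflow": w, "id": f"q{w}-{i}"}
        for w in range(n_workflows)
        for i in range(questions_per_workflow)
    ]
    answers = [{"workflow": q["workflow"], "question_id": q["id"]} for q in questions]
    return questions, answers


def grouped_pairs(questions, answers):
    """Collect questions per workflow, then pair each answer only within its workflow."""
    by_workflow = defaultdict(list)
    for q in questions:
        by_workflow[q["workflow"]].append(q)  # like collect(...) AS workflow_questions
    examined = 0
    matches = 0
    for a in answers:
        for q in by_workflow[a["workflow"]]:  # like UNWIND workflow_questions
            examined += 1
            if a["question_id"] == q["id"]:  # like WHERE ffq2 = ffq
                matches += 1
    return matches, examined


questions, answers = make_data(10, 5)
print(grouped_pairs(questions, answers))  # (50, 250)
```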
Here is the final query:
MATCH (comp:Company {dId: "155dyv1wgT"})<-[:`POSITION_COMPANY`]-(pos:Position {status: 'OPEN'})
      -[:`POSITION_WORKFLOW`]->(:WorkFlow)-[:`WORKFLOW_CANDIDATE-WORKFLOW`]->(cw:CandidateWorkFlow)
      -[:`CANDIDATE-WORKFLOW_COMPANY-CANDIDATE`]->(cc:CompanyCandidate)
WHERE ((NOT (has(cc.isSpam) OR has(cc.isTrash))) OR (cc.isSpam = false AND cc.isTrash = false))
  AND pos.positionType IN ['PUBLIC', 'DISCRETE']
WITH DISTINCT comp, {dId: pos.dId, title: pos.title} AS pos, cw, cc
// Subgraph 1: the questions of each interview workflow, collected per workflow
MATCH (cw)-[:`CANDIDATE_WORKFLOW_INTERVIEW`]->(inwrkflw)
MATCH (inwrkflw)-[:`INTERVIEW_ROUND`]->(intrnd)-[:`INTERVIEW_ROUND_FEEDBACK`]->(ffform)-[:`FEEDBACK_QUESTION`]-(ffq)
WITH comp, pos, cw, cc, inwrkflw, collect({round: intrnd, form: ffform, question: ffq}) AS workflow_questions
// Subgraph 2: the answers given in that workflow, expanded one step to their question (ffq2)
MATCH (inwrkflw)-[:`INTERVIEW_WORKFLOW_FEEDBACK`]-(ff:Feedback)
MATCH (iwr:User)-[:`FEEDBACK_BY`]->(ff)-[:`FEEDBACK_ANSWER`]->(answer:Answer)-[:`QUESTION_ANSWER`]->(ffq2)
// Join the two subgraphs on the question node; ffq2 is carried through WITH so WHERE can use it
UNWIND workflow_questions AS wq
WITH comp, pos, cw, cc, inwrkflw, iwr, ff, answer, ffq2, wq.round AS intrnd, wq.form AS ffform, wq.question AS ffq
WHERE ffq2 = ffq
WITH collect({answer: answer.value, rating: answer.rating, question: ffq.qText, givenBy: iwr.fullName, type: ffq.questionType, givenOn: answer.lastModifiedDate}) AS rnds, cc, pos, intrnd
WITH filter(rnd IN rnds WHERE rnd.type = 'COMMENTS') AS comments, filter(rnd IN rnds WHERE rnd.type = 'LINEAR_GENERIC') AS ratings, cc, pos, intrnd
WITH collect({roundName: intrnd.name, ratings: ratings, comments: comments}) AS rounds, cc, pos
RETURN collect({cc: cc, rounds: rounds}) AS data, pos.dId AS posId, pos.title AS posTitle;