Question

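I'm trying to find the number of users who have all the skills required to qualify for an occupation. A user can have many skills, and I want every occupation returned together with its qualified users.

This is my current query: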

  MATCH (:User)-[:has_skill]->(:Skill)<-[:requires]-(o:Occupation)
  WITH DISTINCT o
  MATCH (o)
  WITH o, SIZE((o)-[:requires]->()) AS occupation_skill_count
  MATCH (o)-[:requires]->(:Skill)<-[hs:has_skill]-(u:User)
  WITH o, u, occupation_skill_count, count(hs) AS user_skill_count
  WHERE occupation_skill_count = user_skill_count
  WITH o.title as occupation_title, count(u) as users_count
  RETURN occupation_title, users_count

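However, I'm worried that my query is inefficient, because it times out (there are more than 60,000 occupations, 10,000 users, and 2,500 skills). I'd like to know whether there is a better way to write this query.

My approach in writing this query was:

1. Match all occupations that are connected to a user through a skill.
2. Count the number of skills each of those occupations requires.
3. Match all users related to those occupations through skills, keeping those whose number of skills for the occupation equals the number of skills the occupation requires.

This seems to work in the staging environment, which has far fewer records. With the full data set, however, it simply times out. Is there a better way to write this?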

Answer 1

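For performance questions, it helps to show the PROFILE plan of the query. If you expand all elements of the plan and paste it into the question, that will help pinpoint where the query can be improved.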
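As a rough sketch of what that looks like: prefixing a query with PROFILE runs it and returns the execution plan with rows and db hits per operator (EXPLAIN returns the plan without running the query). Because the full query times out, it may be more practical to profile it against a single occupation first. The title 'Data Engineer' below is a made-up value for illustration only.

```cypher
// Profile the counting logic for one occupation instead of all 60,000
PROFILE
MATCH (o:Occupation {title: 'Data Engineer'})  // hypothetical title
WITH o, SIZE((o)-[:requires]->()) AS occupation_skill_count
MATCH (o)-[:requires]->(:Skill)<-[hs:has_skill]-(u:User)
WITH o, u, occupation_skill_count, count(hs) AS user_skill_count
WHERE occupation_skill_count = user_skill_count
RETURN o.title AS occupation_title, count(u) AS users_count
```

The operators with the highest db hits in the resulting plan are usually the first place to look.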

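Since you're doing this for every occupation, it's an ideal candidate for batching. However, because a batched run can't return the counts (it is meant for write operations), we can use it to write the counts onto the :Occupation nodes, so the numbers can be queried quickly once they have been computed. At that point it's up to you whether to keep the computed property (perhaps along with a timestamp of when it was computed) or to just report the values and remove the property right away.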
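If you take the "report and remove" route, the cleanup could be sketched as follows. It assumes the count is stored in a property named skilledUsers, with an optional skilledUsersUpdated timestamp, which are the names used by the batch query in this answer; adjust them if you pick different ones.

```cypher
// Run after the counts have been read: drops the computed properties again.
// REMOVE is a no-op on nodes that do not have the property.
MATCH (o:Occupation)
REMOVE o.skilledUsers, o.skilledUsersUpdated
```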

You'll need APOC Procedures to perform the batching. apoc.periodic.iterate() is the procedure to use (you can adjust batchSize to whatever works best for you). I'll add comments inline.

CALL apoc.periodic.iterate(
 // iterate in batches for all :Occupations
 "MATCH (o:Occupation) RETURN o",
 // for each occupation, get all skills in ascending order of skilled users
 "MATCH (o)-[:requires]->(s:Skill)
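 WITH o, s, size((s)<-[:has_skill]-()) as skilledUserCount
 // no filter on skilledUserCount: a required skill that nobody has must sort first,
 // so that the MATCH below finds no users for this occupation
 ORDER BY skilledUserCount ASC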
 WITH o, collect(s) as skills
 WITH o, head(skills) as first, tail(skills) as skills
 // get users with all the required skills
 // because of ordering, we start with the smallest set of skilled users
 MATCH (first)<-[:has_skill]-(u)
 WHERE ALL(skill in skills WHERE (skill)<-[:has_skill]-(u))
 // now set this count of users with all skills to the occupation
 WITH o, count(u) as skilledUsers
 SET o.skilledUsers = skilledUsers
 // uncomment next line to keep a timestamp of when this was last updated
 // SET o.skilledUsersUpdated = timestamp()
 ",
 {batchSize:1000, parallel:true, iterateList:true}) YIELD batches, total
 RETURN batches, total

Once that completes, the occupations should have their count of skilled users stored, ready for easy querying:

MATCH (o:Occupation)
RETURN o.title as occupation_title, o.skilledUsers as users_count
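One thing to be aware of: the batch query only sets skilledUsers when at least one user matches, so occupations with no qualified users (or with no required skills at all) are left without the property and come back as null. A variant that reports those as 0, assuming that is how you want to read a missing value, might look like this:

```cypher
// Treat a missing skilledUsers property as "no qualified users"
MATCH (o:Occupation)
RETURN o.title AS occupation_title,
       coalesce(o.skilledUsers, 0) AS users_count
ORDER BY users_count DESC
```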
