Running calculations against Cypher subqueries

Date: 2017-09-14 18:35:39

Tags: neo4j cypher graph-databases

Suppose I have the following graph setup:

CREATE (john:Person {name: 'John Doe'}), (jane:Person {name: 'Jane Doe'}), (bob:Person {name: 'Bob Doe'})
CREATE (reading:Hobby {name: 'Reading'}), (sports:Hobby {name: 'Sports'}), (music:Hobby {name: 'Music'})
MERGE (john)-[:LIKES {intensity: 25}]->(reading)
MERGE (john)-[:LIKES {intensity: 70}]->(sports)
MERGE (john)-[:DISLIKES {intensity: 15}]->(music)
MERGE (jane)-[:LIKES {intensity: 50}]->(reading)
MERGE (jane)-[:DISLIKES {intensity: 40}]->(sports)
MERGE (jane)-[:LIKES {intensity: 20}]->(music)
MERGE (bob)-[:DISLIKES {intensity: 35}]->(reading)
MERGE (bob)-[:LIKES {intensity: 50}]->(sports)
MERGE (bob)-[:LIKES {intensity: 25}]->(music)

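Each person may like or dislike a given hobby with some arbitrary intensity.

To calculate each pair of people's shared "passion" (a mutual like or a mutual dislike) for any given hobby, I can run the following: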

MATCH (a:Person)-[al]->(h:Hobby)<-[bl]-(b:Person)
WHERE ID(a) < ID(b) AND TYPE(al) = TYPE(bl)
RETURN a.name, b.name, TYPE(al), h.name, (al.intensity + bl.intensity) / 2 AS passion

To calculate each pair's "disdain" for a given hobby (one likes it, the other dislikes it), I can run the inverse:

MATCH (a:Person)-[al]->(h:Hobby)<-[bl]-(b:Person)
WHERE ID(a) < ID(b) AND TYPE(al) <> TYPE(bl)
RETURN a.name, b.name, h.name, (al.intensity + bl.intensity) / 2 AS disdain

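Both of these calculations return the information exactly the way I expect, but I'm having trouble taking the difference between "passion" and "disdain" in a single query to compute a final "compatibility" rating and sort the results in descending order.

I tried something like this: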

MATCH (a:Person)-[al]->(h:Hobby)<-[bl]-(b:Person)
WHERE ID(a) < ID(b) AND TYPE(al) <> TYPE(bl)
WITH (al.intensity + bl.intensity) / 2 AS disdain
MATCH (a:Person)-[al]->(h:Hobby)<-[bl]-(b:Person)
WHERE ID(a) < ID(b) AND TYPE(al) = TYPE(bl)
WITH a, b, h, disdain, (al.intensity + bl.intensity) / 2 AS passion
RETURN a.name, b.name, h.name, passion, disdain, (passion - disdain) AS compatibility
ORDER BY compatibility DESC

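But because of my inexperience with Neo4j and Cypher queries, I end up with results that are very wrong.

I feel like I need some combination of COLLECT and UNWIND to get what I want, but I'm not sure how to approach it, or whether I'm even on the right track.

As a side note, I know I could get a simpler result by restricting the relationships to LIKES and using signed integers for intensity (i.e. a negative LIKES could represent a DISLIKES), but I would prefer to keep them separate if possible.

Any ideas?

Edit

Using the answer stdob gave me, I was able to add some aggregation, and I ended up with the following: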

MATCH (a:Person)-[al]->(h:Hobby)<-[bl]-(b:Person)
WHERE ID(a) < ID(b)
WITH a, al, h, bl, b, (al.intensity + bl.intensity)/2 AS value
WITH a, al, h, bl, b, value,
    CASE WHEN TYPE(al) =  TYPE(bl) THEN value ELSE 0 END AS mutual,
    CASE WHEN TYPE(al) <> TYPE(bl) THEN value ELSE 0 END AS separate
RETURN DISTINCT a.name, SUM(mutual) AS passion, SUM(separate) AS disdain, (SUM(mutual) - SUM(separate)) AS compatibility, b.name
ORDER BY compatibility DESC

The output is much cleaner and exactly what I was hoping for:

NAME A      PASSION   DISDAIN   COMPATIBILITY  NAME B
"John Doe"  60        50        10             "Bob Doe"
"John Doe"  37        72        -35            "Jane Doe"
"Jane Doe"  22        87        -65            "Bob Doe"

3 answers:

Answer 0 (score: 2)

I think you need something like this:

MATCH (a:Person)-[al]->(h:Hobby)<-[bl]-(b:Person)
WHERE ID(a) < ID(b)
WITH a, al, h, bl, b, (al.intensity + bl.intensity)/2 AS value
WITH a, al, h, bl, b, value,
     CASE WHEN TYPE(al) =  TYPE(bl) THEN value ELSE 0 END AS passion,
     CASE WHEN TYPE(al) <> TYPE(bl) THEN value ELSE 0 END AS disdain
RETURN a.name, b.name, h.name, 
       passion, disdain, 
       ABS(passion - disdain)/2.0 AS compatibility 
ORDER BY compatibility DESC
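
One detail to keep in mind with the value expression: in Cypher, dividing one integer by another yields an integer, so (al.intensity + bl.intensity)/2 drops any half. A small sketch, using the intensities 25 and 50 from John's and Jane's LIKES on Reading:

```cypher
// Integer / integer truncates; making either operand a float keeps the fraction.
RETURN (25 + 50) / 2   AS truncated_average,  // 37
       (25 + 50) / 2.0 AS exact_average;      // 37.5
```

If fractional averages matter, dividing by 2.0 in the value expression avoids the truncation.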

Answer 1 (score: 1)

You can use UNION to combine the results of the two queries:

MATCH (a:Person)-[al]->(h:Hobby)<-[bl]-(b:Person)
WHERE ID(a) < ID(b) AND TYPE(al) = TYPE(bl)
RETURN a.name, b.name, "passion" AS intent, h.name, (al.intensity + bl.intensity) / 2 AS metric
UNION
MATCH (a:Person)-[al]->(h:Hobby)<-[bl]-(b:Person)
WHERE ID(a) < ID(b) AND TYPE(al) <> TYPE(bl)
RETURN a.name, b.name, "disdain" AS intent, h.name, (al.intensity + bl.intensity) / 2 AS metric
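
UNION returns passion and disdain as separate rows, and by itself it leaves no place to subtract one from the other. On Neo4j 4.0 or later (an assumption; the CALL { ... } subquery clause was not available when this was asked), the union can be wrapped in a subquery and aggregated afterwards. A sketch, with disdain negated so that a single SUM gives the compatibility:

```cypher
// Assumes Neo4j 4.0+ for the CALL { ... } subquery clause.
CALL {
  // Same relationship type on both sides: counts towards passion.
  MATCH (a:Person)-[al]->(h:Hobby)<-[bl]-(b:Person)
  WHERE ID(a) < ID(b) AND TYPE(al) = TYPE(bl)
  RETURN a, b, (al.intensity + bl.intensity) / 2 AS metric
  UNION ALL
  // Different relationship types: counts against the pair, so negate it.
  MATCH (a:Person)-[al]->(h:Hobby)<-[bl]-(b:Person)
  WHERE ID(a) < ID(b) AND TYPE(al) <> TYPE(bl)
  RETURN a, b, -((al.intensity + bl.intensity) / 2) AS metric
}
RETURN a.name, b.name, SUM(metric) AS compatibility
ORDER BY compatibility DESC;
```

UNION ALL is used here instead of UNION because plain UNION removes duplicate rows, which would silently drop a hobby whenever two hobbies produce the same metric for the same pair.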

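Answer 2 (score: 1)

Here is my Cypher session and a solution to the question you asked.

My approach assumes that the absence of a LIKES or DISLIKES relationship means an intensity of zero for that Hobby. I also make the DISLIKES intensities negative.

Note: it uses APOC functions, so you will need to install APOC.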

See here: https://github.com/neo4j-contrib/neo4j-apoc-procedures
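
A quick way to check that APOC is available before running the session is to call one of its functions. As far as I know, apoc.version() ships with the core APOC library; if the call fails with an unknown-function error, the plugin is probably not installed or not loaded.

```cypher
// Returns the installed APOC version string if the plugin is loaded.
RETURN apoc.version() AS apoc_version;
```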

neo4j> // Step 1: Get a resultset of hobbies that we care about

        MATCH (h:Hobby)
         WITH h.name AS hobby
        ORDER BY hobby
       RETURN hobby;

+-----------+
| hobby     |
+-----------+
| "Music"   |
| "Reading" |
| "Sports"  |
+-----------+


neo4j> // Step 2: Convert rows of hobbies into a collection of hobbies (row2col)

        MATCH (h:Hobby)
         WITH h.name AS hobby
        ORDER BY hobby
         WITH COLLECT(hobby) AS hobbies
       RETURN hobbies;

+--------------------------------+
| hobbies                        |
+--------------------------------+
| ["Music", "Reading", "Sports"] |
+--------------------------------+


neo4j> // Step 3: With hobbies as "global" state, match with every :Person node

        MATCH (h:Hobby)
         WITH h.name AS hobby
        ORDER BY hobby
         WITH COLLECT(hobby) AS hobbies
        MATCH (person:Person)
       RETURN hobbies, person;

+---------------------------------------------------------------+
| hobbies                        | person                       |
+---------------------------------------------------------------+
| ["Music", "Reading", "Sports"] | (:Person {name: "John Doe"}) |
| ["Music", "Reading", "Sports"] | (:Person {name: "Jane Doe"}) |
| ["Music", "Reading", "Sports"] | (:Person {name: "Bob Doe"})  |
+---------------------------------------------------------------+


neo4j> // Step 4: Gather likes and dislikes into maps

        MATCH (h:Hobby)
         WITH h.name AS hobby
        ORDER BY hobby
         WITH COLLECT(hobby) AS hobbies
        MATCH (person:Person)
       OPTIONAL
        MATCH (person)-[LIKES:LIKES]->(h:Hobby)
         WITH hobbies, person, apoc.map.fromLists(COLLECT(h.name), COLLECT(LIKES.intensity)) AS likes
       OPTIONAL
        MATCH (person)-[DISLIKES:DISLIKES]->(h:Hobby)
       RETURN hobbies, person, likes,
              apoc.map.fromLists(COLLECT(h.name), COLLECT(DISLIKES.intensity)) AS dislikes;

+-----------------------------------------------------------------------------------------------------------+
| hobbies                        | person                       | likes                     | dislikes      |
+-----------------------------------------------------------------------------------------------------------+
| ["Music", "Reading", "Sports"] | (:Person {name: "Jane Doe"}) | {Music: 20, Reading: 50}  | {Sports: 40}  |
| ["Music", "Reading", "Sports"] | (:Person {name: "John Doe"}) | {Reading: 25, Sports: 70} | {Music: 15}   |
| ["Music", "Reading", "Sports"] | (:Person {name: "Bob Doe"})  | {Music: 25, Sports: 50}   | {Reading: 35} |
+-----------------------------------------------------------------------------------------------------------+


neo4j> // Step 5: Turn maps into collections (vectors), using hobbies list

        MATCH (h:Hobby)
         WITH h.name AS hobby
        ORDER BY hobby
         WITH COLLECT(hobby) AS hobbies
        MATCH (person:Person)
       OPTIONAL
        MATCH (person)-[LIKES:LIKES]->(h:Hobby)
         WITH hobbies, person, apoc.map.fromLists(COLLECT(h.name), COLLECT(LIKES.intensity)) AS likes
       OPTIONAL
        MATCH (person)-[DISLIKES:DISLIKES]->(h:Hobby)
         WITH hobbies, person, likes,
              apoc.map.fromLists(COLLECT(h.name), COLLECT(DISLIKES.intensity)) AS dislikes
       RETURN person,
              [x IN hobbies | COALESCE(likes[x], 0)] AS likes,
              [x IN hobbies | COALESCE(-dislikes[x], 0)] AS dislikes;

+----------------------------------------------------------+
| person                       | likes       | dislikes    |
+----------------------------------------------------------+
| (:Person {name: "Jane Doe"}) | [20, 50, 0] | [0, 0, -40] |
| (:Person {name: "John Doe"}) | [0, 25, 70] | [-15, 0, 0] |
| (:Person {name: "Bob Doe"})  | [25, 0, 50] | [0, -35, 0] |
+----------------------------------------------------------+


neo4j> // Step 6: Map each person against each other

        MATCH (h:Hobby)
         WITH h.name AS hobby
        ORDER BY hobby
         WITH COLLECT(hobby) AS hobbies
        MATCH (person:Person)
       OPTIONAL
        MATCH (person)-[LIKES:LIKES]->(h:Hobby)
         WITH hobbies, person, apoc.map.fromLists(COLLECT(h.name), COLLECT(LIKES.intensity)) AS likes
       OPTIONAL
        MATCH (person)-[DISLIKES:DISLIKES]->(h:Hobby)
         WITH hobbies, person, likes,
              apoc.map.fromLists(COLLECT(h.name), COLLECT(DISLIKES.intensity)) AS dislikes
         WITH person,
              [x IN hobbies | COALESCE(likes[x], 0)] AS likes,
              [x IN hobbies | COALESCE(-dislikes[x], 0)] AS dislikes
         WITH COLLECT({person:person, likes:likes, dislikes:dislikes}) AS rows
       UNWIND rows AS left
       UNWIND rows AS right
         WITH left, right
        WHERE ID(left.person) < ID(right.person)
       RETURN left, right;

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| left                                                                              | right                                                                             |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {person: (:Person {name: "Jane Doe"}), dislikes: [0, 0, -40], likes: [20, 50, 0]} | {person: (:Person {name: "Bob Doe"}), dislikes: [0, -35, 0], likes: [25, 0, 50]}  |
| {person: (:Person {name: "John Doe"}), dislikes: [-15, 0, 0], likes: [0, 25, 70]} | {person: (:Person {name: "Jane Doe"}), dislikes: [0, 0, -40], likes: [20, 50, 0]} |
| {person: (:Person {name: "John Doe"}), dislikes: [-15, 0, 0], likes: [0, 25, 70]} | {person: (:Person {name: "Bob Doe"}), dislikes: [0, -35, 0], likes: [25, 0, 50]}  |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+


neo4j> // Step 7: Calculate simple averages

        MATCH (h:Hobby)
         WITH h.name AS hobby
        ORDER BY hobby
         WITH COLLECT(hobby) AS hobbies
        MATCH (person:Person)
       OPTIONAL
        MATCH (person)-[LIKES:LIKES]->(h:Hobby)
         WITH hobbies, person, apoc.map.fromLists(COLLECT(h.name), COLLECT(LIKES.intensity)) AS likes
       OPTIONAL
        MATCH (person)-[DISLIKES:DISLIKES]->(h:Hobby)
         WITH hobbies, person, likes,
              apoc.map.fromLists(COLLECT(h.name), COLLECT(DISLIKES.intensity)) AS dislikes
         WITH person,
              [x IN hobbies | COALESCE(likes[x], 0)] AS likes,
              [x IN hobbies | COALESCE(-dislikes[x], 0)] AS dislikes
         WITH COLLECT({person: person, likes:likes, dislikes:dislikes}) AS coll
       UNWIND coll AS left
       UNWIND coll AS right
         WITH left, right
        WHERE ID(left.person) < ID(right.person)
       RETURN left.person.name,
              right.person.name,
              left.likes,
              right.likes,
              EXTRACT(x IN apoc.coll.zip(left.likes, right.likes)       | (x[0] + x[1]) / 2) AS avg_like,
              left.dislikes,
              right.dislikes,
              EXTRACT(x IN apoc.coll.zip(left.dislikes, right.dislikes) | (x[0] + x[1]) / 2) AS avg_dislike;

+----------------------------------------------------------------------------------------------------------------------------------+
| left.person.name | right.person.name | left.likes  | right.likes | avg_like     | left.dislikes | right.dislikes | avg_dislike   |
+----------------------------------------------------------------------------------------------------------------------------------+
| "Jane Doe"       | "Bob Doe"         | [20, 50, 0] | [25, 0, 50] | [22, 25, 25] | [0, 0, -40]   | [0, -35, 0]    | [0, -17, -20] |
| "John Doe"       | "Jane Doe"        | [0, 25, 70] | [20, 50, 0] | [10, 37, 35] | [-15, 0, 0]   | [0, 0, -40]    | [-7, 0, -20]  |
| "John Doe"       | "Bob Doe"         | [0, 25, 70] | [25, 0, 50] | [12, 12, 60] | [-15, 0, 0]   | [0, -35, 0]    | [-7, -17, 0]  |
+----------------------------------------------------------------------------------------------------------------------------------+

neo4j> // Step 8: Try apoc.algo.euclideanSimilarity()

        MATCH (h:Hobby)
         WITH h.name AS hobby
        ORDER BY hobby
         WITH COLLECT(hobby) AS hobbies
        MATCH (person:Person)
       OPTIONAL
        MATCH (person)-[LIKES:LIKES]->(h:Hobby)
         WITH hobbies, person, apoc.map.fromLists(COLLECT(h.name), COLLECT(LIKES.intensity)) AS likes
       OPTIONAL
        MATCH (person)-[DISLIKES:DISLIKES]->(h:Hobby)
         WITH hobbies, person, likes,
              apoc.map.fromLists(COLLECT(h.name), COLLECT(DISLIKES.intensity)) AS dislikes
         WITH person,
              [x IN hobbies | COALESCE(likes[x], 0)] AS likes,
              [x IN hobbies | COALESCE(-dislikes[x], 0)] AS dislikes
         WITH COLLECT({person: person, likes:likes, dislikes:dislikes}) AS coll
       UNWIND coll AS left
       UNWIND coll AS right
         WITH left, right
        WHERE ID(left.person) < ID(right.person)
       RETURN left.person.name,
              right.person.name,
              EXTRACT(x IN apoc.coll.zip(left.likes, right.likes)       | (x[0] + x[1]) / 2) AS avg_like,
              EXTRACT(x IN apoc.coll.zip(left.dislikes, right.dislikes) | (x[0] + x[1]) / 2) AS avg_dislike,
              apoc.algo.euclideanSimilarity(left.likes, right.likes) AS euclidean_like,
              apoc.algo.euclideanSimilarity(left.dislikes, right.dislikes) AS euclidean_dislike;
+-------------------------------------------------------------------------------------------------------------------+
| left.person.name | right.person.name | avg_like     | avg_dislike   | euclidean_like       | euclidean_dislike    |
+-------------------------------------------------------------------------------------------------------------------+
| "John Doe"       | "Jane Doe"        | [10, 37, 35] | [-7, 0, -20]  | 0.012824784198464426 | 0.02287281728431341  |
| "John Doe"       | "Bob Doe"         | [12, 12, 60] | [-7, -17, 0]  | 0.024026799286343117 | 0.025589279178274353 |
| "Jane Doe"       | "Bob Doe"         | [22, 25, 25] | [0, -17, -20] | 0.013910675635706434 | 0.018466972048042936 |
+-------------------------------------------------------------------------------------------------------------------+


neo4j> // Step 9: Save our similarity calculations (yay, new relationships!)

        MATCH (h:Hobby)
         WITH h.name AS hobby
        ORDER BY hobby
         WITH COLLECT(hobby) AS hobbies
        MATCH (person:Person)
       OPTIONAL
        MATCH (person)-[LIKES:LIKES]->(h:Hobby)
         WITH hobbies, person, apoc.map.fromLists(COLLECT(h.name), COLLECT(LIKES.intensity)) AS likes
       OPTIONAL
        MATCH (person)-[DISLIKES:DISLIKES]->(h:Hobby)
         WITH hobbies, person, likes,
              apoc.map.fromLists(COLLECT(h.name), COLLECT(DISLIKES.intensity)) AS dislikes
         WITH person,
              [x IN hobbies | COALESCE(likes[x], 0)] AS likes,
              [x IN hobbies | COALESCE(-dislikes[x], 0)] AS dislikes
         WITH COLLECT({person: person, likes:likes, dislikes:dislikes}) AS coll
       UNWIND coll AS left
       UNWIND coll AS right
         WITH left, right
        WHERE ID(left.person) < ID(right.person)
         WITH left.person AS person,
              right.person AS other,
              EXTRACT(x IN apoc.coll.zip(left.likes, right.likes)       | (x[0] + x[1]) / 2) AS avg_like,
              EXTRACT(x IN apoc.coll.zip(left.dislikes, right.dislikes) | (x[0] + x[1]) / 2) AS avg_dislike,
              apoc.algo.euclideanSimilarity(left.likes, right.likes) AS euclidean_like,
              apoc.algo.euclideanSimilarity(left.dislikes, right.dislikes) AS euclidean_dislike
        MERGE (person)-[LIKE:LIKE_SIMILARITY]->(other)
          SET LIKE.euclidean = euclidean_like,
              LIKE.avg = avg_like
        MERGE (person)-[DISLIKE:DISLIKE_SIMILARITY]->(other)
          SET DISLIKE.euclidean = euclidean_dislike,
              DISLIKE.avg = avg_dislike
       RETURN person.name, other.name, LIKE, DISLIKE;

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| person.name | other.name | LIKE                                                                    | DISLIKE                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "Jane Doe"  | "Bob Doe"  | [:LIKE_SIMILARITY {euclidean: 0.013910675635706434, avg: [22, 25, 25]}] | [:DISLIKE_SIMILARITY {euclidean: 0.018466972048042936, avg: [0, -17, -20]}] |
| "John Doe"  | "Jane Doe" | [:LIKE_SIMILARITY {euclidean: 0.012824784198464426, avg: [10, 37, 35]}] | [:DISLIKE_SIMILARITY {euclidean: 0.02287281728431341, avg: [-7, 0, -20]}]   |
| "John Doe"  | "Bob Doe"  | [:LIKE_SIMILARITY {euclidean: 0.024026799286343117, avg: [12, 12, 60]}] | [:DISLIKE_SIMILARITY {euclidean: 0.025589279178274353, avg: [-7, -17, 0]}]  |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

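Note: I'm not sure whether this is a good similarity measure for your use case, but it at least demonstrates some of the data transformations that are possible with Cypher + APOC.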
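
For reference, the euclidean_like and euclidean_dislike values in the session are consistent with 1 / (1 + d), where d is the Euclidean distance between the two vectors. That formula is inferred from the numbers above rather than taken from the APOC documentation, so treat it as an assumption. A minimal plain-Cypher check, using the John Doe and Jane Doe likes vectors from Step 5:

```cypher
// Vectors copied from the Step 5 output (John Doe and Jane Doe likes).
WITH [0, 25, 70] AS left, [20, 50, 0] AS right
// Sum the squared differences, position by position.
WITH REDUCE(total = 0.0, i IN RANGE(0, SIZE(left) - 1) |
            total + (left[i] - right[i]) ^ 2) AS squared_distance
// Assumed formula: similarity = 1 / (1 + Euclidean distance).
RETURN 1.0 / (1.0 + SQRT(squared_distance)) AS similarity;
```

This should come out at roughly 0.0128, in line with the euclidean_like value reported for that pair in Step 8. Because the result shrinks quickly as the distance grows, the raw intensity scale has a strong effect on the score.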