Computing a similarity index between two movies (Neo4j, Cypher)

Posted: 2014-04-26 02:38:58

Tags: neo4j cypher

This is an extension of "Multiple relationships in Match Cypher":

    MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag),
          (t)<-[h2:HAS_TAG]-(sm:Movie),
          (m)-[h:HAS_TAG]->(t0:Tag),
          (sm)-[H:HAS_TAG]->(t1:Tag)
    WHERE m <> sm
    WITH DISTINCT sm, h
    RETURN sm, collect(h.weight)

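I'm having trouble getting the different values of h1, h2, H, and h at the same time. I want to compute a similarity index between any two movies that depends on h1, h2, h, and H: (h1·h2)/(|h||H|), i.e. the sum of h1.weight * h2.weight over the tags the two movies share, divided by the product of the lengths of the two movies' full tag-weight vectors.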
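The formula (h1·h2)/(|h||H|) is the cosine similarity of two tag-weight vectors. The plain-Python sketch below (outside Neo4j) shows the arithmetic the Cypher query has to reproduce; the `{tag: weight}` dictionaries and the tag names are hypothetical stand-ins for the `HAS_TAG` relationships and their `weight` property. With non-negative weights the result always lies between 0 and 1 (Cauchy-Schwarz inequality).

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two movies' tag-weight vectors, given as {tag: weight}."""
    # Numerator: sum of h1.weight * h2.weight over the tags both movies share.
    num = sum(w * b[tag] for tag, w in a.items() if tag in b)
    # Denominators: Euclidean length over *all* of each movie's tags (|h| and |H|).
    den1 = math.sqrt(sum(w ** 2 for w in a.values()))
    den2 = math.sqrt(sum(w ** 2 for w in b.values()))
    if den1 == 0.0 or den2 == 0.0:
        return 0.0  # a movie with no weighted tags has no direction
    return num / (den1 * den2)


# Hypothetical tag weights, for illustration only.
matrix = {"action": 3.0, "sci-fi": 4.0}
other = {"sci-fi": 4.0, "drama": 3.0}
print(cosine_similarity(matrix, other))  # 16 / (5 * 5) = 0.64
```

The key point for the query is that the numerator only needs shared tags, while each denominator needs every tag of its own movie, so the two parts come from different patterns.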

    MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag),
          (t)<-[h2:HAS_TAG]-(sm:Movie),
          (m)-[h:HAS_TAG]->(t0:Tag),
          (sm)-[H:HAS_TAG]->(t1:Tag)
    WHERE m <> sm
    WITH sum(h1.weight*h2.weight) as num, sm, H, m, h
    WITH DISTINCT m, sqrt(sum(h.weight^2)) as den1, sm, H, num
    WITH DISTINCT sm, sqrt(sum(H.weight^2)) as den2, den1, num
    RETURN num/(den1*den2)

This is all messed up, but I can't find the right way to solve it. Please help.

2 answers:

Answer 0 (score: 2)

This works and gives the correct answer:

  MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag)<-[h2:HAS_TAG]-(sm)
  WHERE m <> sm
  WITH SUM(h1.weight * h2.weight) AS num,
     SQRT(REDUCE(xDot = 0.0, a IN COLLECT(h1)| xDot + a.weight^2)) AS xLength,
     SQRT(REDUCE(yDot = 0.0, b IN COLLECT(h2)| yDot + b.weight^2)) AS yLength, m, sm
  RETURN num, xLength, yLength
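In this answer, `h1` and `h2` come only from the `(m)-[h1]->(t)<-[h2]-(sm)` pattern, so `COLLECT(h1)` and `COLLECT(h2)` hold only the relationships to tags the two movies share, and `xLength`/`yLength` are the lengths of those shared parts. A plain-Python mirror of the three returned values, assuming hypothetical `{tag: weight}` data rather than a real graph:

```python
import math
from functools import reduce
from typing import Dict, Tuple


def shared_tag_terms(a: Dict[str, float], b: Dict[str, float]) -> Tuple[float, float, float]:
    """Mirror of answer 0: num, xLength, yLength computed over shared tags only."""
    # One row per shared tag t, like (m)-[h1]->(t)<-[h2]-(sm).
    shared = [tag for tag in a if tag in b]
    num = sum(a[t] * b[t] for t in shared)
    # Like REDUCE(xDot = 0.0, a IN COLLECT(h1) | xDot + a.weight^2), then SQRT.
    x_length = math.sqrt(reduce(lambda acc, t: acc + a[t] ** 2, shared, 0.0))
    y_length = math.sqrt(reduce(lambda acc, t: acc + b[t] ** 2, shared, 0.0))
    return num, x_length, y_length


# Hypothetical tag weights, for illustration only.
matrix = {"action": 3.0, "sci-fi": 4.0}
other = {"sci-fi": 4.0, "drama": 3.0}
num, x_length, y_length = shared_tag_terms(matrix, other)
print(num, x_length, y_length)  # 16.0 4.0 4.0
```

With this data, num / (xLength * yLength) is 1.0, while dividing by the lengths of the full tag vectors (5.0 each) gives 0.64. Which denominator is appropriate depends on whether tags the movies do not share should lower the score.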

Answer 1 (score: 0)

Take a look at this example I created in the Neo4j console:

http://console.neo4j.org/?id=aq6cb3

The query should be:

MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag),
    (t)<-[h2:HAS_TAG]-(sm:Movie),
    (m)-[h:HAS_TAG]->(t0:Tag),
    (sm)-[H:HAS_TAG]->(t1:Tag)
WHERE m <> sm
WITH m, sm, 
    collect(DISTINCT h) AS h, 
    collect(DISTINCT H) AS H, 
    sum(h1.weight*h2.weight) AS num
WITH m, sm, num,
    sqrt(reduce(s = 0.0, x IN h | s +(x.weight^2))) AS den1, 
    sqrt(reduce(s = 0.0, x IN H | s +(x.weight^2))) AS den2
RETURN m.title, sm.title, (num/(den1*den2)) AS similarity

The result is:

+---------------------------------------------------------------+
| m.title      | sm.title                  | similarity         |
+---------------------------------------------------------------+
| "The Matrix" | "The Matrix: Revolutions" | 3.859767091086958  |
| "The Matrix" | "The Matrix: Reloaded"    | 1.4380667053087486 |
+---------------------------------------------------------------+

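I used the reduce function to aggregate the relationship values from the separate collections (h and H) and then computed the similarity index.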
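Cypher's `reduce(accumulator = initial, x IN list | expression)` is a left fold over a list, the same idea as Python's `functools.reduce`. The sketch below is plain Python outside Neo4j, with made-up weights standing in for the `weight` values of one movie's collected `HAS_TAG` relationships, and shows the `den1`/`den2` step.

```python
import math
from functools import reduce
from typing import Iterable


def vector_length(weights: Iterable[float]) -> float:
    # Same shape as Cypher's reduce(s = 0.0, x IN h | s + (x.weight^2)):
    # start the accumulator s at 0.0 and add each weight squared in turn.
    total = reduce(lambda s, w: s + w ** 2, weights, 0.0)
    return math.sqrt(total)


# Hypothetical weights of one movie's HAS_TAG relationships.
print(vector_length([3.0, 4.0]))  # 5.0
```

Collecting with `DISTINCT` before folding matters: if the same relationship appeared twice in the list, its squared weight would be added twice and the length would be overstated.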