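I am trying to compare users based on their common interests in a graph. I know why the following query produces duplicate pairs, but I can't think of a good way to avoid it in Cypher. Is there a way to do this without looping in Cypher?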
neo4j-sh (?)$ start n=node(*) match p=n-[:LIKES]->item<-[:LIKES]-other where n <> other return n.name,other.name,collect(item.name) as common, count(*) as freq order by freq desc;
==> +-----------------------------------------------+
==> | n.name | other.name | common | freq |
==> +-----------------------------------------------+
==> | "u1" | "u2" | ["f1","f2","f3"] | 3 |
==> | "u2" | "u1" | ["f1","f2","f3"] | 3 |
==> | "u1" | "u3" | ["f1","f2"] | 2 |
==> | "u3" | "u2" | ["f1","f2"] | 2 |
==> | "u2" | "u3" | ["f1","f2"] | 2 |
==> | "u3" | "u1" | ["f1","f2"] | 2 |
==> | "u4" | "u3" | ["f1"] | 1 |
==> | "u4" | "u2" | ["f1"] | 1 |
==> | "u4" | "u1" | ["f1"] | 1 |
==> | "u2" | "u4" | ["f1"] | 1 |
==> | "u1" | "u4" | ["f1"] | 1 |
==> | "u3" | "u4" | ["f1"] | 1 |
==> +-----------------------------------------------+
Answer 0 (score: 10)
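To avoid duplicates of the form a--b and b--a, you can use WHERE ID(a) < ID(b). Since ID(a) < ID(b) already implies that the two nodes are different, it takes the place of the n <> other check in the query above: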
start n=node(*) match p=n-[:LIKES]->item<-[:LIKES]-other where ID(n) < ID(other) return n.name,other.name,collect(item.name) as common, count(*) as freq order by freq desc;
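To see why the ID comparison keeps exactly one row per pair, here is a minimal Python sketch of the same idea. It is not how Neo4j evaluates the query; the likes data is reconstructed from the output in the question, and the integer node ids and the `common_interests` helper are made up for illustration.

```python
from itertools import permutations

# Hypothetical data reconstructed from the output in the question:
# node id -> (name, set of liked items). The ids themselves are assumptions.
users = {
    1: ("u1", {"f1", "f2", "f3"}),
    2: ("u2", {"f1", "f2", "f3"}),
    3: ("u3", {"f1", "f2"}),
    4: ("u4", {"f1"}),
}

def common_interests(dedupe):
    rows = []
    # permutations() yields both (a, b) and (b, a), like the symmetric MATCH pattern.
    for a, b in permutations(users, 2):
        if dedupe and not a < b:  # mirrors WHERE ID(n) < ID(other)
            continue
        common = users[a][1] & users[b][1]
        if common:
            rows.append((users[a][0], users[b][0], sorted(common), len(common)))
    # Equivalent of ORDER BY freq DESC (the sort is stable).
    return sorted(rows, key=lambda r: r[3], reverse=True)

all_rows = common_interests(dedupe=False)
unique_rows = common_interests(dedupe=True)
print(len(all_rows), len(unique_rows))
for row in unique_rows:
    print(row)
```

Without the filter every pair shows up twice, once in each direction; with it, only the direction whose first id is smaller survives, so each pair is reported once.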
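Answer 1 (score: 0)
OK, I see you are using (*) as the starting point, which means the query loops over the whole graph and uses every node as a starting point. So the output rows are distinct, not duplicates as you say: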
+-----------------------------------------------+
| n.name | other.name | common | freq |
+-----------------------------------------------+
| "u2" | "u1" | ["f1","f2","f3"] | 3 |
is not equal to:
+-----------------------------------------------+
| n.name | other.name | common | freq |
+-----------------------------------------------+
| "u1" | "u2" | ["f1","f2","f3"] | 3 |
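So, as I see it, if you use an index and set a single starting point, there won't be any duplicates, because n is then fixed to one node: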
start n=node:someIndex(name='C') match p=n-[:LIKES]->item<-[:LIKES]-other where n <> other return n.name,other.name,collect(item.name) as common, count(*) as freq order by freq desc;