Find top "friends" with common interests

Time: 2017-01-04 04:16:04

Tags: neo4j cypher

I have created a node for each of 25 seminars and a node for each of 70 clients.

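Seminars happen several times a month in no particular order, and each seminar session can hold only 5 clients at a time, who may be any of the 70. I am currently recording every session of a seminar and who attended it: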

MATCH (c1:Client {id: cid}), ..., (c5:Client {id: cid}), (s:Seminar {id: sid})
WITH c1, c2, c3, c4, c5, s
CREATE UNIQUE (c1)-[:ATTENDED {event_id: eid}]->(s)
...
CREATE UNIQUE (c5)-[:ATTENDED {event_id: eid}]->(s)
WITH c1, c2, c3, c4, c5, s
MERGE (c1)-[x:WITH]-(c2)
ON MATCH SET x.count = x.count + 1
ON CREATE SET x.count = 1
...repeat for c1 & c3, c1 & c4, c1 & c5
WITH c2, c3, c4, c5
...repeat c2 & c3, c2 & c4, c2 & c5
WITH c3, c4, c5
...repeat for c3 & c4, c3 & c5...
WITH c4, c5
MERGE (c4)-[x:WITH]-(c5)
ON MATCH SET x.count = x.count + 1
ON CREATE SET x.count = 1;
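To make the bookkeeping above concrete, here is a small Python sketch of the same pair counting done in memory rather than in Neo4j. The names `with_counts` and `record_event` are hypothetical. Each unordered pair of attendees at an event has its count incremented, which is what the ten `MERGE ... ON CREATE / ON MATCH` statements do for a five-person session.

```python
from collections import Counter
from itertools import combinations

# Hypothetical in-memory stand-in for the :WITH relationships: an unordered
# pair of client ids (stored sorted) maps to the number of events shared.
with_counts = Counter()

def record_event(attendees):
    """Update pair counts for one seminar event (ON CREATE = 1, ON MATCH = +1)."""
    # combinations() yields each unordered pair once: 10 pairs for 5 clients,
    # mirroring the c1 & c2 ... c4 & c5 MERGE statements.
    for a, b in combinations(sorted(set(attendees)), 2):
        with_counts[(a, b)] += 1  # a missing key starts at 0, so the first hit gives 1

record_event(["c1", "c2", "c3", "c4", "c5"])
record_event(["c1", "c2", "c6", "c7", "c8"])
print(len(with_counts))           # 19 distinct pairs
print(with_counts[("c1", "c2")])  # 2
```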

For a new event:

(x:Seminar {event_id: xid})

I would like to "target" the top five clients who have attended various seminars together, based on:

(:Client)-[r:WITH]-(:Client) WHERE r.count >= 1

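The goal is to gather the clients who are most "familiar" with each other. How would I write this query? Do I have enough information (relationships and properties)? Is there a better way to add the event data?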

1 answer:

Answer 0 (score: 1)

I can suggest an alternative way to build your :WITH relationships.

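MATCH (c:Client)-[a1:ATTENDED]->(:Seminar)<-[a2:ATTENDED]-(co:Client)
// count only pairs who were at the same event, and visit each pair once
WHERE a1.event_id = a2.event_id AND id(c) < id(co)
WITH c, co, COUNT(*) AS timesWith
MERGE (c)-[r:WITH]-(co)
SET r.count = timesWith

This finds every client, each client they have attended a seminar event with, and how many events they attended together, then saves (or updates) the :WITH relationship between each pair.

If you can pass the list of attendee IDs as a query parameter, recording an event also becomes easier, because you can create all of the :ATTENDED relationships in one statement instead of one at a time:

MATCH (c:Client), (s:Seminar {id: $sid})
WHERE c.id IN $attendeeIDs
MERGE (c)-[:ATTENDED {event_id: $eid}]->(s)
// and then you can run the query above to update WITH relationships if necessary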
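When every attendance is stored as its own ATTENDED relationship (as in the question's model), pairing clients per event rather than per seminar matters. The small in-memory model below uses hypothetical data, not Neo4j: joining only through the :Seminar node pairs up every combination of ATTENDED relationships, so two clients who shared the same two sessions score 4 instead of 2.

```python
from collections import Counter
from itertools import combinations

# Hypothetical ATTENDED records: (client_id, seminar_id, event_id).
# alice and bob sat together in two sessions of s1; carol came to a third session alone.
attendance = [
    ("alice", "s1", "e1"), ("bob", "s1", "e1"),
    ("alice", "s1", "e2"), ("bob", "s1", "e2"),
    ("carol", "s1", "e3"),
]

def pair_counts(records, same_group):
    """Count client pairs over every pair of attendance records that same_group accepts."""
    counts = Counter()
    # Each unordered pair of records is compared exactly once.
    for (c1, s1, e1), (c2, s2, e2) in combinations(records, 2):
        if c1 != c2 and same_group((s1, e1), (s2, e2)):
            counts[tuple(sorted((c1, c2)))] += 1
    return counts

# Joining only through the seminar: every pair of ATTENDED records matches.
by_seminar = pair_counts(attendance, lambda x, y: x[0] == y[0])
# Joining on seminar and event: only people in the same session match.
by_event = pair_counts(attendance, lambda x, y: x == y)

print(by_seminar[("alice", "bob")])    # 4, overcounted
print(by_event[("alice", "bob")])      # 2, the real number of shared sessions
print(("alice", "carol") in by_event)  # False
```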

As for the rest of what you want, this is a fairly tricky problem, and I'm not sure you've made clear exactly what you are looking for.

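Are you looking for the set of 5 :Clients whose :WITH relationships among themselves have the largest summed count of any set of 5? That kind of query requires testing every combination of 5 clients and doing that calculation for each, and we also need to be careful to work with combinations rather than permutations.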

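Even then, the query will be very expensive, because the number of 5-client combinations out of 70 is C(70,5) = 12,103,014. That is a lot of rows to build, and the calculation has to run on every one of them.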
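The arithmetic above can be checked with Python's `math` module (`math.comb` and `math.perm` need Python 3.8+). It also shows why the id ordering matters: without it, every group of 5 would appear 5! = 120 times.

```python
import math

clients, group_size = 70, 5
combos = math.comb(clients, group_size)  # unordered groups: C(70, 5)
perms = math.perm(clients, group_size)   # ordered 5-tuples of distinct clients

print(combos)                     # 12103014
print(perms // combos)            # 120, i.e. 5! orderings of each group
print(clients ** group_size)      # 1680700000, size of the unfiltered five-way cartesian product
print(math.comb(15, group_size))  # 3003 groups if only 15 clients are considered
```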

// first match on a combination of 5; id inequalities prevent permutations
MATCH (c1:Client), (c2:Client), (c3:Client), (c4:Client), (c5:Client)
WHERE id(c1) < id(c2) < id(c3) < id(c4) < id(c5)
WITH c1, c2, c3, c4, c5, [id(c1),id(c2),id(c3),id(c4),id(c5)] as ids
// find all :WITH relationships inside each set of 5; id(a) < id(b) counts each relationship once
OPTIONAL MATCH (a:Client)-[r:WITH]-(b:Client)
WHERE id(a) in ids AND id(b) in ids AND id(a) < id(b)
WITH c1,c2,c3,c4,c5, SUM(r.count) as togetherness
ORDER BY togetherness DESC
RETURN c1,c2,c3,c4,c5
LIMIT 1
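The same brute-force search, sketched in Python over an in-memory pair-count dictionary (a hypothetical stand-in for the :WITH relationships), spells out what the query does: enumerate each group once, sum the counts of all pairs inside it, and keep the best one.

```python
from itertools import combinations

def most_familiar_group(clients, with_counts, size=5):
    """Return (score, group) for the group of `size` clients with the largest summed pair counts.

    with_counts maps an unordered pair (a, b), stored with a < b, to its :WITH count.
    """
    best_score, best_group = -1, None
    # combinations() of a sorted list yields every group exactly once, in sorted order,
    # which plays the role of id(c1) < id(c2) < ... < id(c5).
    for group in combinations(sorted(clients), size):
        # Pairs drawn from a sorted group are sorted too, so they match the dict keys;
        # pairs that never met contribute 0, like a missing :WITH relationship.
        score = sum(with_counts.get(pair, 0) for pair in combinations(group, 2))
        if score > best_score:
            best_score, best_group = score, group
    return best_score, best_group

with_counts = {("a", "b"): 3, ("a", "c"): 2, ("b", "c"): 4, ("d", "e"): 5, ("e", "f"): 1}
print(most_familiar_group("abcdef", with_counts, size=3))  # (9, ('a', 'b', 'c'))
```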

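There are many ways to make this more efficient. Instead of looking at all :Clients, you might first take the top n or so :Clients by seminar attendance and then run a similar query on just those.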

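Here is what that looks like if you first pick the 15 clients with the most seminar attendances and then look for the group of 5 among them that has spent the most time together: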

MATCH (c:Client)
WITH c, SIZE((c)-[:ATTENDED]->(:Seminar)) as attendance
ORDER BY attendance DESC
WITH c
LIMIT 15
WITH COLLECT(id(c)) as ids
// first match on a combination of 5; id inequalities prevent permutations
MATCH (c1:Client), (c2:Client), (c3:Client), (c4:Client), (c5:Client)
WHERE id(c1) in ids AND id(c2) in ids AND id(c3) in ids AND id(c4) in ids AND id(c5) in ids
 AND id(c1) < id(c2) < id(c3) < id(c4) < id(c5)
WITH c1, c2, c3, c4, c5, [id(c1),id(c2),id(c3),id(c4),id(c5)] as ids
// find all :WITH relationships inside each set of 5; id(a) < id(b) counts each relationship once
OPTIONAL MATCH (a:Client)-[r:WITH]-(b:Client)
WHERE id(a) in ids AND id(b) in ids AND id(a) < id(b)
WITH c1,c2,c3,c4,c5, SUM(r.count) as togetherness
ORDER BY togetherness DESC
RETURN c1,c2,c3,c4,c5
LIMIT 1
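A minimal Python sketch of the pruning idea, assuming attendance totals and pair counts are already available in memory (the function names and data are hypothetical). Ties in attendance are broken by client id here; that is an assumption, since the Cypher `ORDER BY` leaves tie order unspecified.

```python
from itertools import combinations

def top_attendees(attendance_counts, n=15):
    """Return the n clients with the most seminar attendances (ties broken by id)."""
    ranked = sorted(attendance_counts, key=lambda c: (-attendance_counts[c], c))
    return ranked[:n]

def best_group_among_top(attendance_counts, with_counts, n=15, size=5):
    """Brute-force the best group, but only over the n most frequent attendees."""
    shortlist = sorted(top_attendees(attendance_counts, n))
    best_score, best_group = -1, None
    for group in combinations(shortlist, size):
        # with_counts keys are sorted pairs, matching pairs drawn from a sorted group.
        score = sum(with_counts.get(pair, 0) for pair in combinations(group, 2))
        if score > best_score:
            best_score, best_group = score, group
    return best_score, best_group

attendance_counts = {"a": 9, "b": 8, "c": 7, "d": 6, "e": 1}
with_counts = {("a", "b"): 1, ("c", "d"): 4, ("d", "e"): 10}
print(top_attendees(attendance_counts, n=4))                               # ['a', 'b', 'c', 'd']
print(best_group_among_top(attendance_counts, with_counts, n=4, size=2))  # (4, ('c', 'd'))
```

The example also shows the price of the shortcut: the strongest pair, d and e, is never considered because e is not among the top 4 attendees. Pruning to 15 clients cuts the search from C(70,5) to C(15,5) = 3,003 groups, but the result is only the best group among frequent attendees, not necessarily the best group overall.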