Get the top n records for each group in neo4j

Time: 2015-10-05 14:56:14

Tags: neo4j

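I need to group data from a neo4j database and then filter out everything except the top n records of each group.

Example:

I have two node types: Order and Article. Between them there is an "ADDED" relationship, which has a timestamp property. What I want to know, for each article, is how many times it was among the first two articles added to an order. What I tried is the following approach: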

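  1. Get all Order-[ADDED]-Article rows;

  2. Sort the result of step 1 by order ID as the first sort key and by the timestamp of the ADDED relationship as the second sort key;

  3. For each subgroup from step 2 that represents one order, keep only the top 2 rows;

  4. Count the distinct article IDs in the output of step 3.

My problem is that I got stuck at step 3. Is it possible to get the top 2 rows for every subgroup representing an order?

Thanks,

Tiberiu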

3 answers:

Answer 0: (score: 7)

Try

MATCH (o:Order)-[r:ADDED]->(a:Article)
WITH o, r, a
ORDER BY o.oid, r.t
WITH o, COLLECT(a)[..2] AS topArticlesByOrder
UNWIND topArticlesByOrder AS a
RETURN a.aid AS articleId, COUNT(*) AS count

The results look like

articleId    count
   8           6
   2           2
   4           5
   7           2
   3           3
   6           5
   0           7

for this sample graph, created with
FOREACH(opar IN RANGE(1,15) |
    MERGE (o:Order {oid:opar})
    FOREACH(apar IN RANGE(1,5) |
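        // TOINT() was renamed toInteger() in later Neo4j releases
        MERGE (a:Article {aid:TOINT(RAND()*10)})
        // Node variables are wrapped in parentheses inside a pattern
        CREATE (o)-[:ADDED {t:timestamp() - TOINT(RAND()*1000)}]->(a)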
    )
)

Answer 1: (score: 2)

Use LIMIT combined with ORDER BY to get the top N. For example, the top 5 scores would be:

MATCH (node:MyScoreNode) 
RETURN node
ORDER BY node.score DESC
LIMIT 5;

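The ORDER BY part ensures that the highest scores come first. LIMIT then gives you only the first 5, which, because the rows are sorted, are always the highest.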
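
To make the sort-then-cut idea concrete, here is a rough plain-Python equivalent (not Cypher). The dict-shaped nodes and the `top_n_by_score` helper are made up for illustration. The cut is applied once to the whole sorted result, which is how the query above behaves.

```python
def top_n_by_score(nodes, n=5):
    """Mimic ORDER BY node.score DESC LIMIT n on a list of dicts."""
    # Sort highest score first, then keep only the first n rows.
    return sorted(nodes, key=lambda node: node["score"], reverse=True)[:n]


# Hypothetical sample data standing in for MyScoreNode nodes.
nodes = [{"name": f"n{i}", "score": s}
         for i, s in enumerate([3, 9, 1, 7, 5, 8, 2])]

top5 = top_n_by_score(nodes)
print([node["score"] for node in top5])
```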

Answer 2: (score: 0)

I tried to achieve your desired result and failed.

So my guess is that this is impossible with pure Cypher.

What is the problem? Cypher treats everything as paths and is effectively performing a traversal.
Grouping the results and then filtering within each group would require Cypher to branch its traversal at some point. Instead, Cypher applies the filter to all results at once, because it treats them as a single collection of different paths.

My suggestion: write several queries that together achieve the desired functionality, and implement some client-side logic.
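
For what it's worth, the client-side part could look roughly like the sketch below. It assumes the rows have already been fetched as `(order_id, timestamp, article_id)` tuples by some simple query. That tuple layout, the sample data, and the name `count_top_n_articles` are all assumptions made for illustration, and it follows steps 2 to 4 from the question.

```python
from collections import Counter
from itertools import groupby, islice
from operator import itemgetter


def count_top_n_articles(rows, n=2):
    """Count how often each article is among the first n added to an order.

    rows: iterable of (order_id, timestamp, article_id) tuples (assumed shape).
    """
    # Step 2: sort by order ID first, then by the ADDED timestamp.
    ordered = sorted(rows, key=itemgetter(0, 1))
    counts = Counter()
    # Step 3: for each order's subgroup, keep only the first n rows.
    for _, group in groupby(ordered, key=itemgetter(0)):
        for _, _, article_id in islice(group, n):
            # Step 4: count occurrences per article ID.
            counts[article_id] += 1
    return counts


# Hypothetical sample rows: (order_id, timestamp, article_id).
rows = [
    (1, 30, "A"), (1, 10, "B"), (1, 20, "C"),
    (2, 5, "B"), (2, 7, "A"),
    (3, 1, "C"),
]

top = count_top_n_articles(rows)
print(dict(top))
```

The sort is needed before `groupby`, because `groupby` only groups adjacent rows with the same key.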