Why does collect(someThing) change the result of RETURN otherUnrelatedThing?

Time: 2017-04-25 20:26:14

Tags: neo4j

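I noticed some strange behavior that I don't fully understand. Users reported a bug where some of their posts were missing.

By removing parts of my full query one at a time, I was able to isolate the problem.

This returns the correct posts: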

  MATCH (author:User {user_id: { user_id }})

  MATCH (post:Post)<-[:AUTHOR]-(author)

  MATCH (post)-[:HAS_COMMENT]->(comment:Comment)<-[:AUTHOR]-(commentAuthor:User)
  WHERE NOT author.user_id = commentAuthor.user_id

  WITH
    post,
    author

  RETURN post  // returns the expected result

However, part of the full query is collect(commentAuthor), so when I simply add it, without even doing anything with it:

  MATCH (author:User {user_id: { user_id }})

  MATCH (post:Post)<-[:AUTHOR]-(author)

  MATCH (post)-[:HAS_COMMENT]->(comment:Comment)<-[:AUTHOR]-(commentAuthor:User)
  WHERE NOT author.user_id = commentAuthor.user_id

  WITH
    post,
    author,
    collect(commentAuthor) as commentAuthors  // because of this

  RETURN post  // becomes incorrect -  why would this change?

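^ This causes some of the users who most recently replied to a post to be dropped.

UPDATE: After learning that applying an aggregation can change the order, it turns out that the posts I thought were missing were not being returned first but in the middle of the results. So I had to make sure the order is enforced after the aggregation in the minimal query: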

  MATCH (author:User {user_id: { user_id }})

  MATCH (post:Post)<-[:AUTHOR]-(author)

  MATCH (post)-[:HAS_COMMENT]->(comment:Comment)<-[:AUTHOR]-(commentAuthor:User)
  WHERE NOT author.user_id = commentAuthor.user_id

  WITH
    post,
    comment,
    author,
    collect(commentAuthor) as commentAuthors

  RETURN post

  ORDER BY comment.createdAt DESC  // now gives me the expected result
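
A note on why this works and what it costs: in Cypher, every non-aggregated expression in a WITH that contains an aggregate becomes a grouping key, and the row order after an aggregation is not guaranteed unless an ORDER BY follows it. Keeping comment in the WITH above therefore makes collect() run once per comment rather than once per post. A minimal sketch of an alternative, assuming one row per post is wanted, carries the latest comment time through the aggregation instead (latestCommentAt is a hypothetical alias, not part of the original query):

```cypher
MATCH (author:User {user_id: { user_id }})
MATCH (post:Post)<-[:AUTHOR]-(author)
MATCH (post)-[:HAS_COMMENT]->(comment:Comment)<-[:AUTHOR]-(commentAuthor:User)
WHERE NOT author.user_id = commentAuthor.user_id

// post and author are the only grouping keys, so each post yields one row.
WITH
  post,
  author,
  collect(commentAuthor) AS commentAuthors,
  max(comment.createdAt) AS latestCommentAt  // hypothetical alias

// Sort after the aggregation; order established earlier is not preserved.
RETURN post, latestCommentAt
ORDER BY latestCommentAt DESC
```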

But for the full query, this is a bit harder:

  MATCH (author:User {user_id: { user_id }})

  MATCH (post:Post)<-[:AUTHOR]-(author)
  WHERE post.createdAt < { before } AND post.text =~ { keyword }

  MATCH (post)-[:HAS_COMMENT]->(comment:Comment)<-[:AUTHOR]-(commentAuthor:User)
  WHERE NOT author.user_id = commentAuthor.user_id

  WITH
    post,
    author,
    commentAuthor,
    max(comment.createdAt) as commentCreatedAt,
    count(comment) as commentsPerCommenter

  ORDER BY commentCreatedAt DESC  // I believe this happens too early.

  WITH
    post,
    author,
    sum(commentsPerCommenter) as commentsCount,
    collect(commentAuthor {.*, commentCreatedAt}) as commentAuthors

  WITH
    post,
    author,
    commentsCount,
    size(commentAuthors) as participantsCount,
    commentAuthors

  // I think some sort of ordering needs to happen here.
  // Before the UNWIND and after the collect(commentAuthor).

  // ORDER BY commentCreatedAt DESC here:
      // gives correct posts, incorrect participantsCount & commentsCount as 1-1

  UNWIND commentAuthors as commentAuthor

  RETURN collect(post {
    .*,
    author,
    commentAuthor,
    commentsCount,
    participantsCount,
    notificationType: 'reply'
  })[0..{ LIMIT }] as posts

For example, another attempt at ordering collect(commentAuthor):

  MATCH (post)-[:HAS_COMMENT]->(comment:Comment)<-[:AUTHOR]-(commentAuthor:User)
  WHERE NOT author.user_id = commentAuthor.user_id

  WITH
    post,
    author,
    commentAuthor,
    max(comment.createdAt) as commentCreatedAt,
    count(comment) as commentsPerCommenter

  WITH
    post,
    author,
    sum(commentsPerCommenter) as commentsCount,
    commentCreatedAt,
    collect(commentAuthor {.*, commentCreatedAt}) as commentAuthors ORDER BY commentCreatedAt DESC

^ Both attempts give the correct ordering but incorrect counts.

Ultimately, this is what I'm trying to achieve:
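
The 1-1 counts are most likely a grouping effect: once commentCreatedAt appears next to sum() and collect() in the same WITH, it becomes a grouping key, so each distinct timestamp forms its own group. The following self-contained sketch reproduces this with literal rows instead of graph data (the names and numbers are hypothetical sample values):

```cypher
// Three hypothetical commenters on one post, each with a different latest-comment time.
UNWIND [
  {post: 'your post', commenter: 'Tom',  comments: 2, lastAt: 3},
  {post: 'your post', commenter: 'Erin', comments: 1, lastAt: 2},
  {post: 'your post', commenter: 'Kate', comments: 1, lastAt: 1}
] AS row

// commentCreatedAt is not aggregated, so it is a grouping key alongside post.
WITH
  row.post AS post,
  row.lastAt AS commentCreatedAt,
  sum(row.comments) AS commentsCount,
  count(row.commenter) AS participantsCount

RETURN post, commentCreatedAt, commentsCount, participantsCount
ORDER BY commentCreatedAt DESC
```

This returns three rows, each with participantsCount = 1, because every timestamp is its own group. Removing commentCreatedAt from the WITH (and from the RETURN and ORDER BY) collapses them into a single row with commentsCount = 4 and participantsCount = 3, which is why the per-post totals have to be computed before the per-commenter timestamp is brought back into scope.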

---
Tom replied to 'your post'
1 hr ago  // based on time of Tom's latest comment in 'your post' (post.commentAuthor.commentCreatedAt)
3 participants | 3 comments
---
Erin replied to 'your other post'
2 hrs ago
5 participants | 6 comments
---
Kate replied to 'your post'
3 hrs ago
3 participants | 3 comments
---

* Tom may have also commented on 'your post' 1.5 hrs ago
but we only get the latest reply, which was 1 hr ago

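1 Answer:

Answer 0: (score: 1)

Okay, so with the requirements clarified, we want each commenter on their own row for each post, along with each post's participant count and comment count.

We were close, but we either need to calculate each post's comment count and participant count before matching the commentAuthors (possibly using a pattern comprehension), or we can UNWIND our commentAuthors at the end and perform our sort there.

Let's try the second approach; you were on the right track using UNWIND anyway.

EDIT

We'll also LIMIT, and then collect() the rows at the end, per your request in the comments.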

MATCH (author:User {user_id: { user_id }})

MATCH (post:Post)<-[:AUTHOR]-(author)
WHERE post.createdAt < { before } AND post.text =~ { keyword }

// removing labels for now since the relationships should be enough
// to match to the right nodes
MATCH (post)-[:HAS_COMMENT]->(comment)<-[:AUTHOR]-(commentAuthor)
WHERE author <> commentAuthor

WITH
 post,
 author,
 commentAuthor,
 count(comment) as commentsPerCommenter,
 max(comment.createdAt) as commentCreatedAt

WITH
 post,
 author,
 sum(commentsPerCommenter) as commentsCount,
 collect(commentAuthor {.*, commentCreatedAt}) as commentAuthors

WITH
 post,
 author,
 commentsCount,
 size(commentAuthors) as participantsCount,
 commentAuthors

UNWIND commentAuthors as commentAuthor

WITH
 post,
 author,
 commentsCount,
 participantsCount,
 commentAuthor

ORDER BY commentAuthor.commentCreatedAt DESC
LIMIT 11 // adjust as needed

RETURN collect(post { .*, author, commentAuthor, commentsCount, participantsCount, notificationType: 'reply' }) as postReplies
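
For comparison, the first approach mentioned above (computing the per-post counts before expanding to one row per commenter) could look roughly like the sketch below. It is untested against the asker's data, omits the createdAt and keyword filters for brevity, and the variable names commenters, commenter, c and u are hypothetical:

```cypher
MATCH (author:User {user_id: { user_id }})
MATCH (post:Post)<-[:AUTHOR]-(author)

// Pattern comprehension: one list entry per comment written by someone other than the author.
WITH
  post,
  author,
  [(post)-[:HAS_COMMENT]->(c)<-[:AUTHOR]-(u) WHERE u <> author | u] AS commenters

// Per-post counts, computed while post and author are the only grouping keys.
UNWIND commenters AS commenter
WITH
  post,
  author,
  count(commenter) AS commentsCount,
  count(DISTINCT commenter) AS participantsCount

// Expand to one row per commenter; the counts are carried along unchanged.
MATCH (post)-[:HAS_COMMENT]->(comment)<-[:AUTHOR]-(commentAuthor)
WHERE commentAuthor <> author

WITH
  post,
  author,
  commentsCount,
  participantsCount,
  commentAuthor,
  max(comment.createdAt) AS commentCreatedAt

RETURN post, commentAuthor, commentCreatedAt, commentsCount, participantsCount
ORDER BY commentCreatedAt DESC
```

The trade-off is a second traversal of the comment pattern in exchange for never mixing the per-post totals with the per-commenter timestamp in the same aggregation.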