How to group results in ArangoDB into a single record?

Time: 2018-04-18 18:00:39

Tags: arangodb aql

I have a list of events, each with a type, structured like the following example:

{
 createdAt: 123123132,
 type: "STARTED",
 metadata: {
     emailAddress: "foo@bar.com"
 }
}

The set of types is predefined (START, STOP, REMOVE...). Users produce one or more events over time.

I want to get the following aggregation:

For each user, calculate the number of events for each type.

My AQL query looks like this:

FOR event IN events
  COLLECT
    email = event.metadata.emailAddress,
    type = event.type WITH COUNT INTO count
  LIMIT 10
  RETURN {
      email,
      t: {type, count}
  }

This produces the following output:

{ email: '_84@example.com', t: { type: 'CREATE', count: 203 } }
{ email: '_84@example.com', t: { type: 'DEPLOY', count: 214 } }
{ email: '_84@example.com', t: { type: 'REMOVE', count: 172 } }
{ email: '_84@example.com', t: { type: 'START', count: 204 } }
{ email: '_84@example.com', t: { type: 'STOP', count: 187 } }
{ email: '_95@example.com', t: { type: 'CREATE', count: 189 } }
{ email: '_95@example.com', t: { type: 'DEPLOY', count: 173 } }
{ email: '_95@example.com', t: { type: 'REMOVE', count: 194 } }
{ email: '_95@example.com', t: { type: 'START', count: 213 } }
{ email: '_95@example.com', t: { type: 'STOP', count: 208 } }
...

That is, I get one row per user and type. But I want results like this:

{ email: foo@bar.com, count1: 203, count2: 214, count3: 172 ...}
{ email: aaa@fff.com, count1: 189, count2: 173, count3: 194 ...}
...

OR

 { email: foo@bar.com, CREATE: 203, DEPLOY: 214, ... }
 ...

That is, the results should be grouped again, this time by user.

I also need to sort the results (not the events) by the counts, e.g. to return the top 10 users with the highest number of CREATE events.

How can I do that?

ONE SOLUTION

Here is one solution; see the accepted answer for more.

FOR a IN (
  FOR event IN events
    COLLECT
      emailAddress = event.metadata.emailAddress,
      type = event.type WITH COUNT INTO count
    COLLECT email = emailAddress INTO perUser KEEP type, count
    RETURN MERGE(PUSH(perUser[* RETURN {[LOWER(CURRENT.type)]: CURRENT.count}], {email}))
)
SORT a.create DESC
LIMIT 10
RETURN a

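1 answer:

Answer 0 (score: 2)

You can group by user and event type, then group again by user, keeping only the type and the already-computed count per type. In the second aggregation, you need to know which groups the events fell into in order to construct the result. An array inline projection can be used to keep the query short: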

FOR event IN events
  COLLECT
    emailAddress = event.metadata.emailAddress,
    type = event.type WITH COUNT INTO count
  COLLECT email = emailAddress INTO perUser KEEP type, count
    RETURN MERGE(PUSH(perUser[* RETURN {[CURRENT.type]: CURRENT.count}], {email}))
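
To make the two grouping steps concrete, here is a minimal plain-Python sketch of what the query above computes. It is only a model of the logic, not ArangoDB code; the sample events and the function name `pivot_counts` are made up for illustration.

```python
from collections import Counter


def pivot_counts(events):
    """Model of the double COLLECT: one record per user with a count per type."""
    # Step 1: count events per (email, type) pair,
    # like COLLECT emailAddress, type WITH COUNT INTO count.
    pair_counts = Counter(
        (e["metadata"]["emailAddress"], e["type"]) for e in events
    )
    # Step 2: regroup by email and merge the per-type counts into one record,
    # like COLLECT email INTO perUser followed by MERGE(PUSH(...)).
    per_user = {}
    for (email, event_type), count in sorted(pair_counts.items()):
        record = per_user.setdefault(email, {"email": email})
        record[event_type] = count
    return list(per_user.values())


# Hypothetical sample data, shaped like the documents in the question.
sample_events = [
    {"type": "CREATE", "metadata": {"emailAddress": "a@example.com"}},
    {"type": "CREATE", "metadata": {"emailAddress": "a@example.com"}},
    {"type": "STOP", "metadata": {"emailAddress": "a@example.com"}},
    {"type": "CREATE", "metadata": {"emailAddress": "b@example.com"}},
]

pivoted = pivot_counts(sample_events)
for row in pivoted:
    print(row)
```

As in the AQL version, a type that a user never produced simply does not appear as a key in that user's record.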

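An alternative approach is to group by user and keep the event types, then group the types in a subquery. However, this was significantly slower in my test (at least without any indexes defined):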

FOR event IN events
  LET type = event.type
  COLLECT
    email = event.metadata.emailAddress INTO groups KEEP type
  LET byType = (
    FOR t IN groups[*].type
      COLLECT t2 = t WITH COUNT INTO count
      RETURN {[t2]: count}
  )
  RETURN MERGE(PUSH(byType, {email}))
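
The difference from the first query is the order of the steps: here the events are grouped per user first, and the types are counted inside each group afterwards. A rough plain-Python model of that order, with hypothetical sample data and a made-up function name `pivot_by_user_first`, could look like this:

```python
from collections import Counter, defaultdict


def pivot_by_user_first(events):
    """Group by user first, then count the types inside each user's group."""
    # Step 1: collect the event types per user,
    # like COLLECT email = ... INTO groups KEEP type.
    types_by_email = defaultdict(list)
    for e in events:
        types_by_email[e["metadata"]["emailAddress"]].append(e["type"])
    # Step 2: count the types within each group (the role of the subquery)
    # and add the email to the resulting record.
    rows = []
    for email in sorted(types_by_email):
        row = dict(Counter(types_by_email[email]))
        row["email"] = email
        rows.append(row)
    return rows


# Hypothetical sample data, shaped like the documents in the question.
grouped_sample = [
    {"type": "START", "metadata": {"emailAddress": "b@example.com"}},
    {"type": "STOP", "metadata": {"emailAddress": "a@example.com"}},
    {"type": "START", "metadata": {"emailAddress": "b@example.com"}},
    {"type": "STOP", "metadata": {"emailAddress": "b@example.com"}},
]

by_user = pivot_by_user_first(grouped_sample)
for row in by_user:
    print(row)
```

Both orders yield the same records; this sketch says nothing about the relative performance inside ArangoDB.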

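Returning the top 10 users with the most CREATE events is much simpler. Filter for the CREATE event type, then group by user and count the events, sort in descending order and return the first 10 results: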

FOR event IN events
    FILTER event.type == "CREATE"
    COLLECT email = event.metadata.emailAddress WITH COUNT INTO count
    SORT count DESC
    LIMIT 10
    RETURN {email, count}
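
The filter, count, sort, limit pipeline above can be modelled in a few lines of plain Python. This is only an illustration of the logic on made-up data; `top_users` is a hypothetical helper name, not part of any ArangoDB API.

```python
from collections import Counter


def top_users(events, event_type="CREATE", limit=10):
    """Return the users with the most events of one type, highest count first."""
    # FILTER by type, then count per user (COLLECT ... WITH COUNT INTO count).
    counts = Counter(
        e["metadata"]["emailAddress"] for e in events if e["type"] == event_type
    )
    # most_common(limit) covers SORT count DESC plus LIMIT.
    return [
        {"email": email, "count": count}
        for email, count in counts.most_common(limit)
    ]


def make_events(email, event_type, n):
    # Helper that builds n hypothetical events for one user and type.
    return [{"type": event_type, "metadata": {"emailAddress": email}}] * n


ranking_sample = (
    make_events("a@example.com", "CREATE", 3)
    + make_events("b@example.com", "CREATE", 1)
    + make_events("c@example.com", "CREATE", 2)
    + make_events("b@example.com", "STOP", 9)
)

top_two = top_users(ranking_sample, limit=2)
print(top_two)
```

Note that the STOP events of b@example.com do not influence the ranking, because they are filtered out before counting.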

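EDIT1: To return one document per user with the event types grouped and counted (as in the first query), capture the MERGE result, sort by the count of one particular event type (here: CREATE) and return the top 10 users for this type. The result is the same as with the solution given in the question, but this version avoids the subquery à la FOR a IN (FOR event IN events ...) ... RETURN a: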

FOR event IN events
  COLLECT
    emailAddress = event.metadata.emailAddress,
    type = event.type WITH COUNT INTO count
  COLLECT email = emailAddress INTO perUser KEEP type, count
  LET ret = MERGE(PUSH(perUser[* RETURN {[CURRENT.type]: CURRENT.count}], {email}))
  SORT ret.CREATE DESC
  LIMIT 10
  RETURN ret
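
One detail worth keeping in mind: a user without any CREATE events has no CREATE attribute in the merged document, so ret.CREATE evaluates to null. Since null sorts below every number in AQL, such users end up last with SORT ... DESC. The following plain-Python sketch models the same ranking by treating a missing count as 0; the rows and the name `top_by_type` are hypothetical.

```python
def top_by_type(rows, event_type="CREATE", limit=10):
    """Rank already-pivoted per-user records by the count of one event type."""
    # A missing key is treated as 0, so users without that type sort last.
    ranked = sorted(rows, key=lambda row: row.get(event_type, 0), reverse=True)
    return ranked[:limit]


# Hypothetical pivoted records, one per user.
pivoted_rows = [
    {"email": "a@example.com", "CREATE": 2, "STOP": 5},
    {"email": "b@example.com", "STOP": 9},
    {"email": "c@example.com", "CREATE": 7},
]

best = top_by_type(pivoted_rows, limit=2)
print([row["email"] for row in best])
```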

EDIT2: Query to generate example data (requires the collection events to exist):

FOR i IN 1..100
    LET email = CONCAT(RANDOM_TOKEN(RAND()*4+4), "@example.com")
    FOR j IN SPLIT("CREATE,DEPLOY,REMOVE,START,STOP", ",")
        FOR k IN 1..RAND()*150+50
            INSERT {metadata: {emailAddress: email}, type: j} INTO events RETURN NEW