Is it possible to return multiple results with the reduce function?

Asked: 2013-09-27 11:28:17

Tags: nosql rethinkdb rethinkdb-ruby

With the schema shown below, I can use map-reduce to aggregate the deliver_count field across all dates (the dates are embedded in the campaign document, keyed by day).

  {
    campaign_id: 1,
    status: 'running',
    dates: {
      '20130926' => {
        delivered: 1,
        failed: 1,
        queued: 1,
        clicked: 1,
        males_count: 1,
        females_count: 1,
        pacific_region: { clicked_count: 10 },
        america_region: { clicked_count: 10 },
        atlantic_region: { clicked_count: 10 },
        europe_region: { clicked_count: 10 },
        africa_region: { clicked_count: 10 },
        etc_region: { clicked_count: 10 },
        asia_region: { clicked_count: 10 },
        australia_region: { clicked_count: 10 }
      },
      '20130927' => {
        delivered: 1,
        failed: 1,
        queued: 1,
        clicked: 1,
        males_count: 1,
        females_count: 1,
        pacific_region: { clicked_count: 10 },
        america_region: { clicked_count: 10 },
        atlantic_region: { clicked_count: 10 },
        europe_region: { clicked_count: 10 },
        africa_region: { clicked_count: 10 },
        etc_region: { clicked_count: 10 },
        asia_region: { clicked_count: 10 },
        australia_region: { clicked_count: 10 }
      },
      '20130928' => {
        delivered: 1,
        failed: 1,
        queued: 1,
        clicked: 1,
        males_count: 1,
        females_count: 1,
        pacific_region: { clicked_count: 10 },
        america_region: { clicked_count: 10 },
        atlantic_region: { clicked_count: 10 },
        europe_region: { clicked_count: 10 },
        africa_region: { clicked_count: 10 },
        etc_region: { clicked_count: 10 },
        asia_region: { clicked_count: 10 },
        australia_region: { clicked_count: 10 }
      }
    }
  }

The following code returns 30 for clicked_count under the asia_region field (the combined value across all dates):

$rethinkdb.table(:daily_stat_campaigns).filter { |daily_stat_campaign| daily_stat_campaign[:campaign_id].eq 1 }[0][:dates].do { |doc|
  doc.keys.map { |key|
    doc.get_field(key)[:asia_region][:clicked_count].default(0)
  }.reduce { |left, right|
    left+right
  }
}.run

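Is it possible to run the code above, but for multiple regions? That way I could run a single query that returns several sums. The output I am after looks something like the pseudo-result below.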

[{ asia_region: {clicked_count: 30}}, {america_region: {clicked_count: 30} }]

2 answers:

Answer 0: (score: 1)

I'm a little confused by the code you posted. Why is everything inside the filter? To get the output you want, do the following:

regions = [:pacific_region, :america_region, ...]
reg_clicks = r.table(:daily_stat_campaigns).concat_map { |row|
                 row[:dates]
                 .coerce_to("ARRAY")
                 # each element is a [date, stats] pair, so the stats are at index 1
                 .concat_map { |date|
                   date[1].pluck(regions).coerce_to("ARRAY")
                 }
              }

You can now run reg_clicks, and the result should look like this:

$ reg_clicks.run()
[[:asia_region, {clicked_count: 30}], [:etc_region, {clicked_count: 30}], ...]
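
For anyone following along without a RethinkDB server, here is a rough plain-Ruby sketch of what that flattening step does to one document. It is not ReQL: the `doc` hash, its counts, and the `flatten_regions` helper are hypothetical and exist only to show the shape of the data.

```ruby
# Plain-Ruby illustration (not ReQL): turn each date's entry into
# [region, counts] pairs, keeping only the regions of interest.

# Hypothetical sample document shaped like the one in the question.
doc = {
  campaign_id: 1,
  dates: {
    '20130926' => { delivered: 1, asia_region: { clicked_count: 10 }, america_region: { clicked_count: 10 } },
    '20130927' => { delivered: 1, asia_region: { clicked_count: 10 }, america_region: { clicked_count: 10 } }
  }
}

# Hypothetical helper: for every date, keep the wanted regions and
# convert the resulting hash into an array of [key, value] pairs.
def flatten_regions(doc, regions)
  doc[:dates].values.flat_map do |day|
    day.select { |key, _value| regions.include?(key) }.to_a
  end
end

pairs = flatten_regions(doc, [:asia_region, :america_region])
p pairs
```

Each date contributes one pair per region, which is why a further grouping step is still needed to get a single total per region.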

Now we need one last transformation to aggregate it:

aggregate = reg_clicks.map{ |reg|
                  # reg is a [region, counts] pair: name at index 0, counts at index 1
                  {reg: reg[0], clicked_count: reg[1][:clicked_count]}
              }
              .group_by(:reg, r.sum(:clicked_count))

This will give you output like this:

[{group: :asia_region, reduction: 150} ...]
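
The grouping step can be pictured in plain Ruby as follows. This is only a sketch of the idea, not ReQL, and the `pairs` input is made-up data in the `[region, counts]` shape described above.

```ruby
# Plain-Ruby illustration (not ReQL) of grouping by region and summing.
# Hypothetical input: one [region, counts] pair per region per date.
pairs = [
  [:asia_region,    { clicked_count: 10 }],
  [:america_region, { clicked_count: 10 }],
  [:asia_region,    { clicked_count: 10 }],
  [:america_region, { clicked_count: 10 }]
]

# Group the pairs by region name, then sum clicked_count within each group.
totals = pairs
  .group_by { |region, _counts| region }
  .map { |region, group|
    { group: region, reduction: group.sum { |_region, counts| counts[:clicked_count] } }
  }

p totals
```

Note that Ruby's `Enumerable#group_by` and the ReQL `group_by` used above only share a name; the Ruby version just buckets elements and leaves the summing to a separate step.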

If you want it to look exactly like what you asked for, you can apply one final transformation:

aggregate.map{ |row|
    [row[:group], row[:reduction]]
}
.coerce_to("OBJECT")

These queries would definitely be nicer if you normalized your data a bit. Break things out into two more tables, :dates and :region_clicks, that look like this:

#dates
{
    id: 0,
    campaign_id: 1,
    date: '20130927',
    delivered: 1,
    failed: 1,
    queued: 1,
    clicked: 1,
    males_count: 1
}

#region_clicks
{
    region: "asia_region",
    click_count: 30,
    date_id: 0
}

Then your query would be:

r.table(:region_clicks).group_by(:region, r.sum(:click_count)).run()
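
If it helps to see how an existing embedded document would map onto those two tables, here is a minimal plain-Ruby sketch. The `normalize` helper, the `_region` suffix check, and the sample `doc` are assumptions made for illustration; nothing here talks to RethinkDB.

```ruby
# Plain-Ruby illustration: split one embedded campaign document into
# rows for the two proposed tables. All names here are hypothetical.
def normalize(doc)
  dates = []
  region_clicks = []
  doc[:dates].each_with_index do |(date, stats), id|
    # Keys ending in "_region" go to region_clicks; the rest stay on the date row.
    regions, counters = stats.partition { |key, _value| key.to_s.end_with?('_region') }
    dates << { id: id, campaign_id: doc[:campaign_id], date: date }.merge(counters.to_h)
    regions.each do |region, counts|
      region_clicks << { region: region.to_s, click_count: counts[:clicked_count], date_id: id }
    end
  end
  [dates, region_clicks]
end

# Hypothetical sample document shaped like the one in the question.
doc = {
  campaign_id: 1,
  dates: {
    '20130926' => { delivered: 1, clicked: 1, asia_region: { clicked_count: 10 }, america_region: { clicked_count: 10 } },
    '20130927' => { delivered: 1, clicked: 1, asia_region: { clicked_count: 10 }, america_region: { clicked_count: 10 } }
  }
}

dates, region_clicks = normalize(doc)
p dates
p region_clicks
```

In this sketch each region_clicks row holds the count for a single date, so the per-region total only appears once the rows are grouped and summed.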

Answer 1: (score: 1)

This seems to work:

require 'awesome_print' # For better readability on output

regions = [:pacific_region, :america_region]
reg_clicks = $rethinkdb.table(:daily_stat_campaigns).filter { |daily_stat_campaign| daily_stat_campaign[:campaign_id].eq 1 }[0][:dates].do { |doc|
  doc.keys.concat_map { |key|
    doc
    .get_field(key)
    .pluck(regions)
    .coerce_to("ARRAY")
  }
}
ap reg_clicks.run

This will output something like: [["america_region", {"clicked_count"=>10}], ["pacific_region", {"clicked_count"=>10}], ["america_region", {"clicked_count"=>10}], ["pacific_region", {"clicked_count"=>10}], ["america_region", {"clicked_count"=>10}], ["pacific_region", {"clicked_count"=>10}]]

aggregate = reg_clicks.map { |reg|
  { reg: reg[0], clicked_count: reg[1][:clicked_count] }
}
ap aggregate.run

This will output: [{"reg"=>"america_region", "clicked_count"=>10}, {"reg"=>"pacific_region", "clicked_count"=>10}, {"reg"=>"america_region", "clicked_count"=>10}, {"reg"=>"pacific_region", "clicked_count"=>10}, {"reg"=>"america_region", "clicked_count"=>10}, {"reg"=>"pacific_region", "clicked_count"=>10}]

ap aggregate.group_by(:reg, $rethinkdb_rql.sum(:clicked_count)).run

Output: [{"reduction"=>30, "group"=>{"reg"=>"america_region"}}, {"reduction"=>30, "group"=>{"reg"=>"pacific_region"}}]
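
On the title question itself: a reduce can carry several sums at once if each mapped element is a hash of per-region counts and the reducer adds them field by field. The sketch below shows that idea in plain Ruby only. It is not ReQL and has not been run against a RethinkDB server, and the `dates` data and `regions` list are hypothetical.

```ruby
# Plain-Ruby illustration (not ReQL): one reduce that yields several sums.
regions = [:asia_region, :america_region]

# Hypothetical data shaped like the `dates` field in the question.
dates = {
  '20130926' => { asia_region: { clicked_count: 10 }, america_region: { clicked_count: 10 } },
  '20130927' => { asia_region: { clicked_count: 10 }, america_region: { clicked_count: 10 } },
  '20130928' => { asia_region: { clicked_count: 10 }, america_region: { clicked_count: 10 } }
}

# Map: one { region => count } hash per date, defaulting missing counts to 0.
per_date = dates.values.map { |day|
  regions.map { |region| [region, day.dig(region, :clicked_count) || 0] }.to_h
}

# Reduce: merge two hashes, adding the counts that share a region key.
totals = per_date.reduce { |left, right|
  left.merge(right) { |_region, a, b| a + b }
}

# Reshape into the form asked for in the question.
result = totals.map { |region, sum| { region => { clicked_count: sum } } }
p result
```

The accumulator here is a hash rather than a single number, which is what lets a single pass return one total per region.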