Multidimensional histogram

Time: 2018-11-30 13:41:53

Tags: sql amazon-athena presto

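I have a time-series dataset with 4 columns: a, b, c, and timestamp. I want to create bins across all three value columns to get an occurrence (frequency) table. The table below shows sample data: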

 a  |  b  |  c  | timestamp
0.2 | 1.2 | 0.1 | 2018-01-01 00:00:00
0.3 | 2.2 | 0.2 | 2018-01-01 00:00:01
0.4 | 3.2 | 0.3 | 2018-01-01 00:00:02
0.2 | 0.2 | 0.4 | 2018-01-01 00:00:03
0.6 | 1.3 | 0.5 | 2018-01-01 00:00:04
0.7 | 2.4 | 0.1 | 2018-01-01 00:00:05
1.2 | 0.5 | 0.2 | 2018-01-01 00:00:06
3.2 | 1.7 | 0.3 | 2018-01-01 00:00:07
2.2 | 2.5 | 0.4 | 2018-01-01 00:00:08
1.3 | 3.7 | 0.5 | 2018-01-01 00:00:09
1.4 | 0.8 | 0.6 | 2018-01-01 00:00:10
etc.

Now I want to create bins like this:

a >= 0. and a < 0.25 and b >= 0. and b < 0.25 and c >= 0. and c < 0.25
a >= 0. and a < 0.25 and b >= 0. and b < 0.25 and c >= 0.25 and c < 0.5
a >= 0. and a < 0.25 and b >= 0. and b < 0.25 and c >= 0.5 and c < 0.75
a >= 0. and a < 0.25 and b >= 0.25 and b < 0.5 and c >= 0. and c < 0.25
a >= 0. and a < 0.25 and b >= 0.25 and b < 0.5 and c >= 0.25 and c < 0.5
etc.

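That way I get an occurrence table in which the bins are applied to every dimension, with one count per combination of a, b, and c bins.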

I have the following SQL, which works for one column:

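select
  bucket,
  -- 16 buckets over [0, 4) are each 0.25 wide, so bucket n covers [(n-1)*0.25, n*0.25)
  0.+(bucket-1)*0.25 as low_a,
  0.+bucket*0.25 as high_a,
  count(1) as cnt
from
(
  select
    -- bucket index of a: 1..16 inside [0, 4), 0 below the range, 17 at or above 4
    width_bucket(a, 0., 4., 16) as bucket
  from
    table
)
group by bucket order by bucket;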

How can I apply the buckets to multiple columns?
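
One possible approach, sketched here rather than confirmed against a real Athena table, is to compute one `width_bucket` per column in the subquery and group by all three bucket indexes. The sketch assumes b and c use the same range (0 to 4) and bucket count (16) as a, which the question does not state, and `my_table` is a hypothetical stand-in for the real table name.

```sql
-- Sketch of a 3-D histogram over a, b and c.
-- Assumptions: all three columns share the range [0, 4) and 16 buckets
-- (0.25 wide each); my_table is a hypothetical table name.
select
  bucket_a,
  bucket_b,
  bucket_c,
  -- lower/upper edge of each bin, derived from the bucket index
  (bucket_a - 1) * 0.25 as low_a,
  bucket_a * 0.25       as high_a,
  (bucket_b - 1) * 0.25 as low_b,
  bucket_b * 0.25       as high_b,
  (bucket_c - 1) * 0.25 as low_c,
  bucket_c * 0.25       as high_c,
  count(*) as cnt
from
(
  select
    width_bucket(a, 0., 4., 16) as bucket_a,
    width_bucket(b, 0., 4., 16) as bucket_b,
    width_bucket(c, 0., 4., 16) as bucket_c
  from
    my_table
)
group by bucket_a, bucket_b, bucket_c
order by bucket_a, bucket_b, bucket_c;
```

Only combinations that actually occur in the data produce a row, so empty bins are absent from the result. Values outside the range land in bucket 0 (below the lower bound) or bucket 17 (at or above the upper bound), and the low/high edges computed for those two overflow buckets are not meaningful. If the columns have different ranges, the bounds and bucket widths would need to be adjusted per column.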

0 answers:

No answers yet