How to group by week in Cloudera Impala

Posted: 2014-09-01 03:51:01

Tags: cloudera impala

How can I group Impala query results by week? The data looks like this:

    userguid                 eventtime
0   66AB1405446C74F2992016E5 2014-08-01T16:43:05Z
1   66AB1405446C74F2992016E5 2014-08-02T20:12:12Z
2   4097483F53AB3C170A490D44 2014-08-03T18:08:50Z
3   4097483F53AB3C170A490D44 2014-08-04T18:10:08Z
4   4097483F53AB3C170A490D44 2014-08-05T18:14:51Z
5   4097483F53AB3C170A490D44 2014-08-06T18:15:29Z
6   4097483F53AB3C170A490D44 2014-08-07T18:17:15Z
7   4097483F53AB3C170A490D44 2014-08-08T18:18:09Z
8   4097483F53AB3C170A490D44 2014-08-09T18:18:18Z
9   4097483F53AB3C170A490D44 2014-08-10T18:23:30Z

The expected result is:

date                    count of different userguid
2014-08-01~2014-08-07   40
2014-08-08~2014-08-15   20
2014-08-16~2014-08-23   10

Thanks.

2 answers:

Answer 0 (score: 5)

If eventtime is stored as a TIMESTAMP:

SELECT TRUNC(eventtime, "D"), COUNT(DISTINCT userguid)
FROM your_table
GROUP BY TRUNC(eventtime, "D")
ORDER BY TRUNC(eventtime, "D");
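To see what this query returns, here is a minimal Python sketch that reproduces the same aggregation on the ten sample rows from the question. It is not Impala code: `week_start` and `distinct_users_per_week` are hypothetical helpers, and `week_start` stands in for `TRUNC(eventtime, "D")` under the assumption that weeks start on Monday, which is worth confirming in the Impala documentation for your version.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (userguid, eventtime).
rows = [
    ("66AB1405446C74F2992016E5", datetime(2014, 8, 1, 16, 43, 5)),
    ("66AB1405446C74F2992016E5", datetime(2014, 8, 2, 20, 12, 12)),
    ("4097483F53AB3C170A490D44", datetime(2014, 8, 3, 18, 8, 50)),
    ("4097483F53AB3C170A490D44", datetime(2014, 8, 4, 18, 10, 8)),
    ("4097483F53AB3C170A490D44", datetime(2014, 8, 5, 18, 14, 51)),
    ("4097483F53AB3C170A490D44", datetime(2014, 8, 6, 18, 15, 29)),
    ("4097483F53AB3C170A490D44", datetime(2014, 8, 7, 18, 17, 15)),
    ("4097483F53AB3C170A490D44", datetime(2014, 8, 8, 18, 18, 9)),
    ("4097483F53AB3C170A490D44", datetime(2014, 8, 9, 18, 18, 18)),
    ("4097483F53AB3C170A490D44", datetime(2014, 8, 10, 18, 23, 30)),
]

def week_start(ts: datetime) -> datetime:
    """Hypothetical stand-in for TRUNC(ts, "D"), assuming Monday-based weeks:
    drop the time of day, then step back to the Monday on or before ts."""
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return day - timedelta(days=day.weekday())  # weekday(): Monday == 0

def distinct_users_per_week(rows):
    """Mirror of GROUP BY week start + COUNT(DISTINCT userguid) + ORDER BY."""
    users = defaultdict(set)
    for guid, ts in rows:
        users[week_start(ts)].add(guid)
    return [(wk, len(users[wk])) for wk in sorted(users)]

for wk, n in distinct_users_per_week(rows):
    print(wk.date(), n)
```

With this data the sketch yields two groups: the week starting 2014-07-28 with 2 distinct users and the week starting 2014-08-04 with 1. Because TRUNC aligns rows to calendar weeks, the groups will not line up with ranges such as 2014-08-01~2014-08-07 in the expected output, since 2014-08-01 was a Friday.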

Otherwise, if eventtime is stored as a STRING:

SELECT TRUNC(CAST(eventtime AS TIMESTAMP), "D"), COUNT(DISTINCT userguid)
FROM your_table
GROUP BY TRUNC(CAST(eventtime AS TIMESTAMP), "D")
ORDER BY TRUNC(CAST(eventtime AS TIMESTAMP), "D");
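The string variant converts each value to a timestamp before truncating it. As a rough local analogue, the Python sketch below parses the question's ISO 8601 strings (such as `2014-08-01T16:43:05Z`) with `datetime.strptime` and then applies the same week truncation. `parse_eventtime` and `week_start` are hypothetical helpers, the Monday week start is an assumption, and the rows are a subset of the question's sample. Whether Impala's CAST accepts the `T` separator and trailing `Z` may depend on the version, so it is worth running the CAST on a few rows and checking for NULLs before trusting the grouped counts.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# A subset of the question's rows, with eventtime kept as a string.
rows = [
    ("66AB1405446C74F2992016E5", "2014-08-01T16:43:05Z"),
    ("4097483F53AB3C170A490D44", "2014-08-03T18:08:50Z"),
    ("4097483F53AB3C170A490D44", "2014-08-04T18:10:08Z"),
    ("4097483F53AB3C170A490D44", "2014-08-10T18:23:30Z"),
]

def parse_eventtime(s: str) -> datetime:
    """Local counterpart of CAST(eventtime AS TIMESTAMP) for this one format.
    'T' and the trailing 'Z' are matched literally; no time zone is attached."""
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ")

def week_start(ts: datetime) -> datetime:
    """Hypothetical stand-in for TRUNC(ts, "D"), assuming Monday-based weeks."""
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return day - timedelta(days=day.weekday())

users = defaultdict(set)
for guid, text in rows:
    # Convert first, then truncate, as in TRUNC(CAST(eventtime AS TIMESTAMP), "D").
    users[week_start(parse_eventtime(text))].add(guid)

result = sorted((wk.date().isoformat(), len(guids)) for wk, guids in users.items())
print(result)
```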

For details on the TRUNC function, see the Cloudera Impala documentation on Date and Time Functions.

Answer 1 (score: 0)

In Impala, TRUNC(timestamp, "D") returns the starting day of the week that contains the timestamp. You can look up the Impala date and time functions here.
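To make "starting day of the week" concrete, this Python sketch maps each day covered by the question's sample data to its week start. The function `week_start` is a hypothetical stand-in for TRUNC(timestamp, "D"), and it assumes weeks start on Monday. The only sample date whose result depends on that assumption is Sunday 2014-08-03, so that day is a good one to check against a real Impala query.

```python
from datetime import datetime, timedelta

def week_start(ts: datetime) -> datetime:
    """Hypothetical stand-in for Impala's TRUNC(ts, "D").

    Assumes weeks start on Monday: the time of day is dropped and the
    date is moved back to the Monday on or before it.
    """
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return day - timedelta(days=day.weekday())  # weekday(): Monday == 0

# Walk the ten days covered by the question's sample data (2014-08-01 .. 2014-08-10).
for offset in range(10):
    ts = datetime(2014, 8, 1, 18, 0, 0) + timedelta(days=offset)
    print(ts.strftime("%Y-%m-%d %a"), "->", week_start(ts).date())
```

Under the Monday assumption, 2014-08-01 through 2014-08-03 map to 2014-07-28, and 2014-08-04 through 2014-08-10 map to 2014-08-04.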

