I have a UDF (GetUrlExt) that returns the file extension of a URL (for example, jpg for /abc/models/xyz/images/top.jpg). The data looks like this:
Date Time TimeTaken uristem
9/5/2011 0:00:10 234 /abc/models/xyz/images/top.jpg
9/5/2011 0:00:11 456 /abc/models/xyz/images/bottom.jpg
9/5/2011 0:00:14 789 /abc/models/xyz/images/left.gif
9/5/2011 0:00:16 234 /abc/models/xyz/images/top.pdf
9/5/2011 0:00:18 734 /abc/models/xyz/images/top.pdf
9/5/2011 0:00:19 654 /abc/models/xyz/images/right.gif
9/5/2011 0:00:21 346 /abc/models/xyz/images/top.pdf
9/5/2011 0:00:24 556 /abc/models/xyz/images/front.pdf
9/5/2011 0:00:26 134 /abc/models/xyz/images/back.jpg
The query without GROUP BY works fine:
SELECT GetUrlExt(uristem) AS extn FROM LogTable;
Result: jpg jpg gif pdf pdf gif pdf pdf jpg
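The UDF itself is not shown in the question. As a rough illustration only, the Python sketch below assumes GetUrlExt returns whatever follows the last dot in the final path segment, in lower case. The name `get_url_ext` and that behaviour are assumptions, not the asker's implementation.

```python
import posixpath


def get_url_ext(uristem):
    """Hypothetical stand-in for the GetUrlExt UDF (assumed behaviour)."""
    # splitext turns "/a/b/top.jpg" into ("/a/b/top", ".jpg");
    # a path without an extension yields an empty string.
    _, ext = posixpath.splitext(uristem)
    return ext[1:].lower()


# The uristem column from the sample data, in row order.
uristems = [
    "/abc/models/xyz/images/top.jpg",
    "/abc/models/xyz/images/bottom.jpg",
    "/abc/models/xyz/images/left.gif",
    "/abc/models/xyz/images/top.pdf",
    "/abc/models/xyz/images/top.pdf",
    "/abc/models/xyz/images/right.gif",
    "/abc/models/xyz/images/top.pdf",
    "/abc/models/xyz/images/front.pdf",
    "/abc/models/xyz/images/back.jpg",
]

extensions = [get_url_ext(u) for u in uristems]
print(" ".join(extensions))
```

Applied row by row, this mirrors what the ungrouped SELECT does: one extension per input row, with no aggregation.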
Now I need to GROUP BY the result of the GetUrlExt UDF.
Expected result:
jpg 3 274.6
gif 2 721.5
pdf 4 467.5
But the following query does not work:
SELECT GetUrlExt(uristem) AS extn, Count(*) AS PerCount, Avg(TimeTaken) AS AvgTime FROM LogTable GROUP BY extn;
Any help is appreciated!
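Answer 0 (score: 5)
Use a subquery to do the grouping.
Hive does not let GROUP BY refer directly to a computed value through its SELECT alias, so compute the value in an inner query and group by it in the outer one.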
SELECT a.extn, Count(*) AS PerCount, Avg(TimeTaken) AS AvgTime
FROM
(
SELECT GetUrlExt(uristem) AS extn, TimeTaken
FROM LogTable
) a
GROUP BY a.extn;
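To make the two stages of the subquery concrete, here is a minimal Python sketch that first projects each row to (extn, TimeTaken) and then groups on the projected value. It runs on the sample rows from the question; `get_url_ext` is a hypothetical stand-in for the UDF, assumed to return the lower-cased text after the last dot.

```python
import posixpath
from collections import defaultdict


def get_url_ext(uristem):
    """Hypothetical stand-in for the GetUrlExt UDF (assumed behaviour)."""
    _, ext = posixpath.splitext(uristem)
    return ext[1:].lower()


# (TimeTaken, uristem) pairs from the sample data.
rows = [
    (234, "/abc/models/xyz/images/top.jpg"),
    (456, "/abc/models/xyz/images/bottom.jpg"),
    (789, "/abc/models/xyz/images/left.gif"),
    (234, "/abc/models/xyz/images/top.pdf"),
    (734, "/abc/models/xyz/images/top.pdf"),
    (654, "/abc/models/xyz/images/right.gif"),
    (346, "/abc/models/xyz/images/top.pdf"),
    (556, "/abc/models/xyz/images/front.pdf"),
    (134, "/abc/models/xyz/images/back.jpg"),
]

# Stage 1 (inner query): compute extn once per row.
projected = [(get_url_ext(uri), time_taken) for time_taken, uri in rows]

# Stage 2 (outer query): group by the already-computed extn.
groups = defaultdict(list)
for extn, time_taken in projected:
    groups[extn].append(time_taken)

# Per-extension (count, average TimeTaken).
result = {extn: (len(ts), sum(ts) / len(ts)) for extn, ts in groups.items()}

for extn, (count, avg) in result.items():
    print(extn, count, round(avg, 1))
```

The counts and the gif and pdf averages match the expected result in the question; the jpg average is 824 / 3, roughly 274.67, which the question shows as 274.6.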
Answer 1 (score: 1)
Rather than grouping by the alias, you can group by the whole expression:
SELECT GetUrlExt(uristem) AS extn, Count(*) AS PerCount, Avg(TimeTaken) AS AvgTime
FROM LogTable
GROUP BY GetUrlExt(uristem);
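The shape of this query can be tried locally with Python's built-in sqlite3 module, which lets a Python function be registered as a SQL function. This is only an analogy: SQLite is not Hive, its alias rules differ, and the `get_url_ext` function here is an assumed stand-in for the real UDF. The table holds only the two columns the query uses.

```python
import posixpath
import sqlite3


def get_url_ext(uristem):
    """Hypothetical stand-in for the GetUrlExt UDF (assumed behaviour)."""
    _, ext = posixpath.splitext(uristem)
    return ext[1:].lower()


# (TimeTaken, uristem) pairs from the sample data.
rows = [
    (234, "/abc/models/xyz/images/top.jpg"),
    (456, "/abc/models/xyz/images/bottom.jpg"),
    (789, "/abc/models/xyz/images/left.gif"),
    (234, "/abc/models/xyz/images/top.pdf"),
    (734, "/abc/models/xyz/images/top.pdf"),
    (654, "/abc/models/xyz/images/right.gif"),
    (346, "/abc/models/xyz/images/top.pdf"),
    (556, "/abc/models/xyz/images/front.pdf"),
    (134, "/abc/models/xyz/images/back.jpg"),
]

conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument SQL function.
conn.create_function("GetUrlExt", 1, get_url_ext)
conn.execute("CREATE TABLE LogTable (TimeTaken INTEGER, uristem TEXT)")
conn.executemany("INSERT INTO LogTable VALUES (?, ?)", rows)

# Group by the full expression instead of the alias.
# ORDER BY is added here only to make the printed order stable.
grouped = conn.execute(
    """
    SELECT GetUrlExt(uristem) AS extn,
           COUNT(*) AS PerCount,
           AVG(TimeTaken) AS AvgTime
    FROM LogTable
    GROUP BY GetUrlExt(uristem)
    ORDER BY extn
    """
).fetchall()

for extn, per_count, avg_time in grouped:
    print(extn, per_count, round(avg_time, 1))

conn.close()
```

Repeating the expression in GROUP BY avoids the nested query at the cost of writing the UDF call twice.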