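I am using HiveQL and need to select the 10 most frequently purchased items in each category. I think the same problem can be solved easily in regular SQL.
Is there a faster way than the snippet below? I just don't understand how to use the so-called window functions here...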
SELECT item, COUNT(item) AS freq FROM mytable WHERE category='category1' GROUP BY item ORDER BY freq DESC LIMIT 1
UNION ALL SELECT item, COUNT(item) AS freq FROM mytable WHERE category='category2' GROUP BY item ORDER BY freq DESC LIMIT 1
UNION ALL SELECT item, COUNT(item) AS freq FROM mytable WHERE category='category3' GROUP BY item ORDER BY freq DESC LIMIT 1
UNION ALL SELECT item, COUNT(item) AS freq FROM mytable WHERE category='category4' GROUP BY item ORDER BY freq DESC LIMIT 1
...
Table data:
item1 category1
item2 category1
item2 category1
item5 category2
item5 category2
item4 category3
item2 category4
The result should be:
item2 category1
item5 category2
item4 category3
item2 category4
Answer 0 (score: 2)
Use row_number() and group by:
SELECT category, item, freq
FROM (SELECT category, item, COUNT(*) AS freq,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY COUNT(*) DESC) as seqnum
FROM mytable
GROUP BY category, item
) ci
WHERE seqnum = 1;
This returns exactly one row per category, even when several items tie for the most frequent. If you want all tied items returned, use rank() instead of row_number().
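To see the difference between the two ranking functions, and how the same query extends from the top 1 to the top N items per category, here is a minimal sketch that runs the query against the sample data from the question. It rests on several assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) as a stand-in for Hive, the helper name top_items and its parameters are hypothetical, and the count is computed in an inner subquery before ranking, which is a small variation on the query above rather than the exact same statement.

```python
import sqlite3

# Sample rows from the question: (item, category).
ROWS = [
    ("item1", "category1"),
    ("item2", "category1"),
    ("item2", "category1"),
    ("item5", "category2"),
    ("item5", "category2"),
    ("item4", "category3"),
    ("item2", "category4"),
]

# Count purchases per (category, item) first, then rank inside each category.
# {func} is replaced by ROW_NUMBER or RANK; the limit N is bound as a parameter.
QUERY = """
SELECT category, item, freq
FROM (SELECT category, item, freq,
             {func}() OVER (PARTITION BY category ORDER BY freq DESC) AS seqnum
      FROM (SELECT category, item, COUNT(*) AS freq
            FROM mytable
            GROUP BY category, item) counts
     ) ranked
WHERE seqnum <= ?
ORDER BY category, freq DESC, item
"""


def top_items(rows, n=1, func="ROW_NUMBER"):
    """Return the top-n (category, item, freq) rows per category.

    Hypothetical helper for illustration; SQLite stands in for Hive here.
    """
    if func not in ("ROW_NUMBER", "RANK"):
        raise ValueError("func must be ROW_NUMBER or RANK")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (item TEXT, category TEXT)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
        return conn.execute(QUERY.format(func=func), (n,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Top 1 per category on the sample data.
    for row in top_items(ROWS):
        print(row)

    # Add one more purchase so item1 and item2 tie in category1.
    tied = ROWS + [("item1", "category1")]
    print(len(top_items(tied, func="ROW_NUMBER")))  # one row per category
    print(len(top_items(tied, func="RANK")))        # tied items are all kept
```

Changing n to 10 should give the top 10 items per category that the question asks for. With row_number() the choice among tied items is arbitrary, so add a tie-breaker column to the window's ORDER BY if the result needs to be deterministic.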