I have a table in Redshift with about 30 billion rows and the following structure:
userid itemid country start_date
uid1 itemid1 country1 2018-07-25 00:00:00
uid2 itemid2 country1 2018-07-25 00:00:00
uid3 itemid1 country2 2018-07-25 00:00:00
uid4 itemid3 country1 2018-07-25 00:00:00
uid5 itemid1 country1 2018-07-25 00:00:00
uid1 itemid2 country2 2018-07-25 00:00:00
uid2 itemid2 country2 2018-07-25 00:00:00
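I want to find how many distinct users bought each item, and then select the 1000 best-selling items for each country and start_date. Both the item's rank and its sold count are required.
The following output is expected: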
itemid country sold_count start_date
itemid1 country1 2 2018-07-25 00:00:00
itemid2 country2 2 2018-07-25 00:00:00
itemid1 country2 1 2018-07-25 00:00:00
itemid2 country1 1 2018-07-25 00:00:00
itemid3 country1 1 2018-07-25 00:00:00
I am trying to implement this with a rank function, but I am not getting the expected result.
This is the query I am trying:
select itemid, start_date, Rank() over (partition by itemid order by
count(distinct(userid)) desc) as rank1
from table_name
group by item_id, start_date
order by rank1 desc;
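In addition, I want a column that counts the unique user IDs that bought each itemid, grouped by country and start date. In the query above I left out the country column to keep it simple.
Please help me.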
Answer 0: (score: 1)
If I assume that "version" means "country", then I think you want:
select *
from (select itemid, country, start_date, count(distinct userid) as num_users,
row_number() over (partition by country, start_date
order by count(distinct userid) desc
) as seqnum
from table_name
group by itemid, country, start_date
) x
where seqnum <= 1000
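To check the shape of this query against the sample rows from the question without a Redshift cluster, here is a minimal Python sketch that uses SQLite as a stand-in. It is an illustration under assumptions, not the answer's own code: it needs SQLite 3.25 or later for window functions, the aggregation is moved into an inner subquery so the window function runs over already-counted rows, `itemid` is added as a tie-breaker so the output is deterministic, and the names `SAMPLE_ROWS`, `TOP_N_QUERY` and `top_items` are hypothetical.

```python
import sqlite3

# Sample rows from the question: (userid, itemid, country, start_date).
SAMPLE_ROWS = [
    ("uid1", "itemid1", "country1", "2018-07-25 00:00:00"),
    ("uid2", "itemid2", "country1", "2018-07-25 00:00:00"),
    ("uid3", "itemid1", "country2", "2018-07-25 00:00:00"),
    ("uid4", "itemid3", "country1", "2018-07-25 00:00:00"),
    ("uid5", "itemid1", "country1", "2018-07-25 00:00:00"),
    ("uid1", "itemid2", "country2", "2018-07-25 00:00:00"),
    ("uid2", "itemid2", "country2", "2018-07-25 00:00:00"),
]

# Count distinct buyers per (itemid, country, start_date) first, then number
# the items inside each (country, start_date) partition by that count.
TOP_N_QUERY = """
select itemid, country, start_date, num_users, seqnum
from (select c.*,
             row_number() over (partition by country, start_date
                                order by num_users desc, itemid) as seqnum
      from (select itemid, country, start_date,
                   count(distinct userid) as num_users
            from table_name
            group by itemid, country, start_date) c
     ) x
where seqnum <= ?
order by country, start_date, seqnum
"""


def top_items(rows, limit=1000):
    """Load rows into an in-memory SQLite table and return the top items."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table table_name "
            "(userid text, itemid text, country text, start_date text)"
        )
        conn.executemany("insert into table_name values (?, ?, ?, ?)", rows)
        return conn.execute(TOP_N_QUERY, (limit,)).fetchall()
    finally:
        conn.close()


for row in top_items(SAMPLE_ROWS):
    print(row)
```

On the sample data the counts match the expected output in the question: itemid1 in country1 and itemid2 in country2 each have 2 distinct buyers, and the remaining three combinations have 1.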
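Answer 2: (score: 0)
As you said in the question, you want to "find how many unique users bought an item, and then select the 1000 best-selling items for each country and start date", so you can try doing it step by step with CTEs instead of writing a single query: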
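with
-- step 1: distinct buyers per item, country and start_date
items_by_country as (
    select
         itemid
        ,country
        ,start_date
        ,count(distinct userid) as sold_count
    from table_name
    group by 1,2,3
)
-- step 2: number the items within each country and start_date by sold_count
,ranked_groups as (
    select
         *
        ,row_number() over (partition by country, start_date order by sold_count desc) as item_rank
    from items_by_country
)
-- step 3: keep the top 1000 items of each group
select *
from ranked_groups
where item_rank<=1000
order by country, start_date, item_rank
;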
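One design choice worth checking is which window function supplies the rank. `row_number()` always returns at most 1000 rows per group but breaks ties arbitrarily unless the ORDER BY includes a tie-breaker, while `rank()` and `dense_rank()` give tied items the same number, so a filter such as `rank <= 1000` can return more than 1000 rows when there is a tie at the cutoff. The sketch below compares the three on the per-item counts from the question's expected output. It again uses SQLite 3.25+ through Python as a stand-in for Redshift, and the table `item_counts` and the names `COUNTS`, `RANKING_QUERY` and `compare_rankings` are hypothetical.

```python
import sqlite3

# Distinct-buyer counts per (itemid, country), taken from the expected
# output in the question (all rows share the same start_date).
COUNTS = [
    ("itemid1", "country1", 2),
    ("itemid2", "country2", 2),
    ("itemid1", "country2", 1),
    ("itemid2", "country1", 1),
    ("itemid3", "country1", 1),
]

# row_number() gets itemid as a tie-breaker so its result is deterministic;
# rank() and dense_rank() order by sold_count only, so ties share a number.
RANKING_QUERY = """
select itemid, country, sold_count,
       row_number() over (partition by country
                          order by sold_count desc, itemid) as row_num,
       rank()       over (partition by country
                          order by sold_count desc) as rnk,
       dense_rank() over (partition by country
                          order by sold_count desc) as dense_rnk
from item_counts
order by country, sold_count desc, itemid
"""


def compare_rankings(counts):
    """Return (itemid, country, sold_count, row_num, rnk, dense_rnk) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table item_counts "
            "(itemid text, country text, sold_count integer)"
        )
        conn.executemany("insert into item_counts values (?, ?, ?)", counts)
        return conn.execute(RANKING_QUERY).fetchall()
    finally:
        conn.close()


for row in compare_rankings(COUNTS):
    print(row)
```

In country1, itemid2 and itemid3 both have one buyer: `row_number()` assigns them 2 and 3, whereas `rank()` and `dense_rank()` assign 2 to both.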