Question

I have a table in Redshift with about 30 billion rows and the following structure:

userid    itemid   country   start_date
uid1     itemid1  country1  2018-07-25 00:00:00
uid2     itemid2  country1  2018-07-25 00:00:00
uid3     itemid1  country2  2018-07-25 00:00:00
uid4     itemid3  country1  2018-07-25 00:00:00
uid5     itemid1  country1  2018-07-25 00:00:00
uid1     itemid2  country2  2018-07-25 00:00:00
uid2     itemid2  country2  2018-07-25 00:00:00

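I want to find how many distinct users bought each item, and then select the top 1000 best-selling items for each country and start_date. Both the item's rank and its sold count are required.

The expected output is: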

itemid     country   sold_count   start_date
itemid1    country1   2           2018-07-25 00:00:00
itemid2    country2   2           2018-07-25 00:00:00
itemid1    country2   1           2018-07-25 00:00:00
itemid2    country1   1           2018-07-25 00:00:00
itemid3    country1   1           2018-07-25 00:00:00

I am trying to use the rank function, but I am not getting the expected results.

This is the query I am trying:

  select itemid, start_date, Rank() over (partition by itemid order by 
  count(distinct(userid)) desc) as rank1
  from table_name 
  group by item_id, start_date
  order by rank1 desc;

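In addition, I want a column that counts, per country and start date, the unique user IDs that bought each item_id. In the query above I left out the country column to keep it simple.

Please help.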

Answer 1

If I assume that "version" means "country", then I think you want:

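select *
from (select itemid, country, start_date, count(distinct userid) as num_users,
             -- number the items within each country and start_date, best seller first
             row_number() over (partition by country, start_date 
                                order by count(distinct userid) desc
                               ) as seqnum
      from table_name 
      -- the column is itemid (not item_id), matching the select list
      group by itemid, country, start_date
     ) x
where seqnum <= 1000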
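
As a quick sanity check, the query above can be run against the seven sample rows from the question. This is only a sketch: the scratch table name `purchases_sample` and the column types are assumptions, not something given in the question.

```sql
-- Hypothetical scratch table holding the sample rows from the question.
create temp table purchases_sample (
    userid     varchar(16),
    itemid     varchar(16),
    country    varchar(16),
    start_date timestamp
);

insert into purchases_sample values
    ('uid1', 'itemid1', 'country1', '2018-07-25 00:00:00'),
    ('uid2', 'itemid2', 'country1', '2018-07-25 00:00:00'),
    ('uid3', 'itemid1', 'country2', '2018-07-25 00:00:00'),
    ('uid4', 'itemid3', 'country1', '2018-07-25 00:00:00'),
    ('uid5', 'itemid1', 'country1', '2018-07-25 00:00:00'),
    ('uid1', 'itemid2', 'country2', '2018-07-25 00:00:00'),
    ('uid2', 'itemid2', 'country2', '2018-07-25 00:00:00');

-- Same shape as the query above, with the columns named as in the expected output.
select itemid, country, num_users as sold_count, start_date, seqnum
from (select itemid, country, start_date, count(distinct userid) as num_users,
             row_number() over (partition by country, start_date
                                order by count(distinct userid) desc
                               ) as seqnum
      from purchases_sample
      group by itemid, country, start_date
     ) x
where seqnum <= 1000
order by sold_count desc, country, itemid;
```

On this data the counts should match the expected output in the question: 2 for itemid1 in country1 and for itemid2 in country2, and 1 for the other three item/country pairs. Note that row_number() breaks ties arbitrarily, so the seqnum assigned to items with equal counts is not deterministic.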

Answer 3

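As you said in the question, you want to "find how many unique users bought an item, and then select the top 1000 best-selling items for each country and start date", so you could try doing it step by step with CTEs instead of writing a single query: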

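with 
 items_by_country as (
    -- distinct buyers per item and country; min(start_date) folds all dates into one group
    select 
     itemid
    ,country
    ,count(distinct userid) as sold_count
    ,min(start_date) as start_date
    from table_name
    group by 1,2
)
,ranked_groups as (
    -- position of each item within its country, best seller first
    select 
     *
    ,row_number() over (partition by country order by sold_count desc) as item_rank
    from items_by_country
)
select *
from ranked_groups
where item_rank<=1000
order by 1,2,3 desc
;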
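
The query above takes min(start_date) per item and country, so it ranks items across all dates together. If the ranking should instead be computed separately for each country and start_date, as the question describes, a possible variant is sketched below. The CTE names and the `sold_count` and `item_rank` aliases are illustrative choices, and rank() is swapped in for row_number() so that items with equal counts share a rank.

```sql
with items_by_country_date as (
    -- distinct buyers per item, country and start_date
    select
        itemid,
        country,
        start_date,
        count(distinct userid) as sold_count
    from table_name
    group by itemid, country, start_date
),
ranked_items as (
    -- rank items inside each (country, start_date) group, best seller first
    select
        itemid,
        country,
        sold_count,
        start_date,
        rank() over (partition by country, start_date
                     order by sold_count desc) as item_rank
    from items_by_country_date
)
select itemid, country, sold_count, start_date, item_rank
from ranked_items
where item_rank <= 1000
order by country, start_date, item_rank;
```

Because rank() gives tied items the same value, a group can return more than 1000 rows when there is a tie at the cutoff. If exactly 1000 rows per group are needed, row_number() can be used instead, at the cost of breaking ties arbitrarily.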

Find the top 1000 entries from a table along with count and rank
