amazon-redshift: select id, first(a), sum(b) group by id

Time: 2019-01-21 08:11:05

Tags: sql group-by amazon-redshift greatest-n-per-group

In MySQL / SparkSQL we have the first function. It does not exist in Redshift.

I have to change the code from the first query below to the second:

SELECT
  product_id,
  first(product_code) as product_code,
  first(product_name) as product_name,
  first(time_date) as time_date, 
  max(price_max) as price_max,
  min(price_min) as price_min,
  sum(count_of_sales) as count_of_sales,
  SUM(CASE WHEN time_date = 1538323200000 THEN cost_of_stock_start ELSE 0 END) as cost_of_stock_start
from storeproductincomelogs 
WHERE time_date>= 1538323200000 
  AND time_date<= 1541001600000 
group by product_id;

SELECT
  product_id,
  product_code,
  product_name,
  min(time_date) as time_date,  -- had to change first to min; this column can't go in GROUP BY
  max(price_max) as price_max,
  min(price_min) as price_min,
  sum(count_of_sales) as count_of_sales,
  SUM(CASE WHEN time_date = 1538323200000 THEN cost_of_stock_start ELSE 0 END) as cost_of_stock_start
from storeproductincomelogs 
WHERE time_date>= 1538323200000 
  AND time_date<= 1541001600000 
group by product_id,product_code,product_name;

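Without first, I have to add product_code and product_name to the GROUP BY clause. Otherwise I get this error:

Invalid operation: column "storeproductincomelogs.product_code" must appear in the GROUP BY clause or be used in an aggregate function;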

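Note: here product_id and product_code are unique in each row, and duplicate product_name values are also hard to find (but they could appear in the future, so I don't think I can rely on grouping by them).

I searched for a PostgreSQL equivalent of MySQL's first and found "Select first row in each GROUP BY group?".

First I tried the DISTINCT ON clause, which is not supported in Redshift.

Then I tried: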

WITH summary AS (
    SELECT product_id,
           product_code,
           product_name,
            min(time_date) as time_date,
            max(price_max) as price_max,
            sum(count_of_sales) as count_of_sales,
            SUM(CASE WHEN time_date = 1538323200000 THEN cost_of_stock_start ELSE 0 END) as cost_of_stock_start,
           ROW_NUMBER() OVER(PARTITION BY product_id ) AS rk
      FROM  storeproductincomelogs)
SELECT *
  FROM summary
 WHERE rk = 1;

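I got this error:

[42803] [500310] Amazon Invalid operation: column "storeproductincomelogs.product_id" must appear in the GROUP BY clause or be used in an aggregate function;

I don't know how to write this correctly, so I can't test the performance.

How can I do this in Redshift?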

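2 Answers:

Answer 0: (score: 1)

As I understand it, you don't want to group by product_code and product_name because, for a given product id, they may not always be the same.

So I suggest using min (or max) on those two fields as well: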

SELECT
  product_id,
  min(product_code) as product_code,
  min(product_name) as product_name,
  min(time_date) as time_date,
  max(price_max) as price_max,
  ... ...
group by product_id;
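
Applied to the full query from the question, this approach might look like the sketch below. The column list, table name, and timestamps are taken from the question; nothing else is assumed. One caveat worth keeping in mind: each min() is evaluated independently, so the returned product_code and product_name are not guaranteed to come from the same row.

```sql
-- Sketch: the question's query with first(...) replaced by min(...).
-- Caveat: the two min() calls are independent, so product_code and
-- product_name may come from different rows of the same product_id.
SELECT
  product_id,
  min(product_code) AS product_code,
  min(product_name) AS product_name,
  min(time_date) AS time_date,
  max(price_max) AS price_max,
  min(price_min) AS price_min,
  sum(count_of_sales) AS count_of_sales,
  sum(CASE WHEN time_date = 1538323200000 THEN cost_of_stock_start ELSE 0 END) AS cost_of_stock_start
FROM storeproductincomelogs
WHERE time_date >= 1538323200000
  AND time_date <= 1541001600000
GROUP BY product_id;
```

If the code and name rarely differ within a product, as the question suggests, the mismatch risk is small, but it is not zero.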

Answer 1: (score: 0)

You can try the following: add ROW_NUMBER() OVER(PARTITION BY product_id order by price_max desc), which gives you the row with the maximum price for each product.

WITH summary AS (
    SELECT product_id,
           product_code,
           product_name,
           price_max,
           ROW_NUMBER() OVER(PARTITION BY product_id order by price_max desc) AS rk
      FROM  storeproductincomelogs)
SELECT *
  FROM summary
 WHERE rk = 1;
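
The query above returns one row per product but drops the sums and min/max columns the question asks for. A possible way to combine both, sketched here under the assumption that "first" means the row with the earliest time_date, is to number the rows in a CTE and then aggregate, picking product_code and product_name only from the row where rk = 1. The CTE name ranked is a placeholder chosen for this example.

```sql
-- Sketch: number rows per product, then aggregate in the outer query.
-- Assumption: "first" means the row with the earliest time_date.
-- If several rows share that time_date, which one gets rk = 1 is arbitrary.
WITH ranked AS (
    SELECT product_id,
           product_code,
           product_name,
           time_date,
           price_max,
           price_min,
           count_of_sales,
           cost_of_stock_start,
           ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY time_date) AS rk
      FROM storeproductincomelogs
     WHERE time_date >= 1538323200000
       AND time_date <= 1541001600000
)
SELECT product_id,
       -- Only the rk = 1 row is non-NULL here, so max() simply returns it.
       max(CASE WHEN rk = 1 THEN product_code END) AS product_code,
       max(CASE WHEN rk = 1 THEN product_name END) AS product_name,
       min(time_date) AS time_date,
       max(price_max) AS price_max,
       min(price_min) AS price_min,
       sum(count_of_sales) AS count_of_sales,
       sum(CASE WHEN time_date = 1538323200000 THEN cost_of_stock_start ELSE 0 END) AS cost_of_stock_start
  FROM ranked
 GROUP BY product_id;
```

Unlike independent min() calls, this takes product_code and product_name from the same source row. It also avoids the error from the question's attempt, because the window function and the aggregates are no longer mixed in a single SELECT without a GROUP BY.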