MySQL / SparkSQL has a first function, but it does not exist in Redshift.
So I have to change this code:
SELECT
product_id,
first(product_code) as product_code,
first(product_name) as product_name,
first(time_date) as time_date,
max(price_max) as price_max,
min(price_min) as price_min,
sum(count_of_sales) as count_of_sales,
SUM(CASE WHEN time_date = 1538323200000 THEN cost_of_stock_start ELSE 0 END) as cost_of_stock_start
from storeproductincomelogs
WHERE time_date>= 1538323200000
AND time_date<= 1541001600000
group by product_id;
to:
SELECT
product_id,
product_code,
product_name,
min(time_date) as time_date, -- had to change first to min; this column can't go in GROUP BY
max(price_max) as price_max,
min(price_min) as price_min,
sum(count_of_sales) as count_of_sales,
SUM(CASE WHEN time_date = 1538323200000 THEN cost_of_stock_start ELSE 0 END) as cost_of_stock_start
from storeproductincomelogs
WHERE time_date>= 1538323200000
AND time_date<= 1541001600000
group by product_id,product_code,product_name;
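Without first, I have to add product_code and product_name to the GROUP BY clause.
Otherwise I get this error:
Invalid operation: column "storeproductincomelogs.product_code" must appear in the GROUP BY clause or be used in an aggregate function;
Note: product_id and product_code are unique in every row here, and a duplicate product_name is hard to find too (but it could appear in the future, so I don't think I can rely on GROUP BY).
I searched for a PostgreSQL equivalent of first and found "Select first row in each GROUP BY group?".
First I tried the DISTINCT ON clause, which Redshift does not support.
Then I tried: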
WITH summary AS (
SELECT product_id,
product_code,
product_name,
min(time_date) as time_date,
max(price_max) as price_max,
sum(count_of_sales) as count_of_sales,
SUM(CASE WHEN time_date = 1538323200000 THEN cost_of_stock_start ELSE 0 END) as cost_of_stock_start,
ROW_NUMBER() OVER(PARTITION BY product_id ) AS rk
FROM storeproductincomelogs)
SELECT *
FROM summary
WHERE rk = 1;
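and got this error:
[42803] [500310] Amazon Invalid operation: column "storeproductincomelogs.product_id" must appear in the GROUP BY clause or be used in an aggregate function;
I don't know how to write this correctly, so I can't test the performance.
How can I do this in Redshift?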
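Answer 0 (score: 1)
As I understand it, you don't want to group by product_code and product_name because they might not always be the same for a given product ID.
So I suggest taking the min (or max) of those two fields as well: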
SELECT
product_id,
min(product_code) as product_code,
min(product_name) as product_name,
min(time_date) as time_date,
max(price_max) as price_max,
... ...
group by product_id;
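One caveat with per-column min is worth illustrating: each aggregate is evaluated independently, so the returned product_code and product_name can come from different rows. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for Redshift, and the two sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: product 1 changed both its code and its name.
rows = [
    (1, "B-100", "Alpha", 10),
    (1, "A-200", "Zeta", 20),
]

conn = sqlite3.connect(":memory:")  # in-memory SQLite, standing in for Redshift
conn.execute(
    "CREATE TABLE storeproductincomelogs ("
    "product_id INTEGER, product_code TEXT, product_name TEXT, time_date INTEGER)"
)
conn.executemany("INSERT INTO storeproductincomelogs VALUES (?, ?, ?, ?)", rows)

# Same shape as the query above: every non-grouped column gets its own min().
result = conn.execute(
    """
    SELECT product_id,
           min(product_code) AS product_code,
           min(product_name) AS product_name,
           min(time_date)    AS time_date
    FROM storeproductincomelogs
    GROUP BY product_id
    """
).fetchall()

# The code comes from the second row and the name from the first row.
print(result)
```

If product_code and product_name really are constant per product_id, this mixing cannot occur and the min approach is safe.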
Answer 1 (score: 0)
You can try the following. You need ROW_NUMBER() OVER(PARTITION BY product_id order by price_max desc), which gives you the row with the highest price for each product:
WITH summary AS (
SELECT product_id,
product_code,
product_name,
price_max,
ROW_NUMBER() OVER(PARTITION BY product_id order by price_max desc) AS rk
FROM storeproductincomelogs)
SELECT *
FROM summary
WHERE rk = 1;
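To also get the totals from the question, one possible variant (a sketch, not a tested Redshift query) computes the aggregates as window functions over the same partition and orders ROW_NUMBER by time_date, so the kept row is the earliest one per product. It is run here against Python's sqlite3 module (window functions need SQLite 3.25 or newer) with hypothetical sample rows; the CTE name ranked is also made up for this example.

```python
import sqlite3

START, END = 1538323200000, 1541001600000  # time range taken from the question

# Hypothetical sample rows: (product_id, product_code, product_name, time_date,
# price_max, price_min, count_of_sales, cost_of_stock_start)
rows = [
    (1, "P1", "Widget",    1538323200000, 10.0, 8.0, 3, 100.0),
    (1, "P1", "Widget v2", 1538409600000, 12.0, 7.0, 2, 90.0),
    (2, "P2", "Gadget",    1538323200000, 5.0,  4.0, 1, 50.0),
]

conn = sqlite3.connect(":memory:")  # in-memory SQLite, standing in for Redshift
conn.execute(
    "CREATE TABLE storeproductincomelogs (product_id INTEGER, product_code TEXT, "
    "product_name TEXT, time_date INTEGER, price_max REAL, price_min REAL, "
    "count_of_sales INTEGER, cost_of_stock_start REAL)"
)
conn.executemany("INSERT INTO storeproductincomelogs VALUES (?,?,?,?,?,?,?,?)", rows)

query = """
WITH ranked AS (
    SELECT product_id, product_code, product_name, time_date,
           MAX(price_max)      OVER (PARTITION BY product_id) AS price_max,
           MIN(price_min)      OVER (PARTITION BY product_id) AS price_min,
           SUM(count_of_sales) OVER (PARTITION BY product_id) AS count_of_sales,
           SUM(CASE WHEN time_date = :start THEN cost_of_stock_start ELSE 0 END)
               OVER (PARTITION BY product_id) AS cost_of_stock_start,
           -- rk = 1 marks the earliest row of each product
           ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY time_date) AS rk
    FROM storeproductincomelogs
    WHERE time_date >= :start AND time_date <= :end
)
SELECT product_id, product_code, product_name, time_date,
       price_max, price_min, count_of_sales, cost_of_stock_start
FROM ranked
WHERE rk = 1
ORDER BY product_id
"""
result = conn.execute(query, {"start": START, "end": END}).fetchall()
for row in result:
    print(row)
```

Because product_code and product_name are read from a single row, they always belong together, unlike independent min calls. Whether this performs better than GROUP BY on a real Redshift table would need to be measured.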