I have this query:
WITH product_grouped AS (
SELECT
p.uniqueid,
cost,
case WHEN cost <= 10 THEN '0-10'
WHEN cost > 10 AND cost <= 25 THEN '10 < Price <= 25'
WHEN cost > 25 AND cost <= 100 THEN '25 < Price <= 100'
ELSE 'Price > 100' end AS price_group
FROM mmc..product p
),
quarters_grouped AS (
SELECT
uniqueid,
unitprice,
quantity,
cart_orderdate,
case WHEN c.cart_orderdate >= DATEADD(QUARTER, -1, getdate()) THEN 'Q1'
WHEN c.cart_orderdate >= DATEADD(QUARTER, -2, getdate()) and c.cart_orderdate < DATEADD(QUARTER, -1, getdate()) THEN 'Q2'
WHEN c.cart_orderdate >= DATEADD(QUARTER, -3, getdate()) and c.cart_orderdate < DATEADD(QUARTER, -2, getdate()) THEN 'Q3'
WHEN c.cart_orderdate >= DATEADD(QUARTER, -4, getdate()) and c.cart_orderdate < DATEADD(QUARTER, -3, getdate()) THEN 'Q4'
WHEN c.cart_orderdate >= DATEADD(QUARTER, -5, getdate()) and c.cart_orderdate < DATEADD(QUARTER, -4, getdate()) THEN 'Q5'
WHEN c.cart_orderdate >= DATEADD(QUARTER, -6, getdate()) and c.cart_orderdate < DATEADD(QUARTER, -5, getdate()) THEN 'Q6'
WHEN c.cart_orderdate >= DATEADD(QUARTER, -7, getdate()) and c.cart_orderdate < DATEADD(QUARTER, -6, getdate()) THEN 'Q7'
WHEN c.cart_orderdate >= DATEADD(QUARTER, -8, getdate()) and c.cart_orderdate < DATEADD(QUARTER, -7, getdate()) THEN 'Q8'
end AS quarters
FROM omc..cart c
WHERE c.cart_orderdate > '2018-01-01'
),
sum_by_quarter AS (
SELECT pg.uniqueid, price_group, quarters, SUM(unitprice * quantity) AS total_sold
FROM product_grouped pg
INNER JOIN quarters_grouped qg
ON pg.uniqueid = qg.uniqueid
GROUP BY pg.uniqueid, price_group, quarters
),
partition_five AS (
SELECT quarters, price_group, uniqueid, total_sold, ROW_NUMBER() OVER (PARTITION BY quarters, price_group ORDER BY total_sold DESC) AS RowNo
FROM sum_by_quarter
)
SELECT *
FROM partition_five
WHERE RowNo <= 5
ORDER BY quarters, price_group, total_sold desc
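I want to find the top 5 best-selling product IDs in each price range for each of the past 8 quarters. Best-selling is defined by unitprice * quantity in the cart table. The price ranges are Cost <= $10, 10 < Cost <= 25, 25 < Cost <= 100, and Cost > 100. My query gives me what I want, but is there a simpler way to do it without all the CASE statements?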
Answer 0 (score: 0)
Consider calculating the quarter number from the difference in days between the order date and the current date.
Specifically, inside the quarters_grouped CTE:
'Q' + CONVERT(VARCHAR(5), CEILING(DATEDIFF(day, c.cart_orderdate, getdate()) / 91) + 1)
Rextester Demo (with random dates)
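To see how the expression above assigns buckets, here is a minimal, self-contained sketch in T-SQL. The three sample dates (10, 100 and 700 days ago) are made up for illustration and are not from the original cart table. In SQL Server, dividing one integer by another already truncates, so the CEILING in the expression has no effect and is left out here; the 91-day bucket is only an approximation of a calendar quarter.

```sql
-- Hypothetical sample rows: order dates 10, 100 and 700 days before today.
SELECT
    d.cart_orderdate,
    -- Integer division by 91 gives the number of whole 91-day periods elapsed;
    -- adding 1 makes the most recent period 'Q1'.
    'Q' + CONVERT(VARCHAR(5), DATEDIFF(day, d.cart_orderdate, GETDATE()) / 91 + 1) AS quarters
FROM (VALUES
    (DATEADD(day, -10,  GETDATE())),
    (DATEADD(day, -100, GETDATE())),
    (DATEADD(day, -700, GETDATE()))
) AS d (cart_orderdate);
-- Buckets: 10 / 91 = 0 -> 'Q1', 100 / 91 = 1 -> 'Q2', 700 / 91 = 7 -> 'Q8'
```

Two differences from the original CASE version are worth keeping in mind. Because a fixed 91-day window does not line up exactly with DATEADD(QUARTER, ...), rows close to a quarter boundary may land in a neighbouring bucket. Also, rows older than eight periods get labels such as 'Q9' instead of NULL, so the WHERE clause on cart_orderdate would probably need to limit the range, for example with cart_orderdate >= DATEADD(day, -8 * 91, GETDATE()).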