I have a query like this:
SELECT
t.category,
tc.product,
tc.subproduct, -- "sub-product" is not a valid unquoted identifier (it parses as sub minus product)
count(*) as sales
FROM tg t
JOIN ttc tc ON t.value = tc.value
GROUP BY t.category, tc.product, tc.subproduct;
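Now I want to get the top 10 products (by sales) in each category, and for each of those products I need the top 5 subproducts (by sales).
You can think of the problem statement as:
Get the top 10 products by sales for each category, and for each product, its top 5 subproducts by sales.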
Sample input data format:
category | product | subproduct | sales [count(*)]
abc      | test1   | test11     | 120
abc      | test1   | test11     | 100
abc      | test1   | test11     | 10
abc      | test1   | test11     | 10
abc      | test1   | test11     | 10
abc      | test1   | test11     | 10
abc      | test1   | test12     | 10
abc      | test1   | test13     | 8
abc      | test1   | test14     | 6
abc      | test1   | test15     | 5
abc      | test2   | test21     | 80
abc      | test2   | test22     | 60
abc      | test3   | test31     | 50
abc      | test3   | test32     | 40
abc      | test4   | test41     | 30
abc      | test4   | test42     | 20
abc      | test5   | test51     | 10
abc      | test5   | test52     | 5
abc      | test6   | test61     | 5
...      | ...     | ...        | ...
bcd      | test2   | test22     | 10
xyz      | test3   | test31     | 5
xyz      | test3   | test32     | 3
xyz      | test4   | test41     | 2
The output would be (rf is the product, rfm the subproduct):
top 5 rf for (abc) -> abc,test1 (289), abc,test2 (140), abc,test3 (90), abc,test4 (50), abc,test5 (15)
top 5 rfm for (abc,test1) -> test11 (260), test12 (10), test13 (8), test14 (6), test15 (5), and so on
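As a quick sanity check of the expected figures, the Python sketch below (not part of the original question) sums the sample rows for category abc exactly as listed above, without the elided rows, and takes the top 5 products and the top 5 subproducts of test1. It reproduces the totals 289, 140, 90, 50, 15 and 260, 10, 8, 6, 5.

```python
from collections import defaultdict

# Sample rows for category "abc" from the question (elided rows not included).
rows = [
    ("abc", "test1", "test11", 120), ("abc", "test1", "test11", 100),
    ("abc", "test1", "test11", 10), ("abc", "test1", "test11", 10),
    ("abc", "test1", "test11", 10), ("abc", "test1", "test11", 10),
    ("abc", "test1", "test12", 10), ("abc", "test1", "test13", 8),
    ("abc", "test1", "test14", 6), ("abc", "test1", "test15", 5),
    ("abc", "test2", "test21", 80), ("abc", "test2", "test22", 60),
    ("abc", "test3", "test31", 50), ("abc", "test3", "test32", 40),
    ("abc", "test4", "test41", 30), ("abc", "test4", "test42", 20),
    ("abc", "test5", "test51", 10), ("abc", "test5", "test52", 5),
    ("abc", "test6", "test61", 5),
]

def top_n(totals, n):
    """Return the n (key, total) pairs with the largest totals, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

product_totals = defaultdict(int)  # (category, product) -> summed sales
sub_totals = defaultdict(int)      # (category, product, subproduct) -> summed sales
for cat, prod, sub, sales in rows:
    product_totals[(cat, prod)] += sales
    sub_totals[(cat, prod, sub)] += sales

top_products = top_n(product_totals, 5)
test1_subs = top_n(
    {k[2]: v for k, v in sub_totals.items() if k[:2] == ("abc", "test1")}, 5
)
print(top_products)
print(test1_subs)
```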
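My query fails because the result is really large. I have been reading about Oracle analytic functions. Can someone help me rewrite this query using analytic functions? Any other approach is also fine.
I was referring to http://www.orafaq.com/node/55, but could not come up with the right SQL query for this.
Any help would be appreciated. I have been stuck on this for 2 days :(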
Answer 0 (score: 1)
There may be reasons not to do this purely with analytic functions, but here it is using analytic functions alone:
select am, rf, rfm, rownum_rf2, rownum_rfm
from
(
-- the 3rd level takes the subproduct ranks and, for each subproduct rank,
-- ranks the products within the category
select am, rf, rfm, rownum_rfm,
row_number() over (partition by am, rownum_rfm order by rownum_rf) rownum_rf2
from
(
-- the 2nd level ranks (without ties) the products within
-- categories, and subproducts within products simultaneously
select am, rf, rfm,
row_number() over (partition by am order by count_rf desc) rownum_rf,
row_number() over (partition by am, rf order by count_rfm desc) rownum_rfm
from
(
-- innermost query counts the records by subproduct
-- using a regular group-by. At the same time, it uses
-- the analytic sum() over to get the counts by product
select tg.am, ttc.rf, ttc.rfm,
count(*) count_rfm,
sum(count(*)) over (partition by tg.am, ttc.rf) count_rf
from tg inner join ttc on tg.value = ttc.value
group by tg.am, ttc.rf, ttc.rfm
) X
) Y
-- at level 3, we drop all but the top 5 subproducts per product
where rownum_rfm <= 5 -- top 5 subproducts
) Z
-- the filter on the final query retains only the top 10 products
where rownum_rf2 <= 10 -- top 10 products
order by am, rownum_rf2, rownum_rfm;
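I used row_number() rather than rank(), so you will never get ties; in other words, ties are broken arbitrarily. This also does not work if the data is not dense enough: if any of the top 10 products has fewer than 5 subproducts, the result may show subproducts from some other product. But if the data is dense (a large, established database), the query should work fine.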
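To make the tie behaviour concrete, here is a small sketch using Python's built-in sqlite3 module, assuming SQLite 3.25 or newer (which supports the same ROW_NUMBER, RANK and DENSE_RANK window functions as Oracle), on a hypothetical product_sales table. ROW_NUMBER always hands out distinct numbers, so two products with equal sales get different positions; product is added to its ORDER BY here only to make that tie-break deterministic, since without it the order would be arbitrary. RANK gives tied rows the same rank and skips the next value, while DENSE_RANK does not skip.

```python
import sqlite3

# In-memory database; window functions require SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per product with its total sales.
conn.execute("CREATE TABLE product_sales (category TEXT, product TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO product_sales VALUES (?, ?, ?)",
    [("abc", "p1", 100), ("abc", "p2", 80), ("abc", "p3", 80), ("abc", "p4", 50)],
)

query = """
SELECT product,
       ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC, product) AS rn,
       RANK()       OVER (PARTITION BY category ORDER BY sales DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY category ORDER BY sales DESC) AS drnk
FROM product_sales
ORDER BY rn
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)  # (product, row_number, rank, dense_rank)
```

p2 and p3 tie on sales: ROW_NUMBER numbers them 2 and 3, RANK gives both 2 and then jumps to 4, and DENSE_RANK gives both 2 and continues with 3.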
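The query below passes over the data twice, but returns correct results in every case. Again, it is a query without ties (it uses ROW_NUMBER rather than RANK).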
select am, rf, rfm, count_rf, count_rfm, rownum_rf, rownum_rfm
from
(
-- next join the top 10 products back to the data to get
-- the subproduct counts
select p.am, p.rf, ttc.rfm, p.count_rf, p.rownum_rf, count(*) count_rfm,
ROW_NUMBER() over (partition by p.am, p.rf order by count(*) desc) rownum_rfm
from (
-- first rank all the products within each category
-- (inside over(), "order by 1" would sort by the constant 1, so order by count(*))
select tg.am, ttc.rf, count(*) count_rf,
ROW_NUMBER() over (partition by tg.am order by count(*) desc) rownum_rf
from tg
inner join ttc on tg.value = ttc.value
group by tg.am, ttc.rf
) p
inner join tg on tg.am = p.am
inner join ttc on tg.value = ttc.value and ttc.rf = p.rf
-- filter the inner query for the top 10 products only
where p.rownum_rf <= 10
group by p.am, p.rf, ttc.rfm, p.count_rf, p.rownum_rf
) X
-- filter where the subproduct rank is in top 5
where rownum_rfm <= 5
order by am, rownum_rf, rownum_rfm;
Columns:
count_rf : count of sales by product
count_rfm : count of sales by subproduct
rownum_rf : product rank within category (row_number, without ties)
rownum_rfm : subproduct rank within product (row_number, without ties)
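The two-pass query can be tried out locally with the sketch below. It assumes a hypothetical minimal schema, tg(am, value) with one row per sale and ttc(value, rf, rfm) mapping each sale to its product and subproduct, and runs the same two-pass logic (rank products per category, join back to the data, rank subproducts per product) in SQLite through Python's sqlite3 module (SQLite 3.25 or newer assumed). The limits are passed as parameters, here top 2 products and top 2 subproducts, so the small made-up sample stays readable.

```python
import sqlite3

# Made-up sample: (category, product, subproduct, number of sales)
SAMPLE = [
    ("abc", "test1", "test11", 5), ("abc", "test1", "test12", 3),
    ("abc", "test1", "test13", 1), ("abc", "test2", "test21", 4),
    ("abc", "test2", "test22", 2), ("abc", "test3", "test31", 1),
    ("xyz", "test3", "test31", 3), ("xyz", "test3", "test32", 2),
    ("xyz", "test4", "test41", 1),
]

def build_db(sample):
    """Create the hypothetical tg/ttc tables with one joinable row pair per sale."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tg (am TEXT, value INTEGER)")
    conn.execute("CREATE TABLE ttc (value INTEGER, rf TEXT, rfm TEXT)")
    value = 0
    for am, rf, rfm, n in sample:
        for _ in range(n):
            value += 1  # a distinct join key for every sale
            conn.execute("INSERT INTO tg VALUES (?, ?)", (am, value))
            conn.execute("INSERT INTO ttc VALUES (?, ?, ?)", (value, rf, rfm))
    return conn

TOP_N_SQL = """
SELECT am, rf, rfm, count_rf, count_rfm, rownum_rf, rownum_rfm
FROM (
  -- pass 2: join the top products back to the data and rank their subproducts
  SELECT p.am, p.rf, ttc.rfm, p.count_rf, p.rownum_rf,
         COUNT(*) AS count_rfm,
         ROW_NUMBER() OVER (PARTITION BY p.am, p.rf ORDER BY COUNT(*) DESC) AS rownum_rfm
  FROM (
    -- pass 1: rank products within each category
    SELECT tg.am, ttc.rf, COUNT(*) AS count_rf,
           ROW_NUMBER() OVER (PARTITION BY tg.am ORDER BY COUNT(*) DESC) AS rownum_rf
    FROM tg JOIN ttc ON tg.value = ttc.value
    GROUP BY tg.am, ttc.rf
  ) p
  JOIN tg ON tg.am = p.am
  JOIN ttc ON ttc.value = tg.value AND ttc.rf = p.rf
  WHERE p.rownum_rf <= ?          -- top N products per category
  GROUP BY p.am, p.rf, ttc.rfm, p.count_rf, p.rownum_rf
) x
WHERE rownum_rfm <= ?             -- top M subproducts per product
ORDER BY am, rownum_rf, rownum_rfm
"""

conn = build_db(SAMPLE)
result = conn.execute(TOP_N_SQL, (2, 2)).fetchall()
for row in result:
    print(row)
```

Product test3 of category abc drops out because it ranks third in its category, and subproduct test13 drops out because it ranks third within test1; test3 still appears under xyz, where it is the best product.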
Answer 1 (score: 0)
This is a guess, but you might start with something like the following.
Some test data:
drop table category_sales;
create table category_sales (
category varchar2(14),
product varchar2(14),
subproduct varchar2(14),
sales number
);
begin
for cate in 1 .. 10 loop
for prod in 1 .. 20 loop
for subp in 1 .. 30 loop
insert into category_sales values (
'Cat ' || cate,
'Prod ' || cate||prod,
'Subp ' || cate||prod||subp,
trunc(dbms_random.value(1,30 + cate - prod + subp))
);
end loop; end loop; end loop;
end;
/
The actual query:
select * from (
select
category,
product,
subproduct,
sales,
category_sales,
product_sales,
top_subproduct,
-- Finding best products within category:
dense_rank () over (
partition by category
order by product_sales desc
) top_product
from (
select
-- Finding the best Subproducts within
-- category and product:
dense_rank () over (
partition by category,
product
order by sales desc
) top_subproduct,
-- Finding the sum(sales) within a
-- category and product
sum(sales) over (
partition by category,
product
) product_sales,
-- Finding the sum(sales) within
-- category
sum(sales) over (
partition by category
) category_sales,
category,
product,
subproduct,
sales
from
category_sales
)
)
where
-- Only best 10 Products
top_product <= 10 and
-- Only best 5 subproducts:
top_subproduct <= 5
-- "Best" categories first, then best products and subproducts:
order by
category_sales desc,
top_product,
top_subproduct;
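In that query, the column category_sales returns the total sales of the category the record belongs to. This means every record of the same category has the same category_sales. This column is needed to order the result set with the best-selling categories first (order by ... category_sales desc).
Similarly, product_sales is the total sales of a category-product combination. The top_product rank is computed from this column, which is how the best n (here: 10) products in each category are found (where top_product <= 10).
The column top_product is "created" with the analytic function dense_rank() over.... It is 1 for the best product in a category, 2 for the second best, and so on (hence where top_product <= 10).
The column top_subproduct is computed in the same way as top_product (i.e. with dense_rank). Note that dense_rank gives tied rows the same rank, so when there are ties, more than 10 products or 5 subproducts can be returned.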
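Here is a sketch of the dense_rank() approach run in SQLite through Python's sqlite3 module (assuming SQLite 3.25 or newer), on a small hand-made category_sales table that contains a tie. It keeps the top 2 products and top 2 subproducts instead of 10 and 5, orders products and subproducts best-first, and adds subproduct as a final sort key only so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category_sales (category TEXT, product TEXT, subproduct TEXT, sales INTEGER)"
)
# Made-up data: S12 and S13 tie on sales within product P1.
conn.executemany(
    "INSERT INTO category_sales VALUES (?, ?, ?, ?)",
    [
        ("Cat 1", "P1", "S11", 50), ("Cat 1", "P1", "S12", 30),
        ("Cat 1", "P1", "S13", 30), ("Cat 1", "P2", "S21", 60),
        ("Cat 1", "P3", "S31", 10), ("Cat 2", "P4", "S41", 5),
    ],
)

QUERY = """
SELECT category, product, subproduct, sales, top_product, top_subproduct
FROM (
  -- rank products by their total sales within the category
  SELECT s.*,
         DENSE_RANK() OVER (PARTITION BY category ORDER BY product_sales DESC) AS top_product
  FROM (
    SELECT category, product, subproduct, sales,
           DENSE_RANK() OVER (PARTITION BY category, product ORDER BY sales DESC) AS top_subproduct,
           SUM(sales) OVER (PARTITION BY category, product) AS product_sales,
           SUM(sales) OVER (PARTITION BY category) AS category_sales
    FROM category_sales
  ) s
) r
WHERE top_product <= ? AND top_subproduct <= ?
ORDER BY category_sales DESC, top_product, top_subproduct, subproduct
"""

result = conn.execute(QUERY, (2, 2)).fetchall()
for row in result:
    print(row)
```

Because S12 and S13 tie on sales, both get top_subproduct = 2 and both are returned, so product P1 contributes three rows even though the limit is 2 subproducts.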