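I have a table containing userIds and product categories (prod). I want to get a table of unique userIds, each with its most frequent product category prod. In other words, I want to know which item category each customer bought most often. How can I achieve this in PL/SQL or Oracle SQL?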
|userId|prod|
|------|----|
|123544|cars|
|123544|cars|
|123544|dogs|
|123544|cats|
|987689|bats|
|987689|cats|
I have seen SO questions about getting the most common value of a column, but how do I get the most common value for each unique userId?
Answer 0 (score: 3)
You should solve this with plain SQL. If you really need it in PL/SQL, just embed this query in your PL/SQL code.
(Setup)
drop table yourtable;
create table yourtable (
userID number,
prod varchar2(10)
)
/
insert into yourtable values ( 123544, 'cars' );
insert into yourtable values ( 123544, 'cars' );
insert into yourtable values ( 123544, 'dogs' );
insert into yourtable values ( 123544, 'cats' );
insert into yourtable values ( 987689, 'bats' );
insert into yourtable values ( 987689, 'cats' );
commit;
-- Assuming ties are not broken: this logic returns every tied category for a user
with w_grp as (
select userID, prod, count(*) over ( partition by userID, prod ) rgrp
from yourtable
),
w_rnk as (
select userID, prod, rgrp,
rank() over (partition by userID order by rgrp desc) rnk
from w_grp
)
select distinct userID, prod
from w_rnk
where rnk = 1
/
USERID PROD
---------- ----------
987689 bats
987689 cats
123544 cars
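-- Assuming you want only one row per user: if categories are tied, this returns an arbitrary one of them (for example, this time it returned 987689 bats; next time it might return 987689 cats). It will always return 123544 cars, however, because that user has no tie.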
with w_grp as (
select userID, prod, count(*) over ( partition by userID, prod ) rgrp
from yourtable
),
w_rnk as (
select userID, prod, rgrp,
row_number() over (partition by userID order by rgrp desc) rnum
from w_grp
)
select userID, prod, rnum
from w_rnk
where rnum = 1
/
USERID PROD RNUM
---------- ---------- ----------
123544 cars 1
987689 bats 1
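If an arbitrary pick among tied categories is undesirable, one option is to add a secondary sort key so the result is repeatable. The sketch below assumes that alphabetical order of prod is an acceptable tiebreaker; that is an assumption of this example, not something the question specifies. It also aggregates with GROUP BY first, so each (userID, prod) pair appears only once before ranking.

```sql
-- Sketch: one row per user, with ties broken deterministically.
-- Assumption: alphabetical order of prod is an acceptable tiebreaker.
with w_grp as (
  select userID, prod, count(*) as cnt
  from yourtable
  group by userID, prod
),
w_rnk as (
  select userID, prod, cnt,
         row_number() over (partition by userID order by cnt desc, prod) as rnum
  from w_grp
)
select userID, prod
from w_rnk
where rnum = 1
/
```

With the sample data, this should consistently pick bats for user 987689, since bats sorts before cats.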
[edit] Removed the unused rank/row_number from the queries to avoid confusion. [/edit]
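For the PL/SQL case mentioned at the start of this answer, a minimal sketch of embedding the tie-preserving query in an anonymous block might look like the following. The cursor FOR loop and the use of DBMS_OUTPUT for printing are choices made for this illustration; real code would more likely return the rows to a caller or process them further.

```sql
-- Sketch: the tie-preserving query embedded in an anonymous PL/SQL block.
-- Assumption: printing through DBMS_OUTPUT is sufficient
-- (in SQL*Plus, run "set serveroutput on" first to see the lines).
begin
  for r in (
    with w_grp as (
      select userID, prod, count(*) as cnt
      from yourtable
      group by userID, prod
    ),
    w_rnk as (
      select userID, prod,
             rank() over (partition by userID order by cnt desc) as rnk
      from w_grp
    )
    select userID, prod
    from w_rnk
    where rnk = 1
    order by userID, prod
  ) loop
    -- one line per (user, top category); tied categories each get a line
    dbms_output.put_line(r.userID || ' ' || r.prod);
  end loop;
end;
/
```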
Answer 1 (score: 2)
SELECT user_id, prod, prod_cnt FROM (
SELECT user_id, prod, prod_cnt
, RANK() OVER ( PARTITION BY user_id ORDER BY prod_cnt DESC ) AS rn
FROM (
SELECT user_id, prod, COUNT(*) AS prod_cnt
FROM mytable
GROUP BY user_id, prod
)
) WHERE rn = 1;
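In the innermost subquery, I get the COUNT of each product per user. I then rank those counts with the analytic (window) function RANK(). Finally, I simply select everything where the rank equals 1. Using RANK() instead of ROW_NUMBER() ensures that ties are returned.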
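When ties do not need to be returned, a shorter alternative is Oracle's STATS_MODE aggregate, which returns the most frequent value in a group. This is a different technique from the RANK() query above, shown only as a sketch; it reuses the mytable and user_id names from that query, and top_prod is an alias chosen for this example.

```sql
-- Sketch: one most-frequent prod per user via Oracle's STATS_MODE aggregate.
-- Note: when several values are tied, Oracle returns only one of them.
SELECT user_id, STATS_MODE(prod) AS top_prod
FROM mytable
GROUP BY user_id;
```

Because STATS_MODE returns a single value per group, it behaves like the ROW_NUMBER() approach, not the RANK() one, when categories are tied.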