Get the most frequently occurring value for each user ID

Time: 2015-03-09 13:03:20

Tags: sql oracle plsql

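I have a table containing userIds and product categories (prod). I want to get a table of distinct userIds, each paired with its most frequent product category prod. In other words, I want to know which category of items each customer buys most often. How can I achieve this in PL/SQL or Oracle SQL?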

|userId|prod|
|------|----|
|123544|cars|
|123544|cars|
|123544|dogs|
|123544|cats|
|987689|bats|
|987689|cats|

I have seen SO questions about getting the most common value of a column, but how do I get the most common value for each unique userId?

2 answers:

Answer 0: (score: 3)

You should just use SQL to solve this. If you really need it in PL/SQL, simply embed this query in your PL/SQL.

(Setup)

  drop table yourtable;
  create table yourtable (
     userID   number,
     prod     varchar2(10)
     )
  /

  insert into yourtable values ( 123544, 'cars' );
  insert into yourtable values ( 123544, 'cars' );
  insert into yourtable values ( 123544, 'dogs' );
  insert into yourtable values ( 123544, 'cats' );
  insert into yourtable values ( 987689, 'bats' );
  insert into yourtable values ( 987689, 'cats' );

  commit;

-- Assuming ties are not to be broken, this logic returns every tied value.

  with w_grp as (
        select userID, prod, count(*) over ( partition by userID, prod ) rgrp
          from yourtable
        ),
     w_rnk as (
        select userID, prod, rgrp,
               rank() over (partition by userID order by rgrp desc) rnk
          from w_grp
        )
  select distinct userID, prod
    from w_rnk
   where rnk = 1
  /

      USERID PROD
  ---------- ----------
      987689 bats
      987689 cats
      123544 cars

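-- Assuming you want only one row per user: if values are tied, this returns an arbitrary one of them (e.g. this time it returned 987689 bats; next time it might return 987689 cats). It will always return 123544 cars, however, because there is no tie for that user.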

  with w_grp as (
        select userID, prod, count(*) over ( partition by userID, prod ) rgrp
          from yourtable
        ),
     w_rnk as (
        select userID, prod, rgrp,
               row_number() over (partition by userID order by rgrp desc) rnum
          from w_grp
        )
  select userID, prod, rnum
    from w_rnk
   where rnum = 1
  /

      USERID PROD             RNUM
  ---------- ---------- ----------
      123544 cars                1
      987689 bats                1

[edit] Removed the unused rank / row_number from the queries to avoid confusion. [/edit]

Answer 1: (score: 2)

  SELECT user_id, prod, prod_cnt FROM (
      SELECT user_id, prod, prod_cnt
           , RANK() OVER ( PARTITION BY user_id ORDER BY prod_cnt DESC ) AS rn
        FROM (
          SELECT user_id, prod, COUNT(*) AS prod_cnt
            FROM mytable
           GROUP BY user_id, prod
        )
  ) WHERE rn = 1;

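In the innermost subquery, I get the COUNT of each product per user. I then rank those counts with the analytic (window) function RANK(), and finally select every row whose rank equals 1. Using RANK() instead of ROW_NUMBER() ensures that ties are returned.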
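
Since the question also asks about PL/SQL, and the first answer suggests simply embedding the query, here is a rough sketch of what that embedding might look like using a cursor FOR loop. It assumes the table mytable(user_id, prod) used in the query above and that server output is enabled (for example with SET SERVEROUTPUT ON in SQL*Plus); printing via dbms_output is only an illustrative way to consume the rows.

```sql
-- Sketch: the ranking query embedded in an anonymous PL/SQL block.
-- Assumes mytable(user_id, prod) exists and server output is enabled.
begin
  for r in (
    select user_id, prod, prod_cnt
      from (
        select user_id, prod, prod_cnt,
               rank() over (partition by user_id order by prod_cnt desc) as rn
          from (
            select user_id, prod, count(*) as prod_cnt
              from mytable
             group by user_id, prod
          )
      )
     where rn = 1
     order by user_id, prod
  ) loop
    -- one line per (user, top category); tied categories each get a line
    dbms_output.put_line(r.user_id || ' ' || r.prod || ' ' || r.prod_cnt);
  end loop;
end;
/
```

The loop body is where per-user processing would go; if only the result set is needed, the plain SQL query is likely the simpler choice.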