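How do I assign the most frequent value of one column to each unique value of another column?

Asked: 2017-12-01 15:07:47

Tags: sql hiveql

This is an aggregate table: count is the number of rows in the underlying data for each combination of the first four columns. I want to assign exactly one organization id to each customer, namely the organization that occurs most often for that customer in the data (the organization with the highest sum of count).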

customer name    organization id    item     city        count
Jan Tomas        3478               cloth    Rom         20
Jan Tomas        3478               cloth    Milan       12
Jan Tomas        3478               shoe     Munich      14
Jan Tomas        3478               shoe     Rom          5
Jan Tomas        653                cloth    Berlin      10
Jan Tomas        653                shoe     Brussels     5
Jan Tomas        123                cloth    Paris       12
Jan Tomas        123                cloth    Rom         14
Martin Muller    654                cloth    Rom         15
Martin Muller    654                cloth    Berlin      16
Martin Muller    654                shoe     Rom          7
Martin Muller    980                cloth    Milan       28
Martin Muller    980                shoe     Paris       19
Janatan Kery     765                cloth    Rom         20
Janatan Kery     765                cloth    Munich      11
Janatan Kery     765                shoe     Rom         22
Janatan Kery     476                cloth    Milan        2
Janatan Kery     476                cloth    Rom         24

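I want output like the following. Any help would be greatly appreciated. This is only a sample of the data; I have more than 2 million unique customers.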

customer name    organization id    item     city        count
Jan Tomas        3478               cloth    Rom         20
Jan Tomas        3478               cloth    Milan       12
Jan Tomas        3478               shoe     Munich      14
Jan Tomas        3478               shoe     Rom          5
Martin Muller    980                cloth    Milan       28
Martin Muller    980                shoe     Paris       19
Janatan Kery     765                cloth    Rom         20
Janatan Kery     765                cloth    Munich      11
Janatan Kery     765                shoe     Rom         22

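2 Answers:

Answer 0: (score: 1)

Try this. It uses ROW_NUMBER() to find, for each CustomerName, the OrganizationID with the largest SUM(Count), then joins back to the main table to pick up the remaining columns: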

SELECT yt.*
FROM YourTable yt
JOIN (SELECT CustomerName, 
             OrganizationID, 
             ROW_NUMBER() OVER (PARTITION BY CustomerName ORDER BY SUM(Count) DESC) RN
      FROM YourTable
      GROUP BY CustomerName, OrganizationID) A ON A.RN = 1
                                              AND A.CustomerName = yt.CustomerName
                                              AND A.OrganizationID = yt.OrganizationID
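
To check the idea against the sample data in the question, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module. The table name `sales`, the snake_case column names, and the `top_org_rows` helper are hypothetical and not from the original post. The sketch sums the counts in a CTE first and ranks in a second step, rather than nesting SUM inside the window's ORDER BY; that split is an assumption made for portability across engines, and the tie-break on organization_id is likewise an added assumption. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question: (customer_name, organization_id, item, city, cnt)
ROWS = [
    ("Jan Tomas", 3478, "cloth", "Rom", 20),
    ("Jan Tomas", 3478, "cloth", "Milan", 12),
    ("Jan Tomas", 3478, "shoe", "Munich", 14),
    ("Jan Tomas", 3478, "shoe", "Rom", 5),
    ("Jan Tomas", 653, "cloth", "Berlin", 10),
    ("Jan Tomas", 653, "shoe", "Brussels", 5),
    ("Jan Tomas", 123, "cloth", "Paris", 12),
    ("Jan Tomas", 123, "cloth", "Rom", 14),
    ("Martin Muller", 654, "cloth", "Rom", 15),
    ("Martin Muller", 654, "cloth", "Berlin", 16),
    ("Martin Muller", 654, "shoe", "Rom", 7),
    ("Martin Muller", 980, "cloth", "Milan", 28),
    ("Martin Muller", 980, "shoe", "Paris", 19),
    ("Janatan Kery", 765, "cloth", "Rom", 20),
    ("Janatan Kery", 765, "cloth", "Munich", 11),
    ("Janatan Kery", 765, "shoe", "Rom", 22),
    ("Janatan Kery", 476, "cloth", "Milan", 2),
    ("Janatan Kery", 476, "cloth", "Rom", 24),
]

QUERY = """
WITH totals AS (
    -- Step 1: total count per (customer, organization)
    SELECT customer_name, organization_id, SUM(cnt) AS total
    FROM sales
    GROUP BY customer_name, organization_id
),
ranked AS (
    -- Step 2: rank each customer's organizations by that total
    SELECT customer_name, organization_id,
           ROW_NUMBER() OVER (
               PARTITION BY customer_name
               ORDER BY total DESC, organization_id  -- tie-break is an assumption
           ) AS rn
    FROM totals
)
-- Step 3: keep only the detail rows of each customer's top organization
SELECT s.customer_name, s.organization_id, s.item, s.city, s.cnt
FROM sales AS s
JOIN ranked AS r
  ON r.customer_name = s.customer_name
 AND r.organization_id = s.organization_id
WHERE r.rn = 1
ORDER BY s.rowid
"""


def top_org_rows(rows):
    """Load rows into an in-memory table and return the filtered detail rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (customer_name TEXT, organization_id INTEGER, "
            "item TEXT, city TEXT, cnt INTEGER)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_org_rows(ROWS):
        print(row)
```

On the sample data the per-organization totals are 51, 15 and 26 for Jan Tomas, 38 and 47 for Martin Muller, and 53 and 26 for Janatan Kery, so the query keeps the rows for 3478, 980 and 765 respectively.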

Answer 1: (score: 0)

This is how I would do it in MS SQL Server. Maybe it will point you in the right direction.

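WITH    totals
      AS (SELECT    [customer name]
          ,         [organization id]
          ,         item
          ,         city
          ,         [count]
                    -- total count per customer/organization pair
          ,         SUM([count]) OVER (PARTITION BY [customer name], [organization id]) AS org_total
          FROM      TableName
         ),
        cte
      AS (SELECT    *
                    -- rank 1 = the customer's organization with the highest total
          ,         DENSE_RANK() OVER (PARTITION BY [customer name]
                                       ORDER BY org_total DESC, [organization id]) AS RN
          FROM      totals
         )
SELECT  [customer name]
,       [organization id]
,       item
,       city
,       [count]
FROM    cte
WHERE   RN = 1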