Find the most frequent Col3 for each pair of Col1 and Col2

Time: 2015-09-22 22:27:36

Tags: sql database postgresql greatest-n-per-group amazon-redshift

Given a table myTable with 4 columns, for example:

Col1 Col2 Col3 Col4

A X 5 B
A Y 5 C
A X 7 D
A Y 3 E 
A X 7 F

I need to find the most frequent Col3 for each pair (Col1, Col2).

The result for this example would be:

A X 7   D/F  -- D or F
A Y 5/3 C/E  -- It can be 5 and C or 3 and E

So I wrote a query like this:

select Col1,Col2,Col3 
from myTable M 
group by Col1,Col2,Col3 
having Col3 =
     (select Col3 
      from myTable N 
      where M.Col1=N.col1 
      group by Col3 
      order by Col3 desc limit 1); 

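But the query does not give the desired result. Also, I don't know how to handle Col4 with the GROUP BY clause; I don't want to form groups based on Col4.

For each (Col1, Col2) pair, I want a single Col4 that matches the most frequent Col3 value.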

2 Answers:

Answer 0: (score: 1)

You only need a single query with DISTINCT ON:

SELECT DISTINCT ON (col1, col2)
       col1, col2, col3, min(col4) As col4
FROM   tbl
GROUP  BY col1, col2, col3
ORDER  BY col1, col2, count(*) DESC, col3;

This returns one row per (col1, col2) with the most common col3 (the smallest value if several tie for "most common") and the smallest col4 that goes with that col3.

Similarly, to get all qualifying col3 values, you can use the window function rank() in a subquery; it is also evaluated after the aggregation:

SELECT col1, col2, col3, col4_list
FROM  (
   SELECT col1, col2, col3, count(*) AS ct, string_agg(col4, '/') AS col4_list
        , rank() OVER (PARTITION BY col1, col2 ORDER BY count(*) DESC) AS rnk
   FROM   tbl
   GROUP  BY col1, col2, col3
   ) sub
WHERE  rnk = 1
ORDER  BY col1, col2, col3;

This works because you can run window functions over aggregate functions. Cast to text if the data type is not a character type.

Or, all qualifying col3 per (col1, col2) in one list, plus all matching col4 in a second list:

SELECT col1, col2
     , string_agg(col3::text, '/') AS col3_list  -- cast if necessary
     , string_agg(col4_list, '/')  AS col4_list
FROM  (
   SELECT col1, col2, col3, count(*) AS ct, string_agg(col4, '/') AS col4_list
        , rank() OVER (PARTITION BY col1, col2 ORDER BY count(*) DESC) AS rnk
   FROM   tbl
   GROUP  BY col1, col2, col3
   ) sub
WHERE  rnk = 1
GROUP  BY col1, col2
ORDER  BY col1, col2, col3_list;

Solution for Amazon Redshift

row_number() is available, so this should work:

SELECT col1, col2, col3, col4
FROM  (
   SELECT col1, col2, col3, min(col4) AS col4
        , row_number() OVER (PARTITION BY col1, col2 ORDER BY count(*) DESC, col3) AS rn
   FROM   tbl
   GROUP  BY col1, col2, col3
   ) sub
WHERE  rn = 1
ORDER  BY col1, col2;

Or, if window functions over aggregate functions are not allowed, use another subquery:

SELECT col1, col2, col3, col4
FROM  (
   SELECT *, row_number() OVER (PARTITION BY col1, col2 ORDER BY ct DESC, col3) AS rn
   FROM  (
      SELECT col1, col2, col3, min(col4) AS col4, COUNT(*) AS ct
      FROM   tbl
      GROUP  BY col1, col2, col3
      ) sub1
   ) sub2
WHERE  rn = 1;

This picks the smallest col3 if more than one value ties for the maximum count, and the smallest col4 for the respective col3.
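
The difference between keeping every tie (rank()) and picking one row per pair (row_number() with a tie-breaker) can be tried out on the question's sample data. What follows is only a rough sketch, not part of the answer: it assumes Python's bundled sqlite3 module with SQLite 3.25 or newer (for window functions), uses the nested-subquery form throughout, and substitutes SQLite's group_concat() for string_agg(). The names SAMPLE_ROWS, COUNTS, ALL_TIES, ONE_PER_PAIR and run are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (col1, col2, col3, col4).
SAMPLE_ROWS = [
    ("A", "X", 5, "B"),
    ("A", "Y", 5, "C"),
    ("A", "X", 7, "D"),
    ("A", "Y", 3, "E"),
    ("A", "X", 7, "F"),
]

# Aggregate once per (col1, col2, col3); both queries below rank this result.
# group_concat() stands in for string_agg() (assumption: SQLite, not Postgres).
COUNTS = """
SELECT col1, col2, col3, count(*) AS ct,
       min(col4) AS min_col4,
       group_concat(col4, '/') AS col4_list
FROM   tbl
GROUP  BY col1, col2, col3
"""

# rank() gives every col3 tied for the highest count the same rank 1.
ALL_TIES = f"""
SELECT col1, col2, col3, col4_list
FROM  (SELECT *, rank() OVER (PARTITION BY col1, col2 ORDER BY ct DESC) AS rnk
       FROM ({COUNTS}) sub1) sub2
WHERE  rnk = 1
ORDER  BY col1, col2, col3
"""

# row_number() with col3 as tie-breaker keeps exactly one row per pair.
ONE_PER_PAIR = f"""
SELECT col1, col2, col3, min_col4
FROM  (SELECT *, row_number() OVER (PARTITION BY col1, col2
                                    ORDER BY ct DESC, col3) AS rn
       FROM ({COUNTS}) sub1) sub2
WHERE  rn = 1
ORDER  BY col1, col2
"""


def run(query, rows=SAMPLE_ROWS):
    """Load rows into an in-memory table named tbl and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tbl (col1 TEXT, col2 TEXT, col3 INTEGER, col4 TEXT)"
        )
        conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(ALL_TIES))
    print(run(ONE_PER_PAIR))
```

On the sample data, run(ONE_PER_PAIR) yields one row per pair, (A, X, 7, D) and (A, Y, 3, E), while run(ALL_TIES) keeps both tied rows for (A, Y). The order of values inside col4_list is not guaranteed by group_concat().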

SQL Fiddle demonstrating all of this in Postgres 9.3.

Answer 1: (score: 0)

One way to do this is to use the ROW_NUMBER window function on top of an aggregate query:

SELECT col1, col2, col3
FROM   (SELECT col1, col2, col3,
               ROW_NUMBER () OVER (PARTITION BY col1, col2 ORDER BY cnt DESC) AS rn
        FROM   (SELECT   col1, col2, col3, COUNT(*) AS cnt
                FROM     mytable
                GROUP BY col1, col2, col3) t
       ) q
WHERE  rn = 1
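
This answer's query orders only by the count, so when two col3 values tie within a pair, which one gets row number 1 is not defined. A minimal sketch to try it on the question's sample data, assuming Python's sqlite3 module with SQLite 3.25 or newer rather than Postgres or Redshift; the names SAMPLE_ROWS, ANSWER_QUERY and top_col3_per_pair are hypothetical.

```python
import sqlite3

# Sample rows from the question: (col1, col2, col3, col4).
SAMPLE_ROWS = [
    ("A", "X", 5, "B"),
    ("A", "Y", 5, "C"),
    ("A", "X", 7, "D"),
    ("A", "Y", 3, "E"),
    ("A", "X", 7, "F"),
]

# The query from this answer: count per (col1, col2, col3), then number
# the groups within each pair by descending count and keep the first.
ANSWER_QUERY = """
SELECT col1, col2, col3
FROM   (SELECT col1, col2, col3,
               ROW_NUMBER() OVER (PARTITION BY col1, col2
                                  ORDER BY cnt DESC) AS rn
        FROM   (SELECT   col1, col2, col3, COUNT(*) AS cnt
                FROM     mytable
                GROUP BY col1, col2, col3) t
       ) q
WHERE  rn = 1
"""


def top_col3_per_pair(rows):
    """Map each (col1, col2) pair to the col3 value the query picked."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable (col1 TEXT, col2 TEXT, col3 INTEGER, col4 TEXT)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
        picked = conn.execute(ANSWER_QUERY).fetchall()
    finally:
        conn.close()
    return {(c1, c2): c3 for c1, c2, c3 in picked}


if __name__ == "__main__":
    print(top_col3_per_pair(SAMPLE_ROWS))
```

For (A, X) the result is always 7, since that value occurs twice. For (A, Y) either 3 or 5 may come back; adding col3 to the ORDER BY of the window makes the choice deterministic.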
