Question

I have a query:

select substr(name,7,50) as location, points, sum(if(p1=r1,10,-10)) as total
from dq.data
group by points, location
order by location, total desc

which produces this data:

FRANCE  |0|2|0|0|0|0|1  110.0    
FRANCE  |0|2|1|0|1|2|1  100.0    
FRANCE  |0|2|0|0|0|1|1  100.0    
FRANCE  |0|2|1|0|0|1|1  100.0    
FRANCE  |0|2|0|1|1|2|1  100.0    
FRANCE  |0|2|0|0|1|1|1  100.0
GERMANY |1|0|2|2|2|1|0  120.0    
GERMANY |1|0|2|2|2|0|0  110.0    
GERMANY |1|0|2|2|2|2|0  110.0    
GERMANY |1|0|2|2|2|0|2  110.0    
GERMANY |1|0|2|2|2|1|1  110.0

For each location, I would like to get the row with the highest total, together with its points.

What I should end up with is:

FRANCE  |0|2|0|0|0|0|1  110.0
GERMANY |1|0|2|2|2|1|0  120.0

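I believe I need a subquery with MAX(total), but I can't get it to work. In the subquery I want to select points without grouping by it, which apparently isn't allowed.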

How can I do this?

Answer 1

Your intuition is correct. You can do this by computing the maximum total per location and then joining it back to the original data:

select t.*
from (select substr(name,7,50) as location, points,sum(if (p1=r1,10,-10))as total
      from dq.data 
      group by points,location
     ) t join
     (select location, max(total) as maxtotal
      from (select substr(name,7,50) as location, points,sum(if (p1=r1,10,-10))as total
            from dq.data 
            group by points,location
           ) t
      group by location
     ) tsum
     on t.location = tsum.location and t.total = tsum.maxtotal

Note that this version returns more than one row for a location if several rows tie for the top total.

I'm not familiar with google-bigquery. If it supports the "with" statement, you can simplify the query like this:

with t as (select substr(name,7,50) as location, points,sum(if (p1=r1,10,-10))as total
           from dq.data 
           group by points,location
          )
select t.*
from t join
     (select location, max(total) as maxtotal
      from t
      group by location
     ) tsum
     on t.location = tsum.location and t.total = tsum.maxtotal

If it supports window functions (such as row_number()), you can eliminate the explicit join entirely.
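
A minimal sketch of that window-function variant, not tested on BigQuery. It runs against an in-memory SQLite database through Python's sqlite3 module only so that it is self-contained; SQLite 3.25 or later is assumed for row_number(). The table name data, the sample rows, and the six-character "TEAM--" prefix on name are invented for illustration, and CASE WHEN stands in for if(), which SQLite does not provide under that name.

```python
import sqlite3

# Hypothetical sample rows: (name, points, p1, r1). The first six characters
# of name are a made-up prefix, so substr(name, 7, 50) yields the location.
ROWS = [
    ("TEAM--FRANCE", "|0|2|0", 1, 1),
    ("TEAM--FRANCE", "|0|2|0", 2, 2),
    ("TEAM--FRANCE", "|0|2|1", 1, 1),
    ("TEAM--FRANCE", "|0|2|1", 2, 3),
    ("TEAM--GERMANY", "|1|0|2", 1, 1),
    ("TEAM--GERMANY", "|1|0|2", 2, 2),
    ("TEAM--GERMANY", "|1|0|0", 1, 0),
    ("TEAM--GERMANY", "|1|0|0", 2, 0),
]

QUERY = """
WITH t AS (
    -- Same aggregation as in the question: one total per (location, points).
    SELECT substr(name, 7, 50) AS location,
           points,
           SUM(CASE WHEN p1 = r1 THEN 10 ELSE -10 END) AS total
    FROM data
    GROUP BY location, points
),
ranked AS (
    -- Number the rows of each location from highest total to lowest.
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY location ORDER BY total DESC) AS rn
    FROM t
)
SELECT location, points, total
FROM ranked
WHERE rn = 1
ORDER BY location
"""


def build_db():
    """Create an in-memory database holding the hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (name TEXT, points TEXT, p1 INTEGER, r1 INTEGER)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)
    return conn


def top_per_location(conn):
    """Return the (location, points, total) row with the highest total per location."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in top_per_location(build_db()):
        print(row)
```

Unlike the join version, row_number() keeps exactly one row per location even when totals tie, and which of the tied rows survives is arbitrary unless the ORDER BY is extended. Swapping in rank() would keep all tied rows instead.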

Answer 2

I recently ran into a similar problem and solved it like this:

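-- Relies on older MySQL behaviour (ONLY_FULL_GROUP_BY disabled); which row survives is not guaranteed.
SELECT location, points, total
FROM (
   SELECT substr(name,7,50) AS location, points, sum(if(p1=r1,10,-10)) AS total
   FROM dq.data
   GROUP BY points, location
   ORDER BY location, total DESC
) tmp
GROUP BY location;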

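I'm not sure whether it will run as-is for you, since my database is MySQL, but it is a nicely intuitive solution: order the subquery so that the rows you want to keep come first, and let the grouping discard the rest.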

Select the highest value from a group by query
