Distinct aggregation: need help writing a query

Time: 2019-04-01 19:37:45

Tags: mysql sql

The question has been edited, because SUM(DISTINCT(num_profiles)) is not a solution!

Suppose I have a table (ExampleData):

+----------+---------------+-----------+------+--------------+------------+
| date     | function_name | file_name | self | num_profiles | profile_id |
+----------+---------------+-----------+------+--------------+------------+
| 20190301 | function1     | file1.go  | 10   | 30           | 100        |
| 20190301 | function2     | file1.go  | 20   | 30           | 100        |
| 20190301 | function1     | file1.go  | 30   | 20           | 200        |
| 20190301 | function3     | file1.go  | 40   | 20           | 200        |
| 20190301 | function4     | file1.go  | 45   | 20           | 222        |
| 20190301 | function1     | file2.go  | 50   | 20           | 200        |
| 20190302 | function1     | file1.go  | 10   | 10           | 300        |
| 20190302 | function2     | file1.go  | 20   | 10           | 300        |
| 20190302 | function3     | file2.go  | 60   | 10           | 300        |
+----------+---------------+-----------+------+--------------+------------+

I need to aggregate by date and file_name and compute sum(self) and sum(num_profiles). Like this:

SELECT
    date,
    file_name,
    SUM(self) AS self,
    SUM(num_profiles) AS num_profiles
FROM ExampleData
GROUP BY date, file_name
ORDER BY date, file_name;

But I need to change the logic of sum(num_profiles) as num_profiles: num_profiles should be counted only once per distinct profile_id. Instead of:

+----------+-----------+------+--------------+
| date     | file_name | self | num_profiles |
+----------+-----------+------+--------------+
| 20190301 | file1.go  | 145  | 120          |
| 20190301 | file2.go  | 50   | 20           |
| 20190302 | file1.go  | 30   | 20           |
| 20190302 | file2.go  | 60   | 10           |
+----------+-----------+------+--------------+

I need to get the following result:

+----------+-----------+------+--------------+
| date     | file_name | self | num_profiles |
+----------+-----------+------+--------------+
| 20190301 | file1.go  | 145  | 70           |
| 20190301 | file2.go  | 50   | 20           |
| 20190302 | file1.go  | 30   | 10           |
| 20190302 | file2.go  | 60   | 10           |
+----------+-----------+------+--------------+

The first row is the result of aggregating these rows:

+----------+---------------+-----------+------+--------------+------------+
| date     | function_name | file_name | self | num_profiles | profile_id |
+----------+---------------+-----------+------+--------------+------------+
| 20190301 | function1     | file1.go  | 10   | 30           | 100        |
| 20190301 | function2     | file1.go  | 20   | 30           | 100        |
| 20190301 | function1     | file1.go  | 30   | 20           | 200        |
| 20190301 | function4     | file1.go  | 45   | 20           | 222        |
| 20190301 | function3     | file1.go  | 40   | 20           | 200        |
+----------+---------------+-----------+------+--------------+------------+

self = sum(aggregated self), which is what I need. But num_profiles should be the sum over rows with distinct profile_ids (30 (profile_id = 100) + 20 (profile_id = 200) + 20 (profile_id = 222) = 70). Like this:

SELECT SUM(num_profiles)
FROM (
    SELECT ANY_VALUE(num_profiles) AS num_profiles
    FROM ExampleData
    WHERE date='20190301' AND file_name='file1.go'
    GROUP BY profile_id
) AS t;

This example calculates num_profiles for the first row. In my dataset, num_profiles is always the same for a given profile_id.

How can I combine this logic into a single query?
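To pin down the rule stated in the question before looking at SQL, here is a minimal reference sketch in plain Python. It is not part of the original post; the names `ROWS` and `aggregate` are made up for illustration. It sums self over every row but counts num_profiles once per profile_id within each (date, file_name) group, relying on the asker's statement that num_profiles is constant for a given profile_id.

```python
from collections import defaultdict

# Sample rows from the question:
# (date, function_name, file_name, self, num_profiles, profile_id)
ROWS = [
    ("20190301", "function1", "file1.go", 10, 30, 100),
    ("20190301", "function2", "file1.go", 20, 30, 100),
    ("20190301", "function1", "file1.go", 30, 20, 200),
    ("20190301", "function3", "file1.go", 40, 20, 200),
    ("20190301", "function4", "file1.go", 45, 20, 222),
    ("20190301", "function1", "file2.go", 50, 20, 200),
    ("20190302", "function1", "file1.go", 10, 10, 300),
    ("20190302", "function2", "file1.go", 20, 10, 300),
    ("20190302", "function3", "file2.go", 60, 10, 300),
]


def aggregate(rows):
    """Sum self over all rows; count num_profiles once per profile_id."""
    self_sum = defaultdict(int)
    # (date, file_name) -> {profile_id: num_profiles}; a repeated profile_id
    # simply overwrites its own entry, so it is counted only once.
    profiles = defaultdict(dict)
    for date, _func, file_name, self_value, num_profiles, profile_id in rows:
        key = (date, file_name)
        self_sum[key] += self_value
        profiles[key][profile_id] = num_profiles
    return [
        (date, file_name, self_sum[(date, file_name)],
         sum(profiles[(date, file_name)].values()))
        for date, file_name in sorted(self_sum)
    ]


result = aggregate(ROWS)
for row in result:
    print(row)
```

On the sample data this reproduces the desired table, with 145 and 70 for file1.go on 20190301. Any single-query SQL solution can be checked against it.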

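5 Answers:

Answer 0: (score: 1)

This is a rather odd request (and therefore an interesting one). I think that to solve it you need to perform the first level of aggregation in subqueries, join the result sets together, and then aggregate a second time.

Consider: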

SELECT
  e1.date,
  e1.file_name,
  e1.sum_self as self,
  SUM(e2.num_profiles) as num_profiles
FROM 
    (
        SELECT date, file_name, SUM(self) as sum_self
        FROM ExampleData
        GROUP BY date, file_name
    ) e1
    INNER JOIN (
        SELECT DISTINCT date, file_name, num_profiles, profile_id FROM ExampleData
    ) e2 ON e2.date = e1.date AND e2.file_name = e1.file_name
GROUP BY e1.date, e1.file_name, e1.sum_self
ORDER BY e1.date, e1.file_name;

In this DB Fiddle with your sample data, the query returns:

| date       | file_name | self | num_profiles |
| ---------- | --------- | ---- | ------------ |
| 2019-03-01 | file1.go  | 100  | 50           |
| 2019-03-01 | file2.go  | 50   | 20           |
| 2019-03-02 | file1.go  | 30   | 10           |
| 2019-03-02 | file2.go  | 60   | 10           |
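The first row above (100 and 50) is what the sample data yields without the profile_id = 222 row, so the fiddle may have been built from an earlier revision of the question. As a quick check against the table as posted, the sketch below runs the same query through Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption of this example), and the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ExampleData (date TEXT, function_name TEXT, file_name TEXT,"
    " self INTEGER, num_profiles INTEGER, profile_id INTEGER)"
)
# Sample rows exactly as posted in the question.
conn.executemany(
    "INSERT INTO ExampleData VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("20190301", "function1", "file1.go", 10, 30, 100),
        ("20190301", "function2", "file1.go", 20, 30, 100),
        ("20190301", "function1", "file1.go", 30, 20, 200),
        ("20190301", "function3", "file1.go", 40, 20, 200),
        ("20190301", "function4", "file1.go", 45, 20, 222),
        ("20190301", "function1", "file2.go", 50, 20, 200),
        ("20190302", "function1", "file1.go", 10, 10, 300),
        ("20190302", "function2", "file1.go", 20, 10, 300),
        ("20190302", "function3", "file2.go", 60, 10, 300),
    ],
)

# The query from this answer, unchanged apart from whitespace.
QUERY = """
SELECT
  e1.date,
  e1.file_name,
  e1.sum_self AS self,
  SUM(e2.num_profiles) AS num_profiles
FROM (
    SELECT date, file_name, SUM(self) AS sum_self
    FROM ExampleData
    GROUP BY date, file_name
) e1
INNER JOIN (
    SELECT DISTINCT date, file_name, num_profiles, profile_id FROM ExampleData
) e2 ON e2.date = e1.date AND e2.file_name = e1.file_name
GROUP BY e1.date, e1.file_name, e1.sum_self
ORDER BY e1.date, e1.file_name
"""

answer0_rows = conn.execute(QUERY).fetchall()
for row in answer0_rows:
    print(row)
```

With the profile_id = 222 row included, the first row comes out as 145 and 70, matching the result the question asks for.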

Answer 1: (score: 0)

You can use SUM(DISTINCT column):

SELECT
  date,
  file_name,
  sum(self) as self,
  sum(distinct num_profiles) as num_profiles
FROM ExampleData 
GROUP BY date, file_name 
ORDER BY date, file_name

After the clarification of the profile_id requirement and the improved sample data, the simplest way to query is:

select e.date,
  e.file_name,
  sum(e.self) as self,
  sum(e.num_profiles) as num_profiles
from (
  select date, file_name, profile_id, 
     sum(self) as self, sum(distinct num_profiles) as num_profiles
  from ExampleData
  group by date, file_name, profile_id
) as e
group by e.date, e.file_name

See the SQLFiddle.

Answer 2: (score: 0)

I'm not sure why you'd want to do this, but you can use SUM(DISTINCT):

SELECT
  date,
  file_name,
  sum(self) as self,
  sum(DISTINCT num_profiles) as num_profiles
FROM ExampleData GROUP BY date, file_name ORDER BY date, file_name;

Normally we use DISTINCT with COUNT (to count the number of distinct values), but it also works with SUM.
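The edited question says that SUM(DISTINCT num_profiles) is not a solution. The likely reason is that two different profile_ids can share the same num_profiles value: in the first group, profile_id 200 and profile_id 222 both have 20. The sketch below illustrates this on the five rows of that group, using Python's sqlite3 as a stand-in for MySQL and MAX in place of ANY_VALUE (both are assumptions of this example, not part of the thread).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ExampleData (date TEXT, function_name TEXT, file_name TEXT,"
    " self INTEGER, num_profiles INTEGER, profile_id INTEGER)"
)
# Only the rows for 20190301 / file1.go from the question.
conn.executemany(
    "INSERT INTO ExampleData VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("20190301", "function1", "file1.go", 10, 30, 100),
        ("20190301", "function2", "file1.go", 20, 30, 100),
        ("20190301", "function1", "file1.go", 30, 20, 200),
        ("20190301", "function3", "file1.go", 40, 20, 200),
        ("20190301", "function4", "file1.go", 45, 20, 222),
    ],
)

# SUM(DISTINCT ...) deduplicates by value, so the two 20s collapse into one.
sum_distinct = conn.execute(
    "SELECT SUM(DISTINCT num_profiles) FROM ExampleData"
).fetchone()[0]

# Deduplicating by profile_id first keeps one value per profile.
# MAX is safe only because num_profiles is constant within a profile_id.
per_profile = conn.execute(
    """
    SELECT SUM(num_profiles)
    FROM (
        SELECT MAX(num_profiles) AS num_profiles
        FROM ExampleData
        GROUP BY profile_id
    ) AS t
    """
).fetchone()[0]

print(sum_distinct, per_profile)
```

Here SUM(DISTINCT num_profiles) gives 30 + 20 = 50, while the per-profile sum gives 30 + 20 + 20 = 70, which is the value the question expects.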

Answer 3: (score: 0)

Is this what you want?

You can use the following snippet to sum the distinct profiles for files that have more than one distinct profile ID on a given date:

SELECT
  date,
  file_name,
  SUM(self) AS self,
  SUM(DISTINCT num_profiles) AS num_profiles
FROM ExampleData
GROUP BY date, file_name
HAVING COUNT(DISTINCT profile_id) > 1
ORDER BY date, file_name;

Answer 4: (score: 0)

Another variant:

SELECT
  e1.date,
  e1.file_name,
  SUM(e1.self) AS self,
  SUM(e1.num_profiles) AS num_profiles
FROM (
  SELECT date, file_name, SUM(self) AS self, ANY_VALUE(num_profiles) AS num_profiles, profile_id
  FROM ExampleData
  GROUP BY date, file_name, profile_id
) e1
GROUP BY e1.date, e1.file_name;