This question has been edited, because SUM(DISTINCT(num_profiles)) is not a solution!
Suppose I have a table (ExampleData):
+----------+---------------+-----------+------+--------------+------------+
| date | function_name | file_name | self | num_profiles | profile_id |
+----------+---------------+-----------+------+--------------+------------+
| 20190301 | function1 | file1.go | 10 | 30 | 100 |
| 20190301 | function2 | file1.go | 20 | 30 | 100 |
| 20190301 | function1 | file1.go | 30 | 20 | 200 |
| 20190301 | function3 | file1.go | 40 | 20 | 200 |
| 20190301 | function4 | file1.go | 45 | 20 | 222 |
| 20190301 | function1 | file2.go | 50 | 20 | 200 |
| 20190302 | function1 | file1.go | 10 | 10 | 300 |
| 20190302 | function2 | file1.go | 20 | 10 | 300 |
| 20190302 | function3 | file2.go | 60 | 10 | 300 |
+----------+---------------+-----------+------+--------------+------------+
I need to aggregate by date and file_name and compute sum(self) and sum(num_profiles), like this:
SELECT
date,
file_name,
SUM(self) AS self,
SUM(num_profiles) AS num_profiles
FROM ExampleData
GROUP BY date, file_name
ORDER BY date, file_name;
But I need to change the logic of sum(num_profiles) as num_profiles: I only need to sum num_profiles over distinct profile_ids.
Instead of:
+----------+-----------+------+--------------+
| date | file_name | self | num_profiles |
+----------+-----------+------+--------------+
| 20190301 | file1.go | 145 | 120 |
| 20190301 | file2.go | 50 | 20 |
| 20190302 | file1.go | 30 | 20 |
| 20190302 | file2.go | 60 | 10 |
+----------+-----------+------+--------------+
I need to get the following result:
+----------+-----------+------+--------------+
| date | file_name | self | num_profiles |
+----------+-----------+------+--------------+
| 20190301 | file1.go | 145 | 70 |
| 20190301 | file2.go | 50 | 20 |
| 20190302 | file1.go | 30 | 10 |
| 20190302 | file2.go | 60 | 10 |
+----------+-----------+------+--------------+
The first row is the result of aggregating these rows:
+----------+---------------+-----------+------+--------------+------------+
| date | function_name | file_name | self | num_profiles | profile_id |
+----------+---------------+-----------+------+--------------+------------+
| 20190301 | function1 | file1.go | 10 | 30 | 100 |
| 20190301 | function2 | file1.go | 20 | 30 | 100 |
| 20190301 | function1 | file1.go | 30 | 20 | 200 |
| 20190301 | function4 | file1.go | 45 | 20 | 222 |
| 20190301 | function3 | file1.go | 40 | 20 | 200 |
+----------+---------------+-----------+------+--------------+------------+
self = sum(self) over these rows, which is what I need.
But num_profiles should be the sum over rows with distinct profile_ids (30 (profile_id = 100) + 20 (profile_id = 200) + 20 (profile_id = 222) = 70).
Like this:
SELECT SUM(num_profiles)
FROM (
SELECT ANY_VALUE(num_profiles) AS num_profiles
FROM ExampleData
WHERE date='20190301' AND file_name='file1.go'
GROUP BY profile_id
);
This example computes num_profiles for the first row.
In my data set, num_profiles is the same for a given profile_id.
How can I combine this logic into a single query?
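Answer 0 (score: 1)
This is a strange request (and therefore an interesting one). I think that to solve it you need to perform the first level of aggregation in subqueries, join the result sets together, and then aggregate a second time.
Consider: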
SELECT
e1.date,
e1.file_name,
e1.sum_self as self,
SUM(e2.num_profiles) as num_profiles
FROM
(
SELECT date, file_name, SUM(self) as sum_self
FROM ExampleData
GROUP BY date, file_name
) e1
INNER JOIN (
SELECT DISTINCT date, file_name, num_profiles, profile_id FROM ExampleData
) e2 ON e2.date = e1.date AND e2.file_name = e1.file_name
GROUP BY e1.date, e1.file_name, e1.sum_self
ORDER BY e1.date, e1.file_name;
In this DB Fiddle with your sample data, the query returns:
| date | file_name | self | num_profiles |
| ---------- | --------- | ---- | ------------ |
| 2019-03-01 | file1.go | 100 | 50 |
| 2019-03-01 | file2.go | 50 | 20 |
| 2019-03-02 | file1.go | 30 | 10 |
| 2019-03-02 | file2.go | 60 | 10 |
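As a quick check, the join-based query can be run against the nine rows of ExampleData exactly as listed in the question. The sketch below is an assumption-laden harness, not part of the original answer: it uses Python's sqlite3 module and an in-memory SQLite database rather than the engine behind the fiddle. With the function4 / profile_id 222 row included, the first row comes out as 145 and 70, so the 100 and 50 in the table above presumably reflect an earlier version of the sample data.

```python
import sqlite3

# Sample rows copied from the ExampleData table in the question.
ROWS = [
    ("20190301", "function1", "file1.go", 10, 30, 100),
    ("20190301", "function2", "file1.go", 20, 30, 100),
    ("20190301", "function1", "file1.go", 30, 20, 200),
    ("20190301", "function3", "file1.go", 40, 20, 200),
    ("20190301", "function4", "file1.go", 45, 20, 222),
    ("20190301", "function1", "file2.go", 50, 20, 200),
    ("20190302", "function1", "file1.go", 10, 10, 300),
    ("20190302", "function2", "file1.go", 20, 10, 300),
    ("20190302", "function3", "file2.go", 60, 10, 300),
]

# In-memory SQLite database (an assumption; the thread does not name its engine).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ExampleData (date TEXT, function_name TEXT, file_name TEXT,"
    " self INTEGER, num_profiles INTEGER, profile_id INTEGER)"
)
conn.executemany("INSERT INTO ExampleData VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# The query from this answer: e1 sums self per (date, file_name),
# e2 keeps one num_profiles row per distinct profile_id.
QUERY = """
SELECT e1.date, e1.file_name, e1.sum_self AS self,
       SUM(e2.num_profiles) AS num_profiles
FROM (
    SELECT date, file_name, SUM(self) AS sum_self
    FROM ExampleData
    GROUP BY date, file_name
) e1
INNER JOIN (
    SELECT DISTINCT date, file_name, num_profiles, profile_id
    FROM ExampleData
) e2 ON e2.date = e1.date AND e2.file_name = e1.file_name
GROUP BY e1.date, e1.file_name, e1.sum_self
ORDER BY e1.date, e1.file_name
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

The DISTINCT subquery works here only because num_profiles is constant for a given profile_id, as the question states.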
Answer 1 (score: 0)
You can use sum(distinct column):
SELECT
date,
file_name,
sum(self) as self,
sum(distinct num_profiles) as num_profiles
FROM ExampleData
GROUP BY date, file_name
ORDER BY date, file_name
After the clarification of the profile_id requirement and the improved sample data, the simplest query is:
select e.date,
e.file_name,
sum(e.self) as self,
sum(e.num_profiles) as num_profiles
from (
select date, file_name, profile_id,
sum(self) as self, sum(distinct num_profiles) as num_profiles
from ExampleData
group by date, file_name, profile_id
) as e
group by e.date, e.file_name
See the SQLFiddle.
Answer 2 (score: 0)
I'm not sure why you want to do this, but you can use SUM(DISTINCT):
SELECT
date,
file_name,
sum(self) as self,
sum(DISTINCT num_profiles) as num_profiles
FROM ExampleData GROUP BY date, file_name ORDER BY date, file_name;
Usually we use DISTINCT with COUNT (to count the number of distinct values), but it works with SUM too.
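The question's edit note says SUM(DISTINCT num_profiles) is not a solution, and the sample data shows why: profile_ids 200 and 222 both have num_profiles = 20, so deduplicating by value collapses them into one. The sketch below illustrates this on the question's rows; the sqlite3 harness and the in-memory SQLite database are assumptions made for illustration.

```python
import sqlite3

# Sample rows copied from the ExampleData table in the question.
ROWS = [
    ("20190301", "function1", "file1.go", 10, 30, 100),
    ("20190301", "function2", "file1.go", 20, 30, 100),
    ("20190301", "function1", "file1.go", 30, 20, 200),
    ("20190301", "function3", "file1.go", 40, 20, 200),
    ("20190301", "function4", "file1.go", 45, 20, 222),
    ("20190301", "function1", "file2.go", 50, 20, 200),
    ("20190302", "function1", "file1.go", 10, 10, 300),
    ("20190302", "function2", "file1.go", 20, 10, 300),
    ("20190302", "function3", "file2.go", 60, 10, 300),
]

# In-memory SQLite database (an assumption; the thread does not name its engine).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ExampleData (date TEXT, function_name TEXT, file_name TEXT,"
    " self INTEGER, num_profiles INTEGER, profile_id INTEGER)"
)
conn.executemany("INSERT INTO ExampleData VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# SUM(DISTINCT ...) deduplicates by value, not by profile_id.
QUERY = """
SELECT date, file_name,
       SUM(self) AS self,
       SUM(DISTINCT num_profiles) AS num_profiles
FROM ExampleData
GROUP BY date, file_name
ORDER BY date, file_name
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)

# For 20190301 / file1.go the distinct values are {30, 20}, which sum to 50,
# while the question asks for 30 + 20 + 20 = 70 (one term per profile_id).
first_row_num_profiles = result[0][3]
```

The other three rows happen to match the desired output because each of those groups contains a single profile_id.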
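Answer 3 (score: 0)
Is this what you want?
You can use the following snippet to sum the distinct profiles for files that have more than one distinct profile ID on a given date: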
SELECT
date,
file_name,
sum(self) as self,
sum(distinct num_profiles) as num_profiles
FROM ExampleData
GROUP BY date, file_name
-- HAVING must come before ORDER BY
HAVING count(distinct profile_id) > 1
ORDER BY date, file_name
Answer 4 (score: 0)
Another variant:
SELECT e1.date, e1.file_name, SUM(e1.self) as self, SUM(e1.num_profiles) as num_profiles FROM
(
SELECT date, file_name, SUM(self) as self, ANY_VALUE(num_profiles)as num_profiles, profile_id FROM ExampleData
GROUP BY date, file_name, profile_id
) e1 GROUP BY e1.date, e1.file_name;
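A minimal sketch of this two-level aggregation on the question's sample rows follows. It makes several assumptions: it runs on an in-memory SQLite database through Python's sqlite3 module, it swaps ANY_VALUE for MAX because SQLite has no ANY_VALUE (the two agree here since num_profiles is constant per profile_id), it renames the inner aliases to self_sum and np for clarity, and it adds an ORDER BY so the output order is deterministic.

```python
import sqlite3

# Sample rows copied from the ExampleData table in the question.
ROWS = [
    ("20190301", "function1", "file1.go", 10, 30, 100),
    ("20190301", "function2", "file1.go", 20, 30, 100),
    ("20190301", "function1", "file1.go", 30, 20, 200),
    ("20190301", "function3", "file1.go", 40, 20, 200),
    ("20190301", "function4", "file1.go", 45, 20, 222),
    ("20190301", "function1", "file2.go", 50, 20, 200),
    ("20190302", "function1", "file1.go", 10, 10, 300),
    ("20190302", "function2", "file1.go", 20, 10, 300),
    ("20190302", "function3", "file2.go", 60, 10, 300),
]

# In-memory SQLite database (an assumption; the thread does not name its engine).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ExampleData (date TEXT, function_name TEXT, file_name TEXT,"
    " self INTEGER, num_profiles INTEGER, profile_id INTEGER)"
)
conn.executemany("INSERT INTO ExampleData VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Inner level: one row per (date, file_name, profile_id).
# Outer level: sum those rows per (date, file_name).
QUERY = """
SELECT e1.date, e1.file_name,
       SUM(e1.self_sum) AS self,
       SUM(e1.np) AS num_profiles
FROM (
    SELECT date, file_name, profile_id,
           SUM(self) AS self_sum,
           MAX(num_profiles) AS np  -- stands in for ANY_VALUE
    FROM ExampleData
    GROUP BY date, file_name, profile_id
) e1
GROUP BY e1.date, e1.file_name
ORDER BY e1.date, e1.file_name
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On these rows the output matches the table the question asks for, including 145 and 70 for file1.go on 20190301.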