count(distinct) across joined tables returns duplicate/incorrect values

Time: 2015-12-01 15:05:44

Tags: sql database oracle count distinct

SQL:

SELECT area.a_name AS area,
       location.l_name AS location,
       COUNT(DISTINCT person.p_id) AS numberOfPeople
FROM job
INNER JOIN person ON job.j_person = person.p_id
INNER JOIN (location INNER JOIN area ON location.l_area = area.a_id) ON job.j_location = location.l_id
-- every non-aggregated column in the SELECT list must also appear in GROUP BY
GROUP BY area.a_name, location.l_name

Database: the 'job' table is related to 'person' (on j_person = p_id) and to 'location' (on j_location = l_id).

Table: person (list of all people in the company, PK = p_id)
+------+--------+--
| p_id | p_name | etc.
+------+--------+--
|  01  |  John  | ...
+------+--------+--
|  02  |  Suzy  | ...
+------+--------+--
|  03  |  Mike  | ...
+------+--------+--
|  04  |  Kim   | ...
+------+--------+--


Table: job (list of all jobs, PK = j_id)
+------+----------+------------+--------+
| j_id | j_person | j_location | j_type |
+------+----------+------------+--------+
|  AB  |    02    |    cityB   | type2  |
+------+----------+------------+--------+
|  CD  |    02    |    cityA   | type3  |
+------+----------+------------+--------+
|  EF  |    01    |    cityC   | type2  |
+------+----------+------------+--------+
|  GH  |    03    |    cityB   | type1  |
+------+----------+------------+--------+
|  IJ  |    04    |    cityA   | type1  |
+------+----------+------------+--------+
|  KL  |    04    |    cityA   | type2  |
+------+----------+------------+--------+


Table: location (list of all locations, PK = l_id)
+-------+----------+--------+
| l_id  |  l_name  | l_area |
+-------+----------+----
| cityA | London   |   ...
+-------+----------+----
| cityB | New York |   ...
+-------+----------+----
| cityC | Brussels |   ...
+-------+----------+----

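What I need:

A count of people per city. This is the result of the SQL statement above:

  • Area 1:
    • London: 2
    • New York: 2
  • Area 2:
    • Brussels: 1

But... now the problem

The result must not count any person more than once. For example: Suzy (p_id = 02) has jobs in both London and New York, but for the final numbers to be correct she may be counted in only one of these two cities.

I think I'm looking for a solution that eliminates people who have already been included/counted, so that they cannot be counted again in another/the next city. When the per-city counts are summed, the result must equal the total number of records in the person table.

It is not a problem if, for example, Suzy is left out of, say, New York, because the locations/cities are part of a larger area, and a person always works within a single area.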
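To make the mismatch concrete, here is a small sketch that reproduces it with the sample job rows from the question. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Oracle; that substitution, and the reduced single-table schema, are assumptions made only for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (j_id TEXT PRIMARY KEY, j_person INTEGER,
                      j_location TEXT, j_type TEXT);
    INSERT INTO job VALUES
        ('AB', 2, 'cityB', 'type2'),
        ('CD', 2, 'cityA', 'type3'),
        ('EF', 1, 'cityC', 'type2'),
        ('GH', 3, 'cityB', 'type1'),
        ('IJ', 4, 'cityA', 'type1'),
        ('KL', 4, 'cityA', 'type2');
""")

# DISTINCT only removes duplicates inside each group, so Suzy (person 2)
# is counted once in cityA and once again in cityB.
per_city = dict(conn.execute("""
    SELECT j_location, COUNT(DISTINCT j_person)
    FROM job
    GROUP BY j_location
""").fetchall())

# Distinct people across the whole table, ignoring cities.
total_people = conn.execute(
    "SELECT COUNT(DISTINCT j_person) FROM job").fetchone()[0]

print(per_city)                               # per-city counts
print(sum(per_city.values()), total_people)   # 5 versus 4
```

The per-city counts add up to 5 while only 4 distinct people hold jobs: the extra 1 is Suzy, who appears in two groups.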

I'm having some trouble explaining what I want to achieve, and I'm not a native English speaker, so please let me know if anything is unclear.

2 Answers:

Answer 0: (score: 1)

To do this, you first have to limit each person to a single job before doing the grouping. Here is one way:

with person as (select 1 p_id, 'John' p_name from dual union all
                select 2 p_id, 'Suzy' p_name from dual union all
                select 3 p_id, 'Mike' p_name from dual union all
                select 4 p_id, 'Kim' p_name from dual),
       jobs as (select 'AB' j_id, 2 j_person, 'cityB' j_location, 'type2' j_type from dual union all
                select 'CD' j_id, 2 j_person, 'cityA' j_location, 'type3' j_type from dual union all
                select 'EF' j_id, 1 j_person, 'cityC' j_location, 'type2' j_type from dual union all
                select 'GH' j_id, 3 j_person, 'cityB' j_location, 'type1' j_type from dual union all
                select 'IJ' j_id, 4 j_person, 'cityA' j_location, 'type1' j_type from dual union all
                select 'KL' j_id, 4 j_person, 'cityA' j_location, 'type2' j_type from dual),
   location as (select 'cityA' l_id, 'London' l_name from dual union all
                select 'cityB' l_id, 'New York' l_name from dual union all
                select 'cityC' l_id, 'Brussels' l_name from dual)
-- end of setting up some subqueries to mimic your tables with data in them. See SQL below:
select   location_name,
         count(distinct person_id) number_of_people
from     (select p.p_id person_id,
                 p.p_name person_name,
                 l.l_name location_name,
                 j.j_type job_type,
                 row_number() over (partition by p.p_id order by j.j_type, l.l_name) rn
          from   jobs j
                 inner join person p on j.j_person = p.p_id
                 inner join location l on j.j_location = l.l_id)
where    rn = 1
group by location_name;

LOCATION_NAME NUMBER_OF_PEOPLE
------------- ----------------
London                       1
Brussels                     1
New York                     2

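You can see that I used the row_number() analytic function to assign a number to each p_id's rows, ordered by job type and then location name. If the logic for deciding which location should end up in the row_number = 1 row is different from that, you will need to modify the order by clause accordingly.

From there, it's just a matter of filtering the results to keep only the first row for each p_id, then grouping the results to get the distinct count of people.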
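As a rough illustration of how the order by clause decides which city "wins" each person, the sketch below runs the same row_number() idea with two different orderings. It uses Python's sqlite3 module with an in-memory SQLite database in place of Oracle (an assumption; window functions need SQLite 3.25 or newer), and the helper name people_per_city is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (j_id TEXT PRIMARY KEY, j_person INTEGER,
                      j_location TEXT, j_type TEXT);
    CREATE TABLE location (l_id TEXT PRIMARY KEY, l_name TEXT);
    INSERT INTO job VALUES
        ('AB', 2, 'cityB', 'type2'), ('CD', 2, 'cityA', 'type3'),
        ('EF', 1, 'cityC', 'type2'), ('GH', 3, 'cityB', 'type1'),
        ('IJ', 4, 'cityA', 'type1'), ('KL', 4, 'cityA', 'type2');
    INSERT INTO location VALUES
        ('cityA', 'London'), ('cityB', 'New York'), ('cityC', 'Brussels');
""")

def people_per_city(order_by):
    """Count people per city after keeping one job row per person.

    order_by is pasted into the SQL text, so only pass trusted,
    hard-coded strings (hypothetical helper, for illustration only).
    """
    sql = f"""
        SELECT l_name, COUNT(*)
        FROM (SELECT l.l_name AS l_name,
                     ROW_NUMBER() OVER (PARTITION BY j.j_person
                                        ORDER BY {order_by}) AS rn
              FROM job j
              JOIN location l ON j.j_location = l.l_id)
        WHERE rn = 1          -- exactly one row survives per person
        GROUP BY l_name
    """
    return dict(conn.execute(sql).fetchall())

by_type = people_per_city("j.j_type, l.l_name")   # ordering used in the answer
by_city = people_per_city("l.l_name, j.j_type")   # alternative tie-break rule

print(by_type)
print(by_city)
```

With the answer's ordering Suzy is counted in New York; ordering by city name first moves her to London. Either way the counts sum to 4, because each person keeps exactly one row.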

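Answer 1: (score: 1)

Oh, the joys of reporting: do you put slightly wrong numbers in each city so that the column adds up to your employee count? Or do you get the cities right, and then find that totalling them gives a number bigger than your payroll? In reality the line items and the total are counting different things here, because "people who work in this office" is not the same as "people who work in the company".

Another option: fractional people!

If a person works in two cities, show them in both cities under "number of people working here", and also sum up a modifier that is deducted from the total to arrive at your total headcount.

e.g.: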

with person as (select 1 p_id, 'John' p_name from dual union all
                select 2 p_id, 'Suzy' p_name from dual union all
                select 3 p_id, 'Mike' p_name from dual union all
                select 4 p_id, 'Kim' p_name from dual),
       jobs as (select 'AB' j_id, 2 j_person, 'cityB' j_location, 'type2' j_type from dual union all
                select 'CD' j_id, 2 j_person, 'cityA' j_location, 'type3' j_type from dual union all
                select 'EF' j_id, 1 j_person, 'cityC' j_location, 'type2' j_type from dual union all
                select 'GH' j_id, 3 j_person, 'cityB' j_location, 'type1' j_type from dual union all
                select 'IJ' j_id, 4 j_person, 'cityA' j_location, 'type1' j_type from dual union all
                select 'KL' j_id, 4 j_person, 'cityA' j_location, 'type2' j_type from dual),
     lctn   as (select 'cityA' l_id, 'London' l_name from dual union all
                select 'cityB' l_id, 'New York' l_name from dual union all
                select 'cityC' l_id, 'Brussels' l_name from dual)
-- end of setting up some subqueries to mimic your tables with data in them. See SQL below:
select   location_name,
         location_jobs             number_of_distinct_jobs,
         count(distinct person_id) cnt_of_people_working_here,
         sum(distinct case when person_jobs = 1 then 0 else (1-person_jobs) end) shared_people
  FROM(  select p.p_id person_id,
                 l.l_name location_name,
                 1/(count(distinct l_name) over (partition by p.p_id)) person_jobs, 
                 count(distinct j_id)   over (partition by l_name) location_jobs 
          from   jobs j
                 inner join person p on j.j_person = p.p_id
                 inner join lctn l on j.j_location = l.l_id)
group by location_name, location_jobs;                 



LOCATION_NAME   NUMBER_OF_DISTINCT_JOBS   CNT_OF_PEOPLE_WORKING_HERE  SHARED_PEOPLE                          
"London"        3                         2                           0.5                                    
"Brussels"      1                         1                           0                                      
"New York"      2                         2                           0.5                                    

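When it comes to the total row, if you sum cnt_of_people_working_here and subtract the sum of shared_people, you get your total payroll. Do anything else and either your rows or your total will be off because, as noted above, you are grouping at different levels.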
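The arithmetic for the total row can be checked with a short sketch. It is a variant rather than the query above: SQLite (used here through Python's sqlite3 module as an assumed stand-in for Oracle) does not accept count(distinct ...) over (...), so the sketch first reduces the data to one row per (person, city) pair and then uses a plain SUM. One side effect of that choice is that two different people who share the same fraction in the same city are each counted, whereas SUM(DISTINCT ...) would collapse them into a single value.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (j_id TEXT PRIMARY KEY, j_person INTEGER,
                      j_location TEXT, j_type TEXT);
    INSERT INTO job VALUES
        ('AB', 2, 'cityB', 'type2'), ('CD', 2, 'cityA', 'type3'),
        ('EF', 1, 'cityC', 'type2'), ('GH', 3, 'cityB', 'type1'),
        ('IJ', 4, 'cityA', 'type1'), ('KL', 4, 'cityA', 'type2');
""")

rows = conn.execute("""
    WITH person_city AS (        -- one row per (person, city) pair
        SELECT DISTINCT j_person, j_location FROM job
    ),
    city_count AS (              -- number of distinct cities per person
        SELECT j_person, COUNT(*) AS n_cities
        FROM person_city
        GROUP BY j_person
    )
    SELECT pc.j_location,
           COUNT(*)                   AS cnt_of_people_working_here,
           SUM(1 - 1.0 / cc.n_cities) AS shared_people
    FROM person_city pc
    JOIN city_count cc ON cc.j_person = pc.j_person
    GROUP BY pc.j_location
    ORDER BY pc.j_location
""").fetchall()

# Total row: people counted per city minus the shared fractions.
total_cnt = sum(cnt for _, cnt, _ in rows)
total_shared = sum(shared for _, _, shared in rows)
headcount = total_cnt - total_shared

for row in rows:
    print(row)
print(total_cnt, total_shared, headcount)   # 5 1.0 4.0
```

With the sample data, 5 per-city entries minus 1.0 shared people gives a headcount of 4, which matches the number of rows in the person table.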