SQL to count non-matching records in a one-to-many relationship

Time: 2017-03-15 21:27:29

Tags: mysql

I have two MySQL tables:

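surveys (date, location, kilometers)   primary key: date + location   (one record per survey)

specimens (date, location, species)   (zero or more records per survey date and location)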

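I want to find the number of surveys, and the kilometers surveyed, for which the specimens table has no record of a particular species. In other words, the number of surveys on which a particular species was not found.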

The total number of surveys is:

select count(date) as surveys, sum(kilometers) as KM_surveyed 
from surveys;

 +---------+-------------+
 | surveys | KM_surveyed |
 +---------+-------------+
 |   20141 |    40673.59 |
 +---------+-------------+

Finding the number of surveys on which no specimens were found is easy:

select count(s.date) as surveys, sum(s.kilometers) as KM_surveyed 
from surveys as s left join specimens as p 
on (s.date=p.date and s.location=p.location)
where p.date is null;

 +---------+-------------+
 | surveys | KM_surveyed |
 +---------+-------------+
 |    8820 |    15848.26 |
 +---------+-------------+

The total number of records in specimens is:

select count(*) from specimens;

+-----------+
|  count(*) |
+-----------+
|     51566 |
+-----------+ 

The correct number of Harbor Seals (HASE) found across all surveys is:

select count(*) from specimens where species = 'HASE';

 +-----------+
 | count(*)  |
 +-----------+
 |       662 |
 +-----------+

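Finding the number of surveys on which Harbor Seals (HASE) were found is not so easy. Because the specimens table usually contains several records per survey, this query returns the number of HASE specimens found rather than the number of surveys: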

select count(s.date) as surveys, sum(s.kilometers) as KM_surveyed 
from surveys as s 
left join specimens as p on (s.date=p.date and s.location=p.location) 
where p.species = 'HASE';

 +---------+-------------+
 | surveys | KM_surveyed |
 +---------+-------------+
 |     662 |     2030.70 |  WRONG! That is the number of specimens, not surveys 
 +---------+-------------+

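Finding the number of surveys on which Harbor Seals (HASE) were not found is not easy either. This query returns the number of specimens that are not Harbor Seals rather than the number of surveys: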

select count(s.date) as surveys, sum(s.kilometers) as KM_surveyed 
from surveys as s 
left join specimens as p on (s.date=p.date and s.location=p.location) 
where p.species <> 'HASE' or p.date is null;

 +---------+-------------+
 | surveys | KM_surveyed |
 +---------+-------------+
 |   50904 |   151310.49 | 
 +---------+-------------+

WRONG! 50904 = the number of non-HASE specimens

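How can I construct queries that correctly count the number of surveys on which Harbor Seals were found, and the number of surveys on which they were not found?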

3 Answers:

Answer 0: (score: 1)

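When you use a LEFT JOIN to find non-matching rows, put the conditions that should not match in the ON clause, not the WHERE clause. A WHERE condition on the right-hand table is applied after the join, so it discards the NULL-extended rows that represent the non-matching surveys.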

SELECT COUNT(*), SUM(s.kilometers)
FROM surveys AS s
LEFT JOIN specimens AS p ON s.date = p.date and s.location = p.location
    AND p.species = 'HASE'
WHERE p.date IS NULL
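
To see the effect of putting the species test in the ON clause, here is a minimal sketch on a tiny invented data set. The demo_surveys and demo_specimens tables, their column types, and the 'OTHR' species code are assumptions made up for illustration; they are not the asker's real schema or data.

```sql
-- Hypothetical demo tables; names, types and rows are invented for illustration.
DROP TABLE IF EXISTS demo_specimens;
DROP TABLE IF EXISTS demo_surveys;

CREATE TABLE demo_surveys (
    date       DATE         NOT NULL,
    location   VARCHAR(20)  NOT NULL,
    kilometers DECIMAL(6,2) NOT NULL,
    PRIMARY KEY (date, location)
);

CREATE TABLE demo_specimens (
    date     DATE        NOT NULL,
    location VARCHAR(20) NOT NULL,
    species  VARCHAR(10) NOT NULL
);

INSERT INTO demo_surveys VALUES
    ('2017-01-01', 'A', 2.00),  -- two HASE specimens plus one other species
    ('2017-01-02', 'A', 3.00),  -- only a non-HASE specimen
    ('2017-01-03', 'B', 5.00);  -- no specimens at all

INSERT INTO demo_specimens VALUES
    ('2017-01-01', 'A', 'HASE'),
    ('2017-01-01', 'A', 'HASE'),
    ('2017-01-01', 'A', 'OTHR'),
    ('2017-01-02', 'A', 'OTHR');

-- Species filter in ON: a survey with no HASE row still yields one
-- NULL-extended row, which the IS NULL test then keeps.
SELECT COUNT(*) AS surveys, SUM(s.kilometers) AS KM_surveyed
FROM demo_surveys AS s
LEFT JOIN demo_specimens AS p
       ON s.date = p.date AND s.location = p.location
      AND p.species = 'HASE'
WHERE p.date IS NULL;
-- With the rows above: surveys = 2, KM_surveyed = 8.00
```

Each survey without HASE is counted exactly once, because it produces a single NULL-extended row no matter how many other specimens it has.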

Answer 1: (score: 1)

You can use an EXISTS / NOT EXISTS subquery in the WHERE clause.

Surveys for which HASE was found in the specimens table:

select count(*), sum(s.kilometers)
from surveys s
where exists (
    select *
    from specimens p
    where s.date=p.date
      and s.location=p.location
      and p.species = 'HASE'
)

Surveys for which HASE was not found in the specimens table:

select count(*), sum(s.kilometers)
from surveys s
where not exists (
    select *
    from specimens p
    where s.date=p.date
      and s.location=p.location
      and p.species = 'HASE'
)

An alternative to the first query might be:

select count(*), sum(s.kilometers)
from (
    select distinct date, location
    from specimens
    where species = 'HASE'
) p
join surveys s using (date, location)

Depending on the data (if 'HASE' is a rare "species"), it may be faster.

Barmar has already posted the best alternative to the second query.
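
If both counts are wanted from a single statement, one possible variation is to move the EXISTS test into the select list and group by its result. This is only a sketch, not part of the answer above; the demo_surveys and demo_specimens tables, the 'OTHR' species code, and the has_hase column name are hypothetical.

```sql
-- Hypothetical demo tables; names, types and rows are invented for illustration.
DROP TABLE IF EXISTS demo_specimens;
DROP TABLE IF EXISTS demo_surveys;

CREATE TABLE demo_surveys (
    date       DATE         NOT NULL,
    location   VARCHAR(20)  NOT NULL,
    kilometers DECIMAL(6,2) NOT NULL,
    PRIMARY KEY (date, location)
);

CREATE TABLE demo_specimens (
    date     DATE        NOT NULL,
    location VARCHAR(20) NOT NULL,
    species  VARCHAR(10) NOT NULL
);

INSERT INTO demo_surveys VALUES
    ('2017-01-01', 'A', 2.00),
    ('2017-01-02', 'A', 3.00),
    ('2017-01-03', 'B', 5.00);

INSERT INTO demo_specimens VALUES
    ('2017-01-01', 'A', 'HASE'),
    ('2017-01-01', 'A', 'HASE'),
    ('2017-01-01', 'A', 'OTHR'),
    ('2017-01-02', 'A', 'OTHR');

-- The inner query yields exactly one row per survey, flagged 1 or 0,
-- so the outer aggregate cannot be inflated by multiple specimens.
SELECT f.has_hase, COUNT(*) AS surveys, SUM(f.kilometers) AS KM_surveyed
FROM (
    SELECT s.kilometers,
           EXISTS (
               SELECT 1
               FROM demo_specimens AS p
               WHERE p.date = s.date
                 AND p.location = s.location
                 AND p.species = 'HASE'
           ) AS has_hase
    FROM demo_surveys AS s
) AS f
GROUP BY f.has_hase
ORDER BY f.has_hase DESC;
-- With the rows above:
--   has_hase = 1: surveys = 1, KM_surveyed = 2.00
--   has_hase = 0: surveys = 2, KM_surveyed = 8.00
```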

Answer 2: (score: 0)

Why do people find joins so hard?

To find the number of surveys on which Harbor Seals (HASE) were found:

select count(distinct concat(s.location, s.date))
from surveys s 
Inner join specimens p 
on (s.date=p.date and s.location=p.location) 
where p.species = 'HASE';

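The number of surveys on which Harbor Seals (HASE) were not found is simply the difference between the total number of surveys (which you already have) and the value above. Since both queries return a single row, a Cartesian product of the two queries would give the value in a single output row, but just to be a bit different: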

Select count(*), sum(kilometers)
From (
  Select kilometers
  From surveys s
  Left join specimens p 
  on (s.date=p.date and s.location=p.location) 
  and p.species = 'HASE'
  Where p.species is null
) As zero_surveys;

(There are several other ways to write the query above.)
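
The subtraction approach mentioned above (total surveys minus surveys with HASE, combined through a Cartesian product of two one-row queries) could be sketched roughly as follows. The demo_surveys and demo_specimens tables, the 'OTHR' species code, and the aliases t and h are hypothetical, and the surveys-with-HASE side uses EXISTS so that kilometers are not summed once per specimen.

```sql
-- Hypothetical demo tables; names, types and rows are invented for illustration.
DROP TABLE IF EXISTS demo_specimens;
DROP TABLE IF EXISTS demo_surveys;

CREATE TABLE demo_surveys (
    date       DATE         NOT NULL,
    location   VARCHAR(20)  NOT NULL,
    kilometers DECIMAL(6,2) NOT NULL,
    PRIMARY KEY (date, location)
);

CREATE TABLE demo_specimens (
    date     DATE        NOT NULL,
    location VARCHAR(20) NOT NULL,
    species  VARCHAR(10) NOT NULL
);

INSERT INTO demo_surveys VALUES
    ('2017-01-01', 'A', 2.00),
    ('2017-01-02', 'A', 3.00),
    ('2017-01-03', 'B', 5.00);

INSERT INTO demo_specimens VALUES
    ('2017-01-01', 'A', 'HASE'),
    ('2017-01-01', 'A', 'HASE'),
    ('2017-01-01', 'A', 'OTHR'),
    ('2017-01-02', 'A', 'OTHR');

-- t: totals over all surveys (one row).
-- h: totals over surveys that have at least one HASE specimen (one row).
-- CROSS JOIN of two one-row results gives one row to subtract within.
SELECT t.surveys - h.surveys AS surveys_without_hase,
       t.km - h.km           AS km_without_hase
FROM (
    SELECT COUNT(*) AS surveys, SUM(kilometers) AS km
    FROM demo_surveys
) AS t
CROSS JOIN (
    SELECT COUNT(*) AS surveys, COALESCE(SUM(s.kilometers), 0) AS km
    FROM demo_surveys AS s
    WHERE EXISTS (
        SELECT 1
        FROM demo_specimens AS p
        WHERE p.date = s.date
          AND p.location = s.location
          AND p.species = 'HASE'
    )
) AS h;
-- With the rows above: surveys_without_hase = 2, km_without_hase = 8.00
```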