SQL left join gives duplicated results

Time: 2018-06-18 10:33:36

Tags: sql sqlite

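I have a schema with images, and I also have results for those images. The results live in N tables with different schemas. I need to write a search query that returns all images, together with their results, that match some criteria (including limit and offset).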

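An image may have 10 results (2 classifications, 8 detections). I want the limit to apply to images, not to results. So for 1 image, I expect to be able to get 10 rows back.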

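This is what I have so far. The problem is that the result rows are duplicated and combined. That is, I want one row per result, not a detection and a classification merged into the same row. Do I need a UNION ALL?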

CREATE TABLE images (
  id         VARCHAR(40)     NOT NULL,
  PRIMARY KEY (id)
);

CREATE TABLE image_results_classification (
  image_id    VARCHAR(40)          NOT NULL,
  c_confidence  REAL                 NOT NULL,
  FOREIGN KEY (image_id)  REFERENCES images(id)
);

CREATE TABLE image_results_detection (
  image_id    VARCHAR(40)          NOT NULL,
  d_confidence  REAL                 NOT NULL,
  FOREIGN KEY (image_id)  REFERENCES images(id)
);

INSERT INTO images (id) VALUES ('123');
INSERT INTO images (id) VALUES ('456');

INSERT INTO image_results_classification (image_id, c_confidence) VALUES ('123', 0.9);
INSERT INTO image_results_classification (image_id, c_confidence) VALUES ('123', 0.8);
INSERT INTO image_results_classification (image_id, c_confidence) VALUES ('456', 0.7);

INSERT INTO image_results_detection (image_id, d_confidence) VALUES ('123', 0.1);
INSERT INTO image_results_detection (image_id, d_confidence) VALUES ('123', 0.2);
INSERT INTO image_results_detection (image_id, d_confidence) VALUES ('456', 0.3);

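This schema is contrived for this question to keep things simple: the two results tables hold many more rows, and they also differ from each other (not just in the confidence column).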

What I want to end up with in my application layer is the type: Map[Image, (List[ClassificationResult], List[DetectionResult])]

That is, the images, each with all of its results. A result set with NULLs would be fine. Maybe something like this?

id   c_confidence d_confidence
123  0.9          NULL
123  0.8          NULL
123  NULL         0.1
123  NULL         0.2
456  0.7          NULL
456  NULL         0.3

This is the query from DB Fiddle:

SELECT *
FROM images INNER JOIN
     (SELECT id FROM images LIMIT 10 OFFSET 0
     ) AS i
     ON (images.id = i.id) OUTER LEFT JOIN 
     image_results_classification c
     ON (images.id = c.image_id) OUTER LEFT JOIN 
     image_results_detection d
     ON (images.id = d.image_id);

https://www.db-fiddle.com/f/tuDxwY7kQGfEvZSzaajESG/0

Edit: I need to filter on the results and also be able to limit and offset on the images, which is a secondary requirement.

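I want to be able to run a query like this:

Give me all images, with all of their results, where the image has a c_confidence > 0.5. That is, if an image has a c_confidence of 0.4, the image should still be included (but with no results). If it has a c_confidence of 0.6, then return all of its results (including those from image_results_detection).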

I have updated my fiddle to reflect this: https://www.db-fiddle.com/f/tuDxwY7kQGfEvZSzaajESG/1

In the fiddle, I want no results to come back, because the image has no image_results_classification with a confidence > 0.8.

2 answers:

Answer 0 (score: 2)

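You could use GROUP_CONCAT together with GROUP BY. The first group_concat can be done in a sub-query that carries the LIMIT, which avoids the Cartesian join effect between the two one-to-many relationships.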

For example:

SELECT 
 q.*,  
 group_concat(d.d_confidence) as d_confidence_list
FROM
(
    SELECT i.id, group_concat(c.c_confidence) as c_confidence_list
    FROM images i
    LEFT JOIN image_results_classification c ON (c.image_id = i.id)
    GROUP BY i.id
    LIMIT 10
) q
LEFT JOIN image_results_detection d ON (d.image_id = q.id)
GROUP BY q.id, q.c_confidence_list

Or you could use DISTINCT on the values and do it without a sub-query:

SELECT 
 i.id, 
 group_concat(distinct c.c_confidence) as c_confidence_list,
 group_concat(distinct d.d_confidence) as d_confidence_list
FROM images i
LEFT JOIN image_results_classification c ON (c.image_id = i.id)
LEFT JOIN image_results_detection d ON (d.image_id = i.id)
GROUP BY i.id
LIMIT 10

But if there are a lot of confidence values in those joined tables, the first method will probably be faster.
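One difference between the two variants is worth checking before picking one: group_concat(DISTINCT ...) also collapses values that are genuinely repeated, while the sub-query variant keeps them. The sketch below runs both queries against the question's sample data through Python's built-in sqlite3 module. The extra detection row with a repeated 0.2 and the helper to_floats are assumptions added only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE images (id VARCHAR(40) NOT NULL, PRIMARY KEY (id));
CREATE TABLE image_results_classification (
  image_id VARCHAR(40) NOT NULL, c_confidence REAL NOT NULL,
  FOREIGN KEY (image_id) REFERENCES images(id));
CREATE TABLE image_results_detection (
  image_id VARCHAR(40) NOT NULL, d_confidence REAL NOT NULL,
  FOREIGN KEY (image_id) REFERENCES images(id));
INSERT INTO images (id) VALUES ('123'), ('456');
INSERT INTO image_results_classification VALUES ('123', 0.9), ('123', 0.8), ('456', 0.7);
INSERT INTO image_results_detection VALUES ('123', 0.1), ('123', 0.2), ('456', 0.3);
-- Hypothetical extra row: a second detection that repeats the value 0.2.
INSERT INTO image_results_detection VALUES ('123', 0.2);
""")

def to_floats(csv):
    """Split a group_concat string into a sorted list of floats (hypothetical helper)."""
    return sorted(float(x) for x in csv.split(",")) if csv else []

# Variant 1: aggregate classifications in a sub-query, then aggregate detections.
subquery_rows = conn.execute("""
SELECT q.*, group_concat(d.d_confidence) AS d_confidence_list
FROM (
    SELECT i.id, group_concat(c.c_confidence) AS c_confidence_list
    FROM images i
    LEFT JOIN image_results_classification c ON (c.image_id = i.id)
    GROUP BY i.id
    LIMIT 10
) q
LEFT JOIN image_results_detection d ON (d.image_id = q.id)
GROUP BY q.id, q.c_confidence_list
""").fetchall()

# Variant 2: join both tables at once and de-duplicate with DISTINCT.
distinct_rows = conn.execute("""
SELECT i.id,
       group_concat(DISTINCT c.c_confidence) AS c_confidence_list,
       group_concat(DISTINCT d.d_confidence) AS d_confidence_list
FROM images i
LEFT JOIN image_results_classification c ON (c.image_id = i.id)
LEFT JOIN image_results_detection d ON (d.image_id = i.id)
GROUP BY i.id
LIMIT 10
""").fetchall()

subquery_d = {image_id: to_floats(d_list) for image_id, _c, d_list in subquery_rows}
distinct_d = {image_id: to_floats(d_list) for image_id, _c, d_list in distinct_rows}

print(subquery_d["123"])  # keeps both 0.2 detections
print(distinct_d["123"])  # the repeated 0.2 appears only once
```

If two results of the same image can legitimately share a confidence value, the sub-query variant is the safer of the two.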

Extra

Here are 2 more queries to try.

The first one should give the expected result. By using a CTE, the LIMIT only has to be written once.

with TOPIMG as (
  select * from images LIMIT 10
)
select image_id, c_confidence, null as d_confidence
from TOPIMG i
join image_results_classification c on c.image_id = i.id
union all
select image_id, null as c_confidence, d_confidence
from TOPIMG i
join image_results_detection d on d.image_id = i.id
order by image_id;
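The rows returned by this UNION ALL query can be folded into the shape the question asks for, a map from image to a pair of result lists. Below is a minimal sketch using Python's built-in sqlite3 module and the question's sample data. The variable names are illustrative, and plain floats stand in for the ClassificationResult and DetectionResult types, which the question does not define.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE images (id VARCHAR(40) NOT NULL, PRIMARY KEY (id));
CREATE TABLE image_results_classification (
  image_id VARCHAR(40) NOT NULL, c_confidence REAL NOT NULL,
  FOREIGN KEY (image_id) REFERENCES images(id));
CREATE TABLE image_results_detection (
  image_id VARCHAR(40) NOT NULL, d_confidence REAL NOT NULL,
  FOREIGN KEY (image_id) REFERENCES images(id));
INSERT INTO images (id) VALUES ('123'), ('456');
INSERT INTO image_results_classification VALUES ('123', 0.9), ('123', 0.8), ('456', 0.7);
INSERT INTO image_results_detection VALUES ('123', 0.1), ('123', 0.2), ('456', 0.3);
""")

# One row per result; the column that does not apply is NULL (None in Python).
rows = conn.execute("""
WITH TOPIMG AS (
  SELECT * FROM images LIMIT 10
)
SELECT image_id, c_confidence, NULL AS d_confidence
FROM TOPIMG i
JOIN image_results_classification c ON c.image_id = i.id
UNION ALL
SELECT image_id, NULL AS c_confidence, d_confidence
FROM TOPIMG i
JOIN image_results_detection d ON d.image_id = i.id
ORDER BY image_id
""").fetchall()

# image_id -> (classification confidences, detection confidences)
results = defaultdict(lambda: ([], []))
for image_id, c_conf, d_conf in rows:
    classifications, detections = results[image_id]
    if c_conf is not None:
        classifications.append(c_conf)
    if d_conf is not None:
        detections.append(d_conf)

for image_id, (classifications, detections) in results.items():
    print(image_id, classifications, detections)
```

Because both branches use inner joins, an image without any result row does not show up in the map at all. Switching the joins to LEFT JOIN would be one way to keep such images, at the cost of rows where both confidence columns are NULL.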

This query uses a trick to mimic, in a roundabout way, the ROW_NUMBER function with PARTITION. (I don't like it; it kills performance.)

with TOPIMG as (
  select * from images LIMIT 10
)
select 
image_id, 
max(case when src = 'c' then conf end) as c_conf,
max(case when src = 'd' then conf end) as d_conf
from 
(
  select image_id, 'c' as src, c_confidence as conf,
  (
    select count(*) 
    from image_results_classification c2 
    where c.image_id = c2.image_id and c.c_confidence >= c2.c_confidence
  ) as RN
  from TOPIMG i
  join image_results_classification c on (c.image_id = i.id)

  union all

  select image_id, 'd', d_confidence,
  (
    select count(*) 
    from image_results_detection d2 
    where d.image_id = d2.image_id and d.d_confidence >= d2.d_confidence
  ) as RN
  from TOPIMG i
  join image_results_detection d on (d.image_id = i.id)
) cd
group by image_id, RN
order by image_id, RN;

Update

Implementing the special-sauce c_confidence > 0.5 requirement:

with IMG as (
  select i.id as image_id, 
  max(case when c.image_id is not null then 1 else 0 end) as show_all
  from images i
  left join image_results_classification c on (c.image_id = i.id and c.c_confidence > 0.5)
  group by i.id
  order by i.id
  LIMIT 100
)
select c.image_id, 'c' as result_type, c.c_confidence as confidence
from IMG i
join image_results_classification c on c.image_id = i.image_id
where i.show_all = 1

union all

select d.image_id, 'd' as result_type, d.d_confidence as confidence
from IMG i
join image_results_detection d on d.image_id = i.image_id
where i.show_all = 1

union all

select i.image_id, null, null
from IMG i
where i.show_all = 0

order by image_id;
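To see how the show_all flag behaves, the sketch below runs this query through Python's built-in sqlite3 module on the question's sample data plus one hypothetical image, '789', whose only classification (0.4) is below the threshold. That image and its rows are assumptions added for illustration. The sketch also writes ORDER BY 1 (the first output column) instead of ORDER BY image_id, purely to keep the column reference unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE images (id VARCHAR(40) NOT NULL, PRIMARY KEY (id));
CREATE TABLE image_results_classification (
  image_id VARCHAR(40) NOT NULL, c_confidence REAL NOT NULL,
  FOREIGN KEY (image_id) REFERENCES images(id));
CREATE TABLE image_results_detection (
  image_id VARCHAR(40) NOT NULL, d_confidence REAL NOT NULL,
  FOREIGN KEY (image_id) REFERENCES images(id));
INSERT INTO images (id) VALUES ('123'), ('456');
INSERT INTO image_results_classification VALUES ('123', 0.9), ('123', 0.8), ('456', 0.7);
INSERT INTO image_results_detection VALUES ('123', 0.1), ('123', 0.2), ('456', 0.3);
-- Hypothetical image whose only classification is below the 0.5 threshold.
INSERT INTO images (id) VALUES ('789');
INSERT INTO image_results_classification VALUES ('789', 0.4);
INSERT INTO image_results_detection VALUES ('789', 0.5);
""")

rows = conn.execute("""
WITH IMG AS (
  SELECT i.id AS image_id,
         MAX(CASE WHEN c.image_id IS NOT NULL THEN 1 ELSE 0 END) AS show_all
  FROM images i
  LEFT JOIN image_results_classification c
    ON (c.image_id = i.id AND c.c_confidence > 0.5)
  GROUP BY i.id
  ORDER BY i.id
  LIMIT 100
)
SELECT c.image_id, 'c' AS result_type, c.c_confidence AS confidence
FROM IMG i
JOIN image_results_classification c ON c.image_id = i.image_id
WHERE i.show_all = 1
UNION ALL
SELECT d.image_id, 'd' AS result_type, d.d_confidence AS confidence
FROM IMG i
JOIN image_results_detection d ON d.image_id = i.image_id
WHERE i.show_all = 1
UNION ALL
SELECT i.image_id, NULL, NULL
FROM IMG i
WHERE i.show_all = 0
ORDER BY 1
""").fetchall()

for row in rows:
    print(row)
```

Images '123' and '456' each have a classification above 0.5, so all of their classification and detection rows come back. Image '789' is still listed, but only as a single row with NULL in the result columns.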

Answer 1 (score: 2)

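You are combining every classification with every detection. But the two are not really related, so don't do that. One solution is to select the classifications and the detections separately and combine them with union all: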

select *
from
(
  select 'Classification' as what, image_id, c_confidence as value
  from image_results_classification
  union all
  select 'Detection' as what, image_id, d_confidence as value
  from image_results_detection
) results
where image_id in
(
  select id
  from images
  -- order by something to decide which images to pick?
  limit 10
);

Output:

+----------------+----------+-------+
| what           | image_id | value |
+----------------+----------+-------+
| Classification | 123      | 0.8   |
| Classification | 123      | 0.9   |
| Detection      | 123      | 0.1   |
| Detection      | 123      | 0.2   |
| Classification | 456      | 0.7   |
| Detection      | 456      | 0.3   |
+----------------+----------+-------+

DB-fiddle demo: https://www.db-fiddle.com/f/fZPMNL7NC8GzwkwHc4strG/0
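The row counts make the "every classification with every detection" effect concrete. The sketch below uses Python's built-in sqlite3 module and the question's sample data to compare a double LEFT JOIN with the union all approach. The variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE images (id VARCHAR(40) NOT NULL, PRIMARY KEY (id));
CREATE TABLE image_results_classification (
  image_id VARCHAR(40) NOT NULL, c_confidence REAL NOT NULL,
  FOREIGN KEY (image_id) REFERENCES images(id));
CREATE TABLE image_results_detection (
  image_id VARCHAR(40) NOT NULL, d_confidence REAL NOT NULL,
  FOREIGN KEY (image_id) REFERENCES images(id));
INSERT INTO images (id) VALUES ('123'), ('456');
INSERT INTO image_results_classification VALUES ('123', 0.9), ('123', 0.8), ('456', 0.7);
INSERT INTO image_results_detection VALUES ('123', 0.1), ('123', 0.2), ('456', 0.3);
""")

# Joining both one-to-many tables pairs each classification with each detection.
joined = conn.execute("""
SELECT i.id, c.c_confidence, d.d_confidence
FROM images i
LEFT JOIN image_results_classification c ON c.image_id = i.id
LEFT JOIN image_results_detection d ON d.image_id = i.id
""").fetchall()

# Stacking the two tables yields exactly one row per result.
stacked = conn.execute("""
SELECT 'Classification' AS what, image_id, c_confidence AS value
FROM image_results_classification
UNION ALL
SELECT 'Detection' AS what, image_id, d_confidence AS value
FROM image_results_detection
""").fetchall()

joined_123 = [row for row in joined if row[0] == "123"]
stacked_123 = [row for row in stacked if row[1] == "123"]

print(len(joined_123))   # 2 classifications x 2 detections
print(len(stacked_123))  # 2 classifications + 2 detections
```

With this sample data the two counts for image '123' happen to match (2 x 2 versus 2 + 2), but the joined rows are pairings rather than individual results. For the question's image with 2 classifications and 8 detections, the join would produce 16 rows where union all produces 10.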