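I save the results from my TensorFlow image classifier in an SQL database. I have 3 tables: images, categories, and a table that links the two with a weight value. Some images have no relationships; some have many.
The problem is that the images table contains duplicate rows that I need to delete. However, if a duplicated image has one or more relationships, I need to keep those many-to-many relationships.
Here is an example:
Table name: my_images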
+----+------------+----------------+
| ID | image_path | image_filename |
+----+------------+----------------+
|  1 | Film 1     | Film 1 001.jpg |
|  2 | Film 1     | Film 1 001.jpg |
|  3 | Film 1     | Film 1 002.jpg |
|  4 | Film 1     | Film 1 002.jpg |
|  5 | Film 1     | Film 1 003.jpg |
|  6 | Film 1     | Film 1 003.jpg |
+----+------------+----------------+
Table name: my_terms
+---------+------------+
| term_id | term_name  |
+---------+------------+
|       1 | cat        |
|       2 | dog        |
|       3 | automobile |
+---------+------------+
Table name: my_term_relationships
+----------+---------+---------+
| image_id | term_id | weight  |
+----------+---------+---------+
|        2 |       1 | 0.58516 |
|        2 |       3 | 0.16721 |
|        3 |       2 | 0.21475 |
+----------+---------+---------+
So in this example, the ideal result is to delete rows 1 and 4, plus either row 5 or row 6, from my_images.
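The rule in the question can be written down as a small Python function, which gives a reference to check the SQL answers against. This is only a sketch: the rows are copied from the tables above, and keeping the lowest ID when no duplicate has relationships is an assumed tie-break (the question accepts deleting either 5 or 6).

```python
from collections import defaultdict

# (ID, image_path, image_filename) rows copied from my_images above
IMAGES = [
    (1, "Film 1", "Film 1 001.jpg"), (2, "Film 1", "Film 1 001.jpg"),
    (3, "Film 1", "Film 1 002.jpg"), (4, "Film 1", "Film 1 002.jpg"),
    (5, "Film 1", "Film 1 003.jpg"), (6, "Film 1", "Film 1 003.jpg"),
]
# Distinct image_id values from my_term_relationships above
REFERENCED = {2, 3}


def ids_to_delete(images, referenced):
    """Return the set of image IDs the question wants removed.

    Rows are grouped by (image_path, image_filename). Inside a group, every
    row that has relationships is kept; if none has any, only the lowest ID
    is kept (assumed tie-break). All other rows in the group are deleted.
    """
    groups = defaultdict(list)
    for image_id, path, filename in images:
        groups[(path, filename)].append(image_id)

    doomed = set()
    for ids in groups.values():
        keep = {i for i in ids if i in referenced} or {min(ids)}
        doomed.update(i for i in ids if i not in keep)
    return doomed


print(sorted(ids_to_delete(IMAGES, REFERENCED)))  # [1, 4, 6]
```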
Answer 0 (score: 1)
Approach this step by step.
First, find the duplicate entries:
SELECT
    image_path, image_filename
FROM my_images
GROUP BY image_path, image_filename
HAVING COUNT(*) > 1
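To try the step-1 query without a MySQL server, this sketch loads the example rows into an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is a swapped-in stand-in, not the poster's setup, and my_terms is left out because none of the queries read it. The query text itself is unchanged.

```python
import sqlite3

# In-memory SQLite stand-in for the poster's MySQL database (assumption),
# loaded with the example rows from the question.
SETUP = """
CREATE TABLE my_images (ID INTEGER PRIMARY KEY, image_path TEXT, image_filename TEXT);
CREATE TABLE my_term_relationships (image_id INTEGER, term_id INTEGER, weight REAL);
INSERT INTO my_images VALUES
    (1, 'Film 1', 'Film 1 001.jpg'), (2, 'Film 1', 'Film 1 001.jpg'),
    (3, 'Film 1', 'Film 1 002.jpg'), (4, 'Film 1', 'Film 1 002.jpg'),
    (5, 'Film 1', 'Film 1 003.jpg'), (6, 'Film 1', 'Film 1 003.jpg');
INSERT INTO my_term_relationships VALUES
    (2, 1, 0.58516), (2, 3, 0.16721), (3, 2, 0.21475);
"""

# Step-1 query from the answer: every (path, filename) pair seen more than once
FIND_DUPLICATES = """
SELECT image_path, image_filename
FROM my_images
GROUP BY image_path, image_filename
HAVING COUNT(*) > 1
"""


def duplicate_pairs(conn):
    # GROUP BY does not guarantee an output order, so sort in Python
    return sorted(conn.execute(FIND_DUPLICATES).fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(duplicate_pairs(conn))
```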
Second, get all the duplicated rows:
SELECT mi.*
FROM my_images mi
JOIN (
    SELECT
        image_path, image_filename
    FROM my_images
    GROUP BY image_path, image_filename
    HAVING COUNT(*) > 1
) dups ON mi.image_path = dups.image_path AND mi.image_filename = dups.image_filename
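The same SQLite stand-in can run the step-2 query. The sketch adds one hypothetical non-duplicate row (ID 7, not part of the question's data) to show that the join against dups filters it out.

```python
import sqlite3

# In-memory SQLite stand-in for the poster's MySQL database (assumption)
SETUP = """
CREATE TABLE my_images (ID INTEGER PRIMARY KEY, image_path TEXT, image_filename TEXT);
CREATE TABLE my_term_relationships (image_id INTEGER, term_id INTEGER, weight REAL);
INSERT INTO my_images VALUES
    (1, 'Film 1', 'Film 1 001.jpg'), (2, 'Film 1', 'Film 1 001.jpg'),
    (3, 'Film 1', 'Film 1 002.jpg'), (4, 'Film 1', 'Film 1 002.jpg'),
    (5, 'Film 1', 'Film 1 003.jpg'), (6, 'Film 1', 'Film 1 003.jpg');
INSERT INTO my_term_relationships VALUES
    (2, 1, 0.58516), (2, 3, 0.16721), (3, 2, 0.21475);
"""

# Step-2 query from the answer, unchanged apart from indentation
DUPLICATE_ROWS = """
SELECT mi.*
FROM my_images mi
JOIN (
    SELECT image_path, image_filename
    FROM my_images
    GROUP BY image_path, image_filename
    HAVING COUNT(*) > 1
) dups ON mi.image_path = dups.image_path
      AND mi.image_filename = dups.image_filename
"""


def duplicate_ids(conn):
    # mi.* starts with ID, so row[0] is the image ID
    return sorted(row[0] for row in conn.execute(DUPLICATE_ROWS))


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
# Hypothetical row with no duplicate (not in the question's data)
conn.execute("INSERT INTO my_images VALUES (7, 'Film 1', 'Film 1 004.jpg')")
print(duplicate_ids(conn))  # [1, 2, 3, 4, 5, 6]
```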
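Finally, get the IDs to delete: for each duplicated pair, the lowest ID that has no entry in my_term_relationships.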
SELECT MIN(ID)
FROM my_images mi
JOIN (
    SELECT
        image_path, image_filename
    FROM my_images
    GROUP BY image_path, image_filename
    HAVING COUNT(*) > 1
) dups ON mi.image_path = dups.image_path AND mi.image_filename = dups.image_filename
LEFT JOIN my_term_relationships mtr ON mi.ID = mtr.image_id
WHERE mtr.image_id IS NULL
GROUP BY mi.image_path, mi.image_filename
HAVING COUNT(*) > 0
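Run against the example data in the same SQLite stand-in, the step-3 query returns 1, 4 and 5, which are the rows the DELETE below removes. It returns only the lowest unreferenced ID per duplicated pair, so each run removes at most one surplus row per pair; that is enough here because every duplicate in the example comes in twos.

```python
import sqlite3

# In-memory SQLite stand-in for the poster's MySQL database (assumption)
SETUP = """
CREATE TABLE my_images (ID INTEGER PRIMARY KEY, image_path TEXT, image_filename TEXT);
CREATE TABLE my_term_relationships (image_id INTEGER, term_id INTEGER, weight REAL);
INSERT INTO my_images VALUES
    (1, 'Film 1', 'Film 1 001.jpg'), (2, 'Film 1', 'Film 1 001.jpg'),
    (3, 'Film 1', 'Film 1 002.jpg'), (4, 'Film 1', 'Film 1 002.jpg'),
    (5, 'Film 1', 'Film 1 003.jpg'), (6, 'Film 1', 'Film 1 003.jpg');
INSERT INTO my_term_relationships VALUES
    (2, 1, 0.58516), (2, 3, 0.16721), (3, 2, 0.21475);
"""

# Step-3 query from the answer, unchanged apart from indentation
STEP3 = """
SELECT MIN(ID)
FROM my_images mi
JOIN (
    SELECT image_path, image_filename
    FROM my_images
    GROUP BY image_path, image_filename
    HAVING COUNT(*) > 1
) dups ON mi.image_path = dups.image_path
      AND mi.image_filename = dups.image_filename
LEFT JOIN my_term_relationships mtr ON mi.ID = mtr.image_id
WHERE mtr.image_id IS NULL
GROUP BY mi.image_path, mi.image_filename
HAVING COUNT(*) > 0
"""


def step3_ids(conn):
    # One row per duplicated pair; sort because GROUP BY order is not guaranteed
    return sorted(row[0] for row in conn.execute(STEP3))


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(step3_ids(conn))  # [1, 4, 5]
```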
Check that everything is correct. If it is, turn it into a DELETE statement.
DELETE my_images.* FROM my_images
JOIN (
    SELECT MIN(ID) AS ID
    FROM my_images mi
    JOIN (
        SELECT
            image_path, image_filename
        FROM my_images
        GROUP BY image_path, image_filename
        HAVING COUNT(*) > 1
    ) dups ON mi.image_path = dups.image_path AND mi.image_filename = dups.image_filename
    LEFT JOIN my_term_relationships mtr ON mi.ID = mtr.image_id
    WHERE mtr.image_id IS NULL
    GROUP BY mi.image_path, mi.image_filename
    HAVING COUNT(*) > 0
) sq USING(ID);
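SQLite has no multi-table DELETE ... JOIN, so this sketch passes the step-3 IDs through WHERE ID IN (...) instead. That rewrite is only for the SQLite stand-in; in MySQL a subquery that reads the table being deleted from would likely hit the "Can't specify target table for update in FROM clause" error, which is why the answer keeps the JOIN form. On the example data, rows 2, 3 and 6 remain and the relationships still point at 2 and 3.

```python
import sqlite3

# In-memory SQLite stand-in for the poster's MySQL database (assumption)
SETUP = """
CREATE TABLE my_images (ID INTEGER PRIMARY KEY, image_path TEXT, image_filename TEXT);
CREATE TABLE my_term_relationships (image_id INTEGER, term_id INTEGER, weight REAL);
INSERT INTO my_images VALUES
    (1, 'Film 1', 'Film 1 001.jpg'), (2, 'Film 1', 'Film 1 001.jpg'),
    (3, 'Film 1', 'Film 1 002.jpg'), (4, 'Film 1', 'Film 1 002.jpg'),
    (5, 'Film 1', 'Film 1 003.jpg'), (6, 'Film 1', 'Film 1 003.jpg');
INSERT INTO my_term_relationships VALUES
    (2, 1, 0.58516), (2, 3, 0.16721), (3, 2, 0.21475);
"""

# Step-3 query from the answer: lowest unreferenced ID per duplicated pair
STEP3 = """
SELECT MIN(ID)
FROM my_images mi
JOIN (
    SELECT image_path, image_filename
    FROM my_images
    GROUP BY image_path, image_filename
    HAVING COUNT(*) > 1
) dups ON mi.image_path = dups.image_path
      AND mi.image_filename = dups.image_filename
LEFT JOIN my_term_relationships mtr ON mi.ID = mtr.image_id
WHERE mtr.image_id IS NULL
GROUP BY mi.image_path, mi.image_filename
HAVING COUNT(*) > 0
"""

# SQLite stand-in for the MySQL DELETE ... JOIN ... USING(ID)
DELETE_SQL = "DELETE FROM my_images WHERE ID IN (" + STEP3 + ")"


def delete_duplicates(conn):
    with conn:  # commit on success, roll back on error
        conn.execute(DELETE_SQL)
    return [row[0] for row in conn.execute("SELECT ID FROM my_images ORDER BY ID")]


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(delete_duplicates(conn))  # [2, 3, 6]
```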
Edit: To address the issue Logar mentioned, run this UPDATE statement before the DELETE statement.
UPDATE my_term_relationships mtr
JOIN (
    SELECT mi.ID, minID
    FROM my_images mi
    JOIN (
        SELECT
            image_path, image_filename, MIN(ID) AS minID
        FROM my_images
        GROUP BY image_path, image_filename
        HAVING COUNT(*) > 1
    ) dups ON mi.image_path = dups.image_path AND mi.image_filename = dups.image_filename
) sq ON mtr.image_id = sq.ID
SET mtr.image_id = sq.minID;
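The effect of the edit can be checked the same way. SQLite has no UPDATE ... JOIN, so this sketch moves the join into a correlated subquery and wraps it in COALESCE so that image IDs outside any duplicated pair stay unchanged, mirroring the inner join in the MySQL UPDATE. Both that rewrite and the SQLite stand-in are assumptions; the DELETE is the same WHERE ID IN (...) rewrite used for the answer's DELETE. With image 2's relationships moved to image 1, rows 1, 3 and 6 remain.

```python
import sqlite3

# In-memory SQLite stand-in for the poster's MySQL database (assumption)
SETUP = """
CREATE TABLE my_images (ID INTEGER PRIMARY KEY, image_path TEXT, image_filename TEXT);
CREATE TABLE my_term_relationships (image_id INTEGER, term_id INTEGER, weight REAL);
INSERT INTO my_images VALUES
    (1, 'Film 1', 'Film 1 001.jpg'), (2, 'Film 1', 'Film 1 001.jpg'),
    (3, 'Film 1', 'Film 1 002.jpg'), (4, 'Film 1', 'Film 1 002.jpg'),
    (5, 'Film 1', 'Film 1 003.jpg'), (6, 'Film 1', 'Film 1 003.jpg');
INSERT INTO my_term_relationships VALUES
    (2, 1, 0.58516), (2, 3, 0.16721), (3, 2, 0.21475);
"""

# Point each relationship of a duplicated image at the lowest ID of its pair.
# COALESCE keeps image_id as-is when the image is not part of a duplicate.
UPDATE_SQL = """
UPDATE my_term_relationships
SET image_id = COALESCE((
    SELECT dups.minID
    FROM my_images mi
    JOIN (
        SELECT image_path, image_filename, MIN(ID) AS minID
        FROM my_images
        GROUP BY image_path, image_filename
        HAVING COUNT(*) > 1
    ) dups ON mi.image_path = dups.image_path
          AND mi.image_filename = dups.image_filename
    WHERE mi.ID = my_term_relationships.image_id
), image_id)
"""

# Step-3 query: lowest unreferenced ID per duplicated pair
STEP3 = """
SELECT MIN(ID)
FROM my_images mi
JOIN (
    SELECT image_path, image_filename
    FROM my_images
    GROUP BY image_path, image_filename
    HAVING COUNT(*) > 1
) dups ON mi.image_path = dups.image_path
      AND mi.image_filename = dups.image_filename
LEFT JOIN my_term_relationships mtr ON mi.ID = mtr.image_id
WHERE mtr.image_id IS NULL
GROUP BY mi.image_path, mi.image_filename
HAVING COUNT(*) > 0
"""
DELETE_SQL = "DELETE FROM my_images WHERE ID IN (" + STEP3 + ")"


def update_then_delete(conn):
    with conn:  # both statements commit together
        conn.execute(UPDATE_SQL)
        conn.execute(DELETE_SQL)
    images = [r[0] for r in conn.execute("SELECT ID FROM my_images ORDER BY ID")]
    rels = sorted(conn.execute("SELECT image_id, term_id FROM my_term_relationships"))
    return images, rels


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(update_then_delete(conn))  # ([1, 3, 6], [(1, 1), (1, 3), (3, 2)])
```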
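Answer 1 (score: 0)
You need to query two sets of image IDs and use them as filters. Assuming that image_path and image_filename together identify an image:
1. my_images IDs that are not referenced in my_term_relationships, although their image_path + image_filename pair may be referenced through another ID.
2. The lowest ID of each image_path + image_filename pair that is not referenced in my_term_relationships at all.
Rows in the first set that are not in the second set are deleted. Look at this query: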
DELETE FROM my_images
WHERE
    ID NOT IN (SELECT DISTINCT image_id FROM my_term_relationships) -- 1
    AND
    ID NOT IN (SELECT id FROM (
        SELECT MIN(ID) as id
        FROM my_images
        LEFT JOIN my_term_relationships ON ID = image_id
        GROUP BY image_path, image_filename
        HAVING COUNT(image_id) = 0
    ) as u_ids -- 2
    );
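Note that in the DELETE's WHERE clause you have to wrap the my_images table in an additional subquery (the u_ids derived table). Read this thread for an explanation: Can't specify target table for update in FROM clause
Example: sqlfiddle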
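The query above runs unchanged on an in-memory SQLite stand-in (an assumption in place of MySQL; SQLite does not need the u_ids wrapper but accepts it). The sketch adds a hypothetical lone image, ID 7, that is not in the question's data, to show that the second set protects images without duplicates: 7 survives along with 2, 3 and 5.

```python
import sqlite3

# In-memory SQLite stand-in for the poster's MySQL database (assumption)
SETUP = """
CREATE TABLE my_images (ID INTEGER PRIMARY KEY, image_path TEXT, image_filename TEXT);
CREATE TABLE my_term_relationships (image_id INTEGER, term_id INTEGER, weight REAL);
INSERT INTO my_images VALUES
    (1, 'Film 1', 'Film 1 001.jpg'), (2, 'Film 1', 'Film 1 001.jpg'),
    (3, 'Film 1', 'Film 1 002.jpg'), (4, 'Film 1', 'Film 1 002.jpg'),
    (5, 'Film 1', 'Film 1 003.jpg'), (6, 'Film 1', 'Film 1 003.jpg');
INSERT INTO my_term_relationships VALUES
    (2, 1, 0.58516), (2, 3, 0.16721), (3, 2, 0.21475);
"""

# The second answer's DELETE, unchanged apart from indentation
ANSWER1_DELETE = """
DELETE FROM my_images
WHERE
    ID NOT IN (SELECT DISTINCT image_id FROM my_term_relationships) -- 1
    AND
    ID NOT IN (SELECT id FROM (
        SELECT MIN(ID) as id
        FROM my_images
        LEFT JOIN my_term_relationships ON ID = image_id
        GROUP BY image_path, image_filename
        HAVING COUNT(image_id) = 0
    ) as u_ids -- 2
    )
"""


def run_answer1_delete(conn):
    with conn:  # commit on success, roll back on error
        conn.execute(ANSWER1_DELETE)
    return [r[0] for r in conn.execute("SELECT ID FROM my_images ORDER BY ID")]


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
# Hypothetical unreferenced image with no duplicate (not in the question's data)
conn.execute("INSERT INTO my_images VALUES (7, 'Film 1', 'Film 1 004.jpg')")
print(run_answer1_delete(conn))  # [2, 3, 5, 7]
```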
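Example UPDATE query for removing references to duplicate images from my_term_relationships (each image_id is set to the lowest ID with the same image_path and image_filename):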
UPDATE my_term_relationships
SET image_id = (
    select min(my_images.ID)
    from my_images
    join my_images as ref_image on (
        my_images.image_path = ref_image.image_path
        and my_images.image_filename = ref_image.image_filename
    )
    where ref_image.ID = image_id
);
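On an in-memory SQLite stand-in (assumed in place of MySQL), this UPDATE moves image 2's two relationships to image 1, the lowest ID with the same path and filename, and leaves image 3's relationship alone. The sketch only qualifies the bare image_id in the subquery as my_term_relationships.image_id for readability; it resolves to the same column. After this update image 2 is unreferenced, so running the DELETE from this answer afterwards would remove 2, 4 and 6 and keep 1, 3 and 5.

```python
import sqlite3

# In-memory SQLite stand-in for the poster's MySQL database (assumption)
SETUP = """
CREATE TABLE my_images (ID INTEGER PRIMARY KEY, image_path TEXT, image_filename TEXT);
CREATE TABLE my_term_relationships (image_id INTEGER, term_id INTEGER, weight REAL);
INSERT INTO my_images VALUES
    (1, 'Film 1', 'Film 1 001.jpg'), (2, 'Film 1', 'Film 1 001.jpg'),
    (3, 'Film 1', 'Film 1 002.jpg'), (4, 'Film 1', 'Film 1 002.jpg'),
    (5, 'Film 1', 'Film 1 003.jpg'), (6, 'Film 1', 'Film 1 003.jpg');
INSERT INTO my_term_relationships VALUES
    (2, 1, 0.58516), (2, 3, 0.16721), (3, 2, 0.21475);
"""

# The answer's UPDATE: for each relationship, find the lowest image ID that
# shares image_path and image_filename with the currently referenced image.
ANSWER1_UPDATE = """
UPDATE my_term_relationships
SET image_id = (
    SELECT MIN(my_images.ID)
    FROM my_images
    JOIN my_images AS ref_image
      ON my_images.image_path = ref_image.image_path
     AND my_images.image_filename = ref_image.image_filename
    WHERE ref_image.ID = my_term_relationships.image_id
)
"""


def repoint_relationships(conn):
    with conn:  # commit on success, roll back on error
        conn.execute(ANSWER1_UPDATE)
    return sorted(conn.execute("SELECT image_id, term_id FROM my_term_relationships"))


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(repoint_relationships(conn))  # [(1, 1), (1, 3), (3, 2)]
```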