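I'm looking to write a query that finds tables with similar columns. The reason is that nobody has cleaned up anything in our test environment for a long time, and this would help us sort through the tables and purge the ones that have similar columns.
Currently there are 200 tables in the database, and because some tables have different names but the same fields, it is hard to find the tables that contain similar information.
For example
Table A looks like this:
ID  Name  Description
Table B looks like this:
ID  Name
Table B may or may not contain the same data.
My idea is to get the results in this format.
Thanks for reading!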
Answer 0 (score: 2)
Try this:
-- Pair up columns that share a name across different tables,
-- then count the matches per pair of tables.
WITH PrepareData AS
(
    SELECT c1.TABLE_CATALOG AS c1_Cat, c1.TABLE_SCHEMA AS c1_Sch, c1.TABLE_NAME AS c1_Nam
          ,c2.TABLE_CATALOG AS c2_Cat, c2.TABLE_SCHEMA AS c2_Sch, c2.TABLE_NAME AS c2_Nam
          ,COUNT(*) AS IdenticalColumns
          -- total number of columns in the first table
          ,(SELECT COUNT(*) FROM INFORMATION_SCHEMA.COLUMNS x
            WHERE x.TABLE_CATALOG = c1.TABLE_CATALOG
              AND x.TABLE_SCHEMA = c1.TABLE_SCHEMA
              AND x.TABLE_NAME = c1.TABLE_NAME) AS CountT1
          -- total number of columns in the second table
          ,(SELECT COUNT(*) FROM INFORMATION_SCHEMA.COLUMNS x
            WHERE x.TABLE_CATALOG = c2.TABLE_CATALOG
              AND x.TABLE_SCHEMA = c2.TABLE_SCHEMA
              AND x.TABLE_NAME = c2.TABLE_NAME) AS CountT2
    FROM INFORMATION_SCHEMA.COLUMNS c1
    FULL OUTER JOIN INFORMATION_SCHEMA.COLUMNS c2 ON c1.TABLE_NAME <> c2.TABLE_NAME
                                                 AND c1.COLUMN_NAME = c2.COLUMN_NAME
    -- the WHERE clause drops unmatched rows, so this behaves like an inner join
    WHERE c1.TABLE_NAME IS NOT NULL AND c2.TABLE_NAME IS NOT NULL
    GROUP BY c1.TABLE_CATALOG, c1.TABLE_SCHEMA, c1.TABLE_NAME
            ,c2.TABLE_CATALOG, c2.TABLE_SCHEMA, c2.TABLE_NAME
)
,ComputeSimilarity AS
(
    SELECT *
          -- share of the first table's columns that also exist in the second table
          ,CAST(IdenticalColumns AS FLOAT) / CAST(CountT1 AS FLOAT) AS Factor1
          -- share of the second table's columns that also exist in the first table
          ,CAST(IdenticalColumns AS FLOAT) / CAST(CountT2 AS FLOAT) AS Factor2
    FROM PrepareData
)
SELECT *
FROM ComputeSimilarity
WHERE Factor1 > 0.75 OR Factor2 > 0.75;
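The idea: join all columns that have the same column name but a different table name. Count the matching columns and relate that count to each table's total number of columns.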
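To make the arithmetic concrete, here is a small Python sketch of the same computation done outside the database. It is only an illustration: the `tables` dictionary, the function name `similarity_factors`, and the extra `TableC` are made up, and the first two entries reuse Table A and Table B from the question. Note that this sketch compares column names case-sensitively, whereas the SQL comparison depends on the database collation.

```python
from itertools import permutations

def similarity_factors(tables):
    """Map each ordered table pair sharing at least one column name
    to (shared, factor1, factor2)."""
    result = {}
    for t1, t2 in permutations(tables, 2):
        cols1, cols2 = set(tables[t1]), set(tables[t2])
        shared = len(cols1 & cols2)
        if shared == 0:
            continue  # the join in the query produces no row for such a pair
        # factor1: share of t1's columns found in t2; factor2: the reverse
        result[(t1, t2)] = (shared, shared / len(cols1), shared / len(cols2))
    return result

# Hypothetical schema: TableA and TableB come from the question, TableC is invented.
tables = {
    "TableA": ["ID", "Name", "Description"],
    "TableB": ["ID", "Name"],
    "TableC": ["OrderNo"],
}
factors = similarity_factors(tables)
```

For the A-to-B pair this gives 2 shared columns, a first factor of 2/3 and a second factor of 1.0, so the pair passes a `> 0.75` filter only through the second factor. That is why both factors need their own denominator.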
Hint: this will also return VIEWs. You can join INFORMATION_SCHEMA.TABLES to find out the type.
Hint 2: you will get each result twice, once as A to B and once as B to A.
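If the mirrored rows get in the way, they can be collapsed after the fact. The Python sketch below is one way to do that on an exported result set; the function name `drop_mirrored_pairs` and the row layout `(table1, table2, shared)` are assumptions made for this example, not part of the query above.

```python
def drop_mirrored_pairs(rows):
    """Keep the first row seen for each unordered pair of tables.

    Each row is assumed to be a (table1, table2, shared) tuple.
    """
    seen = set()
    unique = []
    for t1, t2, shared in rows:
        key = frozenset((t1, t2))  # A-to-B and B-to-A yield the same key
        if key in seen:
            continue
        seen.add(key)
        unique.append((t1, t2, shared))
    return unique

# Hypothetical result rows: the same pair reported in both directions.
rows = [("TableA", "TableB", 2), ("TableB", "TableA", 2)]
unique_rows = drop_mirrored_pairs(rows)
```

A similar effect could probably be had directly in the query by comparing the table names with `<` instead of `<>` in the join condition, so that each pair is produced in one direction only.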