Determining the differences between two MySQL database schemas

Date: 2015-07-04 02:46:42

Tags: mysql sql

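I want to find any tables or fields that exist in database AA but are missing from database BB. I'm using INFORMATION_SCHEMA.columns to get this information, so I wrote a "missing records" query to find them. For testing, I used 2 databases where I know that BB is missing 1 table, plus 1 field in another table. Here is my first attempt: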

SELECT AA.table_name,
       AA.column_name,
       BB.table_name,
       BB.column_name
FROM   information_schema.columns AS AA
       LEFT JOIN information_schema.columns AS BB
              ON ( AA.table_name = BB.table_name )
                 AND ( AA.column_name = BB.column_name )
WHERE  AA.table_schema = 'wireless-2015-05'
   AND BB.table_schema = 'wireless-2015-04'
   AND BB.column_name IS NULL

This returned 0 records. So I tried:

SELECT AA.table_name,
       AA.column_name
FROM   information_schema.columns AS AA
WHERE  AA.table_schema = 'wireless-2015-04'
   AND NOT EXISTS(SELECT BB.table_name,
                         BB.column_name
                  FROM   information_schema.columns AS BB
                  WHERE  BB.table_schema = 'wireless-2015-05')

Again I got 0 records. Finally, I tried this:

SELECT table_name,
       column_name
FROM   (SELECT DISTINCT table_name,
                        column_name
        FROM   information_schema.columns
        WHERE  table_schema = 'wireless-2015-04'
        UNION ALL
        SELECT DISTINCT table_name,
                        column_name
        FROM   information_schema.columns
        WHERE  table_schema = 'wireless-2015-05') AS tbl
GROUP  BY table_name,
          column_name
HAVING Count(*) = 1 

This produced the desired result.
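To see why the UNION ALL / GROUP BY version works, here is a minimal sketch. It assumes Python's built-in sqlite3 module and an in-memory database. Because SQLite has no information_schema, a hypothetical table named info_columns stands in for information_schema.columns. It holds four made-up rows: wireless-2015-05 has T1.F1, T1.F2 and T2.F1, and wireless-2015-04 has only T1.F1.

```python
import sqlite3


def make_db(extra_rows=()):
    # Hypothetical stand-in for information_schema.columns: SQLite has no
    # information_schema, so a plain table "info_columns" holds sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE info_columns (table_schema TEXT, table_name TEXT, column_name TEXT)"
    )
    rows = [
        ("wireless-2015-05", "T1", "F1"),
        ("wireless-2015-05", "T1", "F2"),
        ("wireless-2015-05", "T2", "F1"),
        ("wireless-2015-04", "T1", "F1"),
    ]
    conn.executemany("INSERT INTO info_columns VALUES (?, ?, ?)", rows + list(extra_rows))
    return conn


# The third query: pool the distinct (table, column) pairs of both schemas
# and keep the pairs that occur only once.
QUERY = """
    SELECT table_name, column_name
    FROM (
        SELECT DISTINCT table_name, column_name FROM info_columns
        WHERE table_schema = 'wireless-2015-04'
        UNION ALL
        SELECT DISTINCT table_name, column_name FROM info_columns
        WHERE table_schema = 'wireless-2015-05'
    ) AS tbl
    GROUP BY table_name, column_name
    HAVING COUNT(*) = 1
    ORDER BY table_name, column_name
"""

missing = make_db().execute(QUERY).fetchall()
print(missing)  # [('T1', 'F2'), ('T2', 'F1')]

# A column that exists only in the older schema is reported too,
# because it also appears exactly once in the pooled rows.
both_ways = make_db([("wireless-2015-04", "T3", "F9")]).execute(QUERY).fetchall()
print(both_ways)  # [('T1', 'F2'), ('T2', 'F1'), ('T3', 'F9')]
```

Note the design consequence: this query returns the symmetric difference of the two schemas. It does not tell you which schema each reported column belongs to.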

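Although I don't mind using the third query, I can't figure out why the first two don't work, and I'd like to know for future reference. Can anyone spot the problem?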

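Update:
For those interested, here are 4 queries that work, along with the time each took to run. They are listed from fastest to slowest, with the time shown below each query.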

SELECT AA.table_name,
       AA.column_name
FROM   information_schema.columns AS AA
       LEFT JOIN (SELECT table_name,
                         column_name
                  FROM   information_schema.columns
                  WHERE  table_schema = 'wireless-2015-04') BB
              ON AA.table_name = BB.table_name
                 AND AA.column_name = BB.column_name
WHERE  AA.table_schema = 'wireless-2015-05'
       AND BB.table_name IS NULL; 

0.047 seconds

SELECT table_name,
       column_name
FROM   (SELECT DISTINCT table_name,
                        column_name
        FROM   information_schema.columns
        WHERE  table_schema = 'wireless-2015-04'
        UNION ALL
        SELECT DISTINCT table_name,
                        column_name
        FROM   information_schema.columns
        WHERE  table_schema = 'wireless-2015-05') AS tbl
GROUP  BY table_name,
          column_name
HAVING Count(*) = 1; 

0.078 seconds

SELECT DISTINCT table_name,
                column_name,
                Concat(table_name, '--', column_name) AS tc
FROM   information_schema.columns
WHERE  table_schema = 'wireless-2015-05'
HAVING tc NOT IN(SELECT DISTINCT Concat(table_name, '--', column_name)
                 FROM   information_schema.columns
                 WHERE  table_schema = 'wireless-2015-04'); 

0.125 seconds (a new solution I came up with this morning)
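The concatenated-key idea can be sketched the same way. This assumes Python's sqlite3 module, a hypothetical info_columns table standing in for information_schema.columns, and the same four made-up rows (2015-05 has T1.F1, T1.F2, T2.F1; 2015-04 has only T1.F1). SQLite's `||` operator replaces MySQL's CONCAT(). The key expression is repeated in WHERE instead of relying on MySQL's extension that lets HAVING refer to a select-list alias.

```python
import sqlite3


def make_db():
    # Hypothetical stand-in for information_schema.columns: SQLite has no
    # information_schema, so a plain table "info_columns" holds the sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE info_columns (table_schema TEXT, table_name TEXT, column_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO info_columns VALUES (?, ?, ?)",
        [
            ("wireless-2015-05", "T1", "F1"),
            ("wireless-2015-05", "T1", "F2"),
            ("wireless-2015-05", "T2", "F1"),
            ("wireless-2015-04", "T1", "F1"),
        ],
    )
    return conn


conn = make_db()

# Build a "table--column" key on each side and keep the 2015-05 keys
# that never occur among the 2015-04 keys.
rows = conn.execute("""
    SELECT DISTINCT table_name, column_name
    FROM info_columns
    WHERE table_schema = 'wireless-2015-05'
      AND table_name || '--' || column_name NOT IN (
          SELECT table_name || '--' || column_name
          FROM info_columns
          WHERE table_schema = 'wireless-2015-04')
    ORDER BY table_name, column_name
""").fetchall()
print(rows)  # [('T1', 'F2'), ('T2', 'F1')]
```

One caveat with this design: a concatenated key can collide if names themselves contain `--`. For example, table `a--b` with column `c` and table `a` with column `b--c` both produce `a--b--c`. Comparing the two columns separately avoids that.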

SELECT aa.table_name,
       aa.column_name
FROM   information_schema.columns aa
WHERE  table_schema = 'wireless-2015-05'
       AND NOT EXISTS (SELECT 1
                       FROM   information_schema.columns
                       WHERE  table_schema = 'wireless-2015-04'
                              AND table_name = aa.table_name
                              AND column_name = aa.column_name); 

44.382 seconds. Obviously not a good real-world solution.

2 answers:

Answer 0 (score: 1)

Let's say the records look like this:

   schema              table    column
   ----------------    -----    ------
1. wireless-2015-05    T1       F1
2. wireless-2015-05    T1       F2
3. wireless-2015-05    T2       F1
4. wireless-2015-04    T1       F1

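Note that wireless-2015-04 is missing table T2 and column T1.F2. We'll use this sample in the explanation and in the SQL Fiddle examples. You were very close in your first two attempts, and a small modification (shown below) fixes each of them.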

Query 1

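Let's break down the first query. We'll leave out the WHERE clause, because the sample above contains only the two schemas that the WHERE clause mentions.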

SELECT ...
FROM information_schema.columns AS AA
LEFT JOIN information_schema.columns AS BB 
    on AA.table_name = BB.table_name
    and AA.column_name = BB.column_name

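Because the join compares only table and column names, each AA record matches every record in the same information_schema.columns table that has the same table and column name, including itself. For example, the first record (wireless-2015-05 + T1 + F1) matches both T1.F1 records. So:

  • AA's record #1 matches BB's records #1 and #4
  • AA's record #2 matches BB's record #2
  • AA's record #3 matches BB's record #3
  • AA's record #4 matches BB's records #1 and #4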

Example: http://sqlfiddle.com/#!9/6b704/4

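No row has a NULL BB.column_name. Every AA record matches at least itself, so the LEFT JOIN never produces a row with a NULL BB side, and no records are returned. That is not what you want.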
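The self-match described above can be checked with a small sketch. It assumes Python's built-in sqlite3 module and an in-memory database. Because SQLite has no information_schema, a hypothetical table named info_columns stands in for information_schema.columns and holds the four sample rows. It illustrates the join semantics only and is not MySQL itself.

```python
import sqlite3


def make_db():
    # Hypothetical stand-in for information_schema.columns: SQLite has no
    # information_schema, so a plain table "info_columns" holds the sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE info_columns (table_schema TEXT, table_name TEXT, column_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO info_columns VALUES (?, ?, ?)",
        [
            ("wireless-2015-05", "T1", "F1"),
            ("wireless-2015-05", "T1", "F2"),
            ("wireless-2015-05", "T2", "F1"),
            ("wireless-2015-04", "T1", "F1"),
        ],
    )
    return conn


conn = make_db()

# Self-join with no schema restriction: every AA row matches at least itself,
# so no row ever has a NULL BB.column_name.
null_matches = conn.execute("""
    SELECT COUNT(*)
    FROM info_columns AS AA
    LEFT JOIN info_columns AS BB
      ON AA.table_name = BB.table_name
     AND AA.column_name = BB.column_name
    WHERE BB.column_name IS NULL
""").fetchone()[0]

# The original query 1, with its WHERE clause: it returns nothing.
original_rows = conn.execute("""
    SELECT AA.table_name, AA.column_name, BB.table_name, BB.column_name
    FROM info_columns AS AA
    LEFT JOIN info_columns AS BB
      ON AA.table_name = BB.table_name
     AND AA.column_name = BB.column_name
    WHERE AA.table_schema = 'wireless-2015-05'
      AND BB.table_schema = 'wireless-2015-04'
      AND BB.column_name IS NULL
""").fetchall()

print(null_matches, original_rows)  # 0 []
```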

Query 1 improved

You can rewrite query 1 as follows to get the correct result:

SELECT AA.table_name,
       AA.column_name
FROM information_schema.columns AS AA
LEFT JOIN 
( 
  select table_name, column_name from
  information_schema.columns
  where table_schema = 'wireless-2015-04'
) BB
  on AA.table_name = BB.table_name
  and AA.column_name = BB.column_name
WHERE 
  AA.table_schema = 'wireless-2015-05'
  and BB.table_name is null

Example: http://sqlfiddle.com/#!9/6b704/10
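A minimal sketch of the improved query 1, under the same assumptions: Python's sqlite3 module, a hypothetical info_columns table standing in for information_schema.columns, and the four sample rows. The only addition to the query is an ORDER BY for a stable result.

```python
import sqlite3


def make_db():
    # Hypothetical stand-in for information_schema.columns: SQLite has no
    # information_schema, so a plain table "info_columns" holds the sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE info_columns (table_schema TEXT, table_name TEXT, column_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO info_columns VALUES (?, ?, ?)",
        [
            ("wireless-2015-05", "T1", "F1"),
            ("wireless-2015-05", "T1", "F2"),
            ("wireless-2015-05", "T2", "F1"),
            ("wireless-2015-04", "T1", "F1"),
        ],
    )
    return conn


conn = make_db()

# BB is restricted to wireless-2015-04 *before* the join, so 2015-05 rows
# with no counterpart come back NULL-extended and are kept by IS NULL.
rows = conn.execute("""
    SELECT AA.table_name, AA.column_name
    FROM info_columns AS AA
    LEFT JOIN (
        SELECT table_name, column_name
        FROM info_columns
        WHERE table_schema = 'wireless-2015-04'
    ) AS BB
      ON AA.table_name = BB.table_name
     AND AA.column_name = BB.column_name
    WHERE AA.table_schema = 'wireless-2015-05'
      AND BB.table_name IS NULL
    ORDER BY AA.table_name, AA.column_name
""").fetchall()
print(rows)  # [('T1', 'F2'), ('T2', 'F1')]
```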

Query 2

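Essentially, the NOT EXISTS subquery in query 2 has no condition tying it to AA's columns, so it is uncorrelated. As long as schema wireless-2015-05 has any columns at all, the subquery returns rows, NOT EXISTS is false for every outer row, and the query produces no results.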
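A small sketch of the uncorrelated case, assuming Python's sqlite3 module, a hypothetical info_columns table in place of information_schema.columns, and the four sample rows. Query 2 is run exactly as written in the question.

```python
import sqlite3


def make_db():
    # Hypothetical stand-in for information_schema.columns: SQLite has no
    # information_schema, so a plain table "info_columns" holds the sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE info_columns (table_schema TEXT, table_name TEXT, column_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO info_columns VALUES (?, ?, ?)",
        [
            ("wireless-2015-05", "T1", "F1"),
            ("wireless-2015-05", "T1", "F2"),
            ("wireless-2015-05", "T2", "F1"),
            ("wireless-2015-04", "T1", "F1"),
        ],
    )
    return conn


conn = make_db()

# The subquery never refers to AA, so it returns the same rows
# (every 2015-05 column) for each outer row.
subquery_rows = conn.execute("""
    SELECT COUNT(*) FROM info_columns AS BB
    WHERE BB.table_schema = 'wireless-2015-05'
""").fetchone()[0]

# Query 2 as written: NOT EXISTS is false for every outer row.
rows = conn.execute("""
    SELECT AA.table_name, AA.column_name
    FROM info_columns AS AA
    WHERE AA.table_schema = 'wireless-2015-04'
      AND NOT EXISTS (SELECT BB.table_name, BB.column_name
                      FROM info_columns AS BB
                      WHERE BB.table_schema = 'wireless-2015-05')
""").fetchall()
print(subquery_rows, rows)  # 3 []
```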

Query 2 improved

The query can be corrected as follows:

select aa.table_name, aa.column_name
from information_schema.columns aa
where table_schema = 'wireless-2015-05'
and not exists (
  select 1
  from information_schema.columns
  where table_schema = 'wireless-2015-04'
  and table_name = aa.table_name
  and column_name = aa.column_name
);

Example: http://sqlfiddle.com/#!9/6b704/9
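The correlated version can be tried the same way. This assumes Python's sqlite3 module, a hypothetical info_columns table in place of information_schema.columns, and the four sample rows. An ORDER BY is added for a stable result.

```python
import sqlite3


def make_db():
    # Hypothetical stand-in for information_schema.columns: SQLite has no
    # information_schema, so a plain table "info_columns" holds the sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE info_columns (table_schema TEXT, table_name TEXT, column_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO info_columns VALUES (?, ?, ?)",
        [
            ("wireless-2015-05", "T1", "F1"),
            ("wireless-2015-05", "T1", "F2"),
            ("wireless-2015-05", "T2", "F1"),
            ("wireless-2015-04", "T1", "F1"),
        ],
    )
    return conn


conn = make_db()

# The subquery now compares against the outer row (aa), so NOT EXISTS is
# evaluated separately for each 2015-05 column.
rows = conn.execute("""
    SELECT aa.table_name, aa.column_name
    FROM info_columns AS aa
    WHERE aa.table_schema = 'wireless-2015-05'
      AND NOT EXISTS (
          SELECT 1
          FROM info_columns
          WHERE table_schema = 'wireless-2015-04'
            AND table_name = aa.table_name
            AND column_name = aa.column_name)
    ORDER BY aa.table_name, aa.column_name
""").fetchall()
print(rows)  # [('T1', 'F2'), ('T2', 'F1')]
```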

Hope this helps.

Answer 1 (score: 0)

Your first query should be:

SELECT AA.*
FROM
(
    SELECT table_name,
           column_name
    FROM information_schema.columns
    WHERE table_schema = 'wireless-2015-05'
) AA
LEFT JOIN
(
    SELECT table_name,
           column_name
    FROM information_schema.columns
    WHERE table_schema = 'wireless-2015-04'
) BB
ON AA.table_name = BB.table_name
AND AA.column_name = BB.column_name
WHERE BB.table_name IS NULL OR BB.column_name IS NULL

Your problem

Your query has the wrong conditions in its WHERE clause:

WHERE  AA.table_schema = 'wireless-2015-05'
   AND BB.table_schema = 'wireless-2015-04'
   AND BB.column_name IS NULL

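When there is no matching record in BB, BB.table_schema is NULL. The condition BB.table_schema = 'wireless-2015-04' is therefore not true, so the row is filtered out. Every row that could satisfy BB.column_name IS NULL is discarded this way, which is why you get no results.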
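A sketch of this point, assuming Python's sqlite3 module and a hypothetical info_columns table in place of information_schema.columns. The table holds four made-up rows: 2015-05 has T1.F1, T1.F2 and T2.F1, and 2015-04 has only T1.F1. The sketch compares the original filter in WHERE with the same BB.table_schema condition moved into the ON clause. The ON variant is an extra illustration, not the query proposed in this answer.

```python
import sqlite3


def make_db():
    # Hypothetical stand-in for information_schema.columns: SQLite has no
    # information_schema, so a plain table "info_columns" holds the sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE info_columns (table_schema TEXT, table_name TEXT, column_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO info_columns VALUES (?, ?, ?)",
        [
            ("wireless-2015-05", "T1", "F1"),
            ("wireless-2015-05", "T1", "F2"),
            ("wireless-2015-05", "T2", "F1"),
            ("wireless-2015-04", "T1", "F1"),
        ],
    )
    return conn


conn = make_db()

# Filter on BB in WHERE: a NULL-extended row would have BB.table_schema NULL,
# fail the comparison and be discarded, so IS NULL can never match.
in_where = conn.execute("""
    SELECT AA.table_name, AA.column_name
    FROM info_columns AS AA
    LEFT JOIN info_columns AS BB
      ON AA.table_name = BB.table_name
     AND AA.column_name = BB.column_name
    WHERE AA.table_schema = 'wireless-2015-05'
      AND BB.table_schema = 'wireless-2015-04'
      AND BB.column_name IS NULL
""").fetchall()

# Same condition moved into ON: it only limits which BB rows may match,
# so unmatched AA rows survive with NULLs and are picked up by IS NULL.
in_on = conn.execute("""
    SELECT AA.table_name, AA.column_name
    FROM info_columns AS AA
    LEFT JOIN info_columns AS BB
      ON AA.table_name = BB.table_name
     AND AA.column_name = BB.column_name
     AND BB.table_schema = 'wireless-2015-04'
    WHERE AA.table_schema = 'wireless-2015-05'
      AND BB.column_name IS NULL
    ORDER BY AA.table_name, AA.column_name
""").fetchall()
print(in_where, in_on)  # [] [('T1', 'F2'), ('T2', 'F1')]
```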

For the second query, I think @zedfoxus is right.

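You could also use EXCEPT, which gives you the desired result. Note that MySQL supports EXCEPT only as of version 8.0.31; on earlier versions, this query is a syntax error.

The following query returns the distinct rows from the query on the left of the EXCEPT operator that are not also returned by the query on its right.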

SELECT DISTINCT table_name,
                column_name
FROM   information_schema.columns
WHERE  table_schema = 'wireless-2015-05'

EXCEPT

SELECT DISTINCT table_name,
                column_name
FROM   information_schema.columns
WHERE  table_schema = 'wireless-2015-04'
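SQLite also supports EXCEPT, so the set-difference query can be tried with a small sketch. It assumes Python's sqlite3 module, a hypothetical info_columns table standing in for information_schema.columns, and four made-up rows: 2015-05 has T1.F1, T1.F2 and T2.F1, and 2015-04 has only T1.F1. An ORDER BY on the compound result is added for a stable order.

```python
import sqlite3


def make_db():
    # Hypothetical stand-in for information_schema.columns: SQLite has no
    # information_schema, so a plain table "info_columns" holds the sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE info_columns (table_schema TEXT, table_name TEXT, column_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO info_columns VALUES (?, ?, ?)",
        [
            ("wireless-2015-05", "T1", "F1"),
            ("wireless-2015-05", "T1", "F2"),
            ("wireless-2015-05", "T2", "F1"),
            ("wireless-2015-04", "T1", "F1"),
        ],
    )
    return conn


conn = make_db()

# Rows from the left SELECT that do not appear in the right SELECT.
# EXCEPT already removes duplicates; ORDER BY 1, 2 sorts the combined result.
rows = conn.execute("""
    SELECT DISTINCT table_name, column_name FROM info_columns
    WHERE table_schema = 'wireless-2015-05'
    EXCEPT
    SELECT DISTINCT table_name, column_name FROM info_columns
    WHERE table_schema = 'wireless-2015-04'
    ORDER BY 1, 2
""").fetchall()
print(rows)  # [('T1', 'F2'), ('T2', 'F1')]
```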