Comparing a restored MS-SQL DB with the original database using Python

Date: 2016-04-26 07:00:03

Tags: python sql-server sql-server-2008

I am using an external utility to back up my MS-SQL databases. It backs up the databases and, on restore, gives me the database files (.mdf and .ldf).

Now I need to verify, using Python for automation, whether the restored database files (.mdf and .ldf) are identical to the original database files. Is there any utility for database comparison that can be integrated with Python?

I am looking into the pymssql module, but I am not sure whether I can use it to compare the restored database...

2 answers:

Answer 0: (score: 0)

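I looked into pymssql. All pymssql does is execute the SQL queries passed to it and return their output. However, installing pymssql (on Windows) is really a difficult task. So, instead of pymssql, I preferred to use SQL Server's built-in SQLCMD utility to execute my SQL queries.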

So I used os.system() to execute my SQL queries through Python: I write the SQL query to a .sql file from Python, pass that file to SQLCMD, and redirect the output to another text file.
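The write-a-file-then-run-sqlcmd step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the answer's code: it swaps os.system() for subprocess.call() with an argument list, so paths containing spaces need no manual quoting. The helper names build_sqlcmd_args and run_sql_file and the runner parameter are hypothetical; -S, -i and -o are the sqlcmd switches for server, input file and output file.

```python
import os
import subprocess
import tempfile


def build_sqlcmd_args(instance_name, input_file, output_file):
    """Return the sqlcmd argument list (hypothetical helper)."""
    return ["sqlcmd", "-S", instance_name, "-i", input_file, "-o", output_file]


def run_sql_file(instance_name, query, output_file, runner=subprocess.call):
    """Write `query` to a temporary .sql file, run it through sqlcmd, clean up.

    `runner` is injectable so the function can be exercised without sqlcmd.
    Returns whatever exit code the runner reports.
    """
    fd, input_file = tempfile.mkstemp(suffix=".sql")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(query)
        args = build_sqlcmd_args(instance_name, input_file, output_file)
        return runner(args)
    finally:
        os.remove(input_file)  # delete the temporary .sql file
```

Passing a list instead of one command string avoids shell parsing altogether, which matters once instance names or paths contain spaces.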

First, I used SQLCMD to collect the list of tables in the given database. Then I queried all the tables one by one and dumped their data into a single text file (Original_DB_Data.txt). For the restored db, I followed the same process and dumped the data into another text file (Restored_DB_Data.txt). The code snippet is as follows:

    import os
    import re

    def get_table_list_in_DB(Instance_name, DB_name):
        query = 'SELECT NAME from [%s].sys.tables' % (DB_name)
        input_file = "C:\\SQL_Data\\SQLQuery_table_list.sql"
        output_file = "C:\\SQL_Data\\Table_list_DB-%s_Output.txt" % (DB_name)
        with open(input_file, 'w') as f:
            f.write(query)
        command = 'sqlcmd -S %s -i %s -o %s' % (Instance_name, input_file, output_file)
        os.system(command)
        os.remove(input_file)    # to delete the created .sql file
        return output_file

    def get_db_data(Instance_name, DB_name):
        output_file = "C:\\SQL_Data\\DB_Data-%s_Output.txt" % (DB_name)
        table_list = get_table_list_in_DB(Instance_name, DB_name)
        flag = 0
        with open(table_list, 'r') as f1, open(output_file, 'a') as f2:
            for lines in f1:
                if re.match(r"^\s", lines):    # a blank line ends the table names
                    flag = 0
                if flag:
                    table_data = get_table_data(Instance_name, DB_name, table_name=lines.strip())
                    with open(table_data, 'r') as f3:
                        f2.write("##################################" + '\n')
                        f2.write('\t' + lines.strip() + '\n')
                        f2.write("##################################" + '\n')
                        f2.write(f3.read())
                    os.remove(table_data)
                if re.match("^----+", lines):    # table names start after the ------ line
                    flag = 1
        return output_file

    def get_table_data(Instance_name, DB_name, table_name):
        input_file = "C:\\SQL_Data\\SQLQuery_table_data.sql"
        output_file = "C:\\SQL_Data\\Table_data_%s_Output.txt" % (table_name)
        query = "SELECT * from [%s].dbo.[%s]" % (DB_name, table_name)
        with open(input_file, 'w') as f:
            f.write(query)
        command = "sqlcmd -S %s -i %s -o %s" % (Instance_name, input_file, output_file)
        os.system(command)
        os.remove(input_file)
        return output_file

    def compare_DB_Data(DB_Detail1=[], DB_Detail2=[]):
        get_db_data(Instance_name=DB_Detail1[0], DB_name=DB_Detail1[1])
        get_db_data(Instance_name=DB_Detail2[0], DB_name=DB_Detail2[1])
        data_DB1 = "C:\\SQL_Data\\DB_Data-%s_Output.txt" % (DB_Detail1[1])
        data_DB2 = "C:\\SQL_Data\\DB_Data-%s_Output.txt" % (DB_Detail2[1])
        with open(data_DB1, 'r') as f1, open(data_DB2, 'r') as f2:
            if f1.read() == f2.read():
                print("Data of both DB Matches")
            else:
                print("Data of both DB Varies")

If you want to improve the compare_DB_Data() method to get the exact differences, you can rewrite it so that it dumps the differences into another text file, which you can consult later to check exactly what differs.
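One possible way to get the exact differences is Python's standard difflib module. The sketch below is an assumption about what such a rewrite could look like, not the answer's original code; the function name dump_db_diff and the diff_file parameter are hypothetical. It compares the two dump files line by line and writes a unified diff to a third file.

```python
import difflib


def dump_db_diff(data_db1, data_db2, diff_file):
    """Write a unified diff of two dump files to diff_file.

    Returns True when the two files are identical (the diff is empty).
    """
    with open(data_db1, "r") as f1, open(data_db2, "r") as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()

    # unified_diff yields nothing at all when the inputs are equal
    diff = list(difflib.unified_diff(
        lines1, lines2, fromfile=data_db1, tofile=data_db2))

    with open(diff_file, "w") as out:
        out.writelines(diff)

    return not diff
```

Lines prefixed with "-" in the diff file exist only in the first dump, and lines prefixed with "+" exist only in the second.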

Answer 1: (score: 0)

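I am adding this as a separate answer on purpose. The answer above works only when the code runs on the same machine as the SQL Server. It also relied on text files, which does not ensure a complete comparison between the two databases. This new approach is dictionary-based and avoids both problems.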

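The following code works even if your SQL Server and your code are on different machines. For that, however, you need to enable WinRM on the machine where SQL Server is present. WinRM (Windows Remote Management) is a built-in Windows component for communication between Windows machines. To enable it, run the following commands on the command line of the SQL machine.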

winrm qc -q
winrm set winrm/config/client/auth @{Basic="true"}
winrm set winrm/config/service/auth @{Basic="true"}
winrm set winrm/config/service @{AllowUnencrypted="true"}

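This enables WinRM on your machine. Next we need the pywinrm Python module to communicate with the remote SQL host. For installation instructions, refer to the pywinrm project documentation.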
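With WinRM enabled, running sqlcmd on the remote host could look roughly like the sketch below. It assumes pywinrm's winrm.Session(target, auth=(user, password)) and its run_cmd() method, whose result exposes std_out, std_err and status_code. The helper name run_remote_query is hypothetical, and the host and credentials in the docstring are placeholders. The function only relies on the session object having run_cmd(), so it can be tried out with a stand-in object before touching a real server.

```python
def run_remote_query(session, instance_name, query):
    """Run `query` through sqlcmd on the remote host behind `session`.

    `session` is expected to behave like a pywinrm Session, for example
    winrm.Session("192.0.2.10", auth=("user", "password")), where the
    host and credentials are placeholder values.
    Returns the command's standard output; raises RuntimeError on failure.
    """
    command = 'sqlcmd -S "%s" -Q "%s"' % (instance_name, query)
    result = session.run_cmd(command)
    if result.status_code != 0 or result.std_err:
        raise RuntimeError("sqlcmd failed: %r" % (result.std_err,))
    return result.std_out
```

Raising an exception instead of returning False is a design choice of this sketch: a failed query then cannot be mistaken for an empty result further down the comparison.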

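So, first we collect the list of tables present in the database (using the get_table_list_in_DB() method) and store it in a text file. This list is used to collect the schema and the data of every table in the database: each table name is read from the file one by one and queried for its schema and its data.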
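The table names have to be picked out of sqlcmd's plain-text output, which prints a column header, a dashed underline, the rows, a blank line and a row-count footer. A minimal sketch of that parsing step follows; the function name parse_table_names is hypothetical, and it stops at the first blank line rather than at any line starting with whitespace, which is a slight simplification of the flag logic in the answer's code.

```python
import re


def parse_table_names(sqlcmd_output):
    """Extract table names from sqlcmd's text output.

    Names are the lines between the dashed header underline and the
    first blank line (the "(N rows affected)" footer comes after it).
    """
    names = []
    in_rows = False
    for line in sqlcmd_output.splitlines():
        if in_rows:
            if not line.strip():
                break  # a blank line ends the result rows
            names.append(line.strip())  # sqlcmd pads names with spaces
        elif re.match(r"^-{4,}", line):
            in_rows = True  # rows start right after the ------ underline
    return names
```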

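The get_db_Schema() method reads the table list returned above and internally calls the get_table_Schema() method to collect the schema details of every table in the database. The schema returned for each table is stored as a dictionary key-value pair, with the table name as the key and the returned schema as the value. The whole database schema is therefore one dictionary, with each table as a key and its schema as the value.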

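get_db_data() and get_table_data() work the same way. get_db_data() collects the data of each table and stores it in a dictionary, with the table name as the key and the returned data as the value.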

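So, to compare two databases, we pass the database details (instance name and database name) to the compare_DB_Data() method, which calls get_db_Schema() for both databases and compares the results. It then calls get_db_data() for both databases and compares those as well. Because the table schema and table data are stored as key-value pairs, if both databases have the same keys and every key (for schema and for data) has the same value in both, we can be sure that the two databases are identical.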

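If any differences are found between the schemas of the two databases, they are added to the schema1_diff_schema2 and schema2_diff_schema1 dictionaries. Likewise, data differences are added to the data1_diff_data2 and data2_diff_data1 dictionaries.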
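The dictionary comparison described above boils down to a small set computation over the keys. The sketch below isolates it in one function; the name diff_dicts is hypothetical, and the dictionaries in the usage example are made-up sample data rather than real query output.

```python
def diff_dicts(db1, db2):
    """Compare two {table_name: value} dictionaries.

    Returns (db1_diff_db2, db2_diff_db1): the entries of each side that
    are missing from, or different in, the other side. Two empty
    dictionaries mean both inputs are identical.
    """
    keys1, keys2 = set(db1), set(db2)
    changed = set(k for k in keys1 & keys2 if db1[k] != db2[k])

    db1_diff_db2 = dict((k, db1[k]) for k in (keys1 - keys2) | changed)
    db2_diff_db1 = dict((k, db2[k]) for k in (keys2 - keys1) | changed)
    return db1_diff_db2, db2_diff_db1


original = {"Customers": "id int", "Orders": "id int", "Audit": "ts datetime"}
restored = {"Customers": "id int", "Orders": "id bigint"}
only_in_original, only_in_restored = diff_dicts(original, restored)
# only_in_original == {"Orders": "id int", "Audit": "ts datetime"}
# only_in_restored == {"Orders": "id bigint"}
```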

Following is the code for the same:

import os
import re
import sys

import winrm


class SQL_Compare_DB(object):

    def __init__(self, SQL_Host_IP, auth):
        # auth is a (username, password) tuple for the SQL host
        # Append to path, in case not present
        sys.path.append(r"C:\Program Files\Microsoft SQL Server\110\Tools\Binn")
        # Create session with SQL host
        self.session = winrm.Session(SQL_Host_IP, auth)
        # We need a directory where the db files will be stored
        if not os.path.exists(r"C:\SQL_Data"):
            os.mkdir(r"C:\SQL_Data")
        else:
            os.system(r"RMDIR /S /Q C:\SQL_Data")
            os.system(r"MKDIR C:\SQL_Data")


    def get_table_list_in_DB(self, Instance_name, DB_name):
        '''
        Returns the path of a text file listing the tables in the
        given database, or False if the command reported an error
        '''
        output_file = "C:\\SQL_Data\\Table_list_DB-%s_Output.txt" % (DB_name)

        query = 'SELECT NAME from [%s].sys.tables' % (DB_name)
        command = 'sqlcmd -S "%s" -Q "%s"' % (Instance_name, query)

        execute_query = self.session.run_cmd(command)
        if execute_query.std_err:
            print("Error in command execution :- %s" % (execute_query.std_err,))
            return False
        # std_out is a byte string, so write the file in binary mode
        with open(output_file, 'wb') as f:
            f.write(execute_query.std_out)
        return output_file

    def get_db_Schema(self, Instance_name, DB_name):
        '''
        Get the schema of all tables in the Database
        '''
        table_list = self.get_table_list_in_DB(Instance_name, DB_name)
        flag = 0
        db_schema = {}

        with open(table_list, 'r') as f1:
            for lines in f1:
                if re.match(r"^\s", lines):    # a blank line ends the table names
                    flag = 0
                if flag:
                    table_schema = self.get_table_Schema(
                        Instance_name, DB_name, table_name=lines.strip())
                    db_schema.update(table_schema)
                if re.match("^----+?-", lines):    # table names start after the ------ line
                    flag = 1
        print(db_schema)
        return db_schema


    def get_table_Schema(self, Instance_name, DB_name, table_name):
        '''
        Get the table schema
        '''
        table_schema = {}
        query = """
        SELECT ORDINAL_POSITION, COLUMN_NAME, DATA_TYPE,
        CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
        FROM [%s].INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_NAME = '%s'""".replace('\n', '') % (DB_name, table_name)

        command = 'sqlcmd -S "%s" -Q "%s"' % (Instance_name, query)

        execute_query = self.session.run_cmd(command)
        table_schema[table_name] = execute_query.std_out
        if execute_query.std_err:
            print("Error in command execution :- %s" % (execute_query.std_err,))
            return False
        return table_schema

    def get_db_data(self, Instance_name, DB_name):
        '''
        From the given DB, it fetches the data of all existing tables
        and returns it as a dictionary keyed by table name
        '''
        db_data = {}
        table_list = self.get_table_list_in_DB(Instance_name, DB_name)

        flag = 0
        with open(table_list, 'r') as f1:
            for lines in f1:
                if re.match(r"^\s", lines):
                    flag = 0
                if flag:
                    table_data = self.get_table_data(
                        Instance_name, DB_name, table_name=lines.strip())
                    db_data.update(table_data)
                if re.match("^----+?-", lines):    # after the line ------ the table names are printed
                    flag = 1

        print(db_data)
        return db_data

    def get_table_data(self, Instance_name, DB_name, table_name):
        '''
        Get the data for the given table
        '''
        table_data = {}

        query = "SELECT * from [%s].dbo.[%s]" % (DB_name, table_name)
        command = 'sqlcmd -S "%s" -Q "%s"' % (Instance_name, query)
        execute_query = self.session.run_cmd(command)
        table_data[table_name] = execute_query.std_out
        if execute_query.std_err:
            print("Error in command execution :- %s" % (execute_query.std_err,))
            return False
        return table_data

    def compare_DB_Data(self, DB_Detail1=[], DB_Detail2=[]):
        '''
        Takes the details of two DBs as input and collects the schema
        and data of each. The results are stored in dictionaries,
        which are compared at the end.

        Arguments :-
        DB_Detail1 :- a list of instance name and DB name for original DB
        DB_Detail2 :- a list of instance name and DB name for Restored DB
        e.g :- DB_Detail1 = ['Instance_name', 'Database_name']
        '''

        # Compare schema
        db_schema1 = self.get_db_Schema(
            Instance_name=DB_Detail1[0], DB_name=DB_Detail1[1])
        db_schema2 = self.get_db_Schema(
            Instance_name=DB_Detail2[0], DB_name=DB_Detail2[1])
        schema1_diff_schema2 = {}
        schema2_diff_schema1 = {}

        print(db_schema1)
        print(db_schema2)
        set_current, set_past = set(db_schema1.keys()), set(db_schema2.keys())
        intersect = set_current.intersection(set_past)
        added = set_current - intersect
        removed = set_past - intersect
        changed = set(k for k in intersect if db_schema2[k] != db_schema1[k])
        unchanged = set(k for k in intersect if db_schema2[k] == db_schema1[k])
        print("%s %s %s %s" % (added, removed, changed, unchanged))

        # Record what exists only in, or differs in, each schema
        for m in added | changed:
            schema1_diff_schema2[m] = db_schema1[m]
        for m in removed | changed:
            schema2_diff_schema1[m] = db_schema2[m]

        if not (added or removed or changed):
            print("Schema of both DB Matches")
        else:
            print("Schema of both DB Varies")

        # Compare data
        data_DB1 = self.get_db_data(
            Instance_name=DB_Detail1[0], DB_name=DB_Detail1[1])
        data_DB2 = self.get_db_data(
            Instance_name=DB_Detail2[0], DB_name=DB_Detail2[1])
        data1_diff_data2 = {}
        data2_diff_data1 = {}

        set_current_data, set_past_data = set(data_DB1.keys()), set(data_DB2.keys())
        intersect = set_current_data.intersection(set_past_data)
        added = set_current_data - intersect
        removed = set_past_data - intersect
        changed = set(k for k in intersect if data_DB2[k] != data_DB1[k])
        unchanged = set(k for k in intersect if data_DB2[k] == data_DB1[k])
        print("%s %s %s %s" % (added, removed, changed, unchanged))

        # Record what exists only in, or differs in, each data set
        for m in added | changed:
            data1_diff_data2[m] = data_DB1[m]
        for m in removed | changed:
            data2_diff_data1[m] = data_DB2[m]

        print("Diff DB1 vs DB2 :- %s" % (data1_diff_data2,))
        print("Diff DB2 vs DB1 :- %s" % (data2_diff_data1,))

        if not (added or removed or changed):
            print("Data of both DB Matches")
        else:
            print("Data of both DB Varies")