Python: SQL database to sparse matrix using pickle

Time: 2018-07-12 12:38:26

Tags: python database python-2.7 sparse-matrix

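I want to import a large dataset into Python. I tried using regular DataFrames, but because I have to perform some resource-intensive operations, I frequently run into memory errors.

My new approach is to download the data from a SQL database into a pickle file and then load it into Python as a sparse matrix. I wrote two functions for this, but they don't work.

Please excuse my code; I am new to Python. I would be glad to receive any suggestions. Thanks!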

    def DatabaseToFile(query, local_filename):
        import pickle
        import pandas as pd  # was missing: pd.read_sql is used below
        import pyodbc

        filename = local_filename + '.obj'
        li_log = list()  # placeholder log; returned, but no longer written to the file

        connection = pyodbc.connect(
                r"Driver={SQL Server Native Client 11.0};"
                r"Server=my_server;"
                r"Database=my_database;"
                r"uid=my_user;"
                r"pwd=my_password;"
                )
        try:
            # Each chunk is written as its own pickle record in the same file.
            with open(filename, 'wb') as pickle_filehandler:  # write binary mode
                for chunk in pd.read_sql(query, connection, chunksize=50000):
                    pickle.dump(chunk, pickle_filehandler)
        finally:
            # Close the connection explicitly, even if the query fails.
            connection.close()
        return li_log
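
The file format this function relies on can be sketched in isolation: several `pickle.dump` calls on one open file write a sequence of independent records, and repeated `pickle.load` calls return them in the same order until `EOFError` is raised. This is only a minimal sketch using the standard library; `write_chunks`, `read_chunks`, and the list-of-dicts data are hypothetical stand-ins for the DataFrame chunks that `pd.read_sql` would yield.

```python
import os
import pickle
import tempfile


def write_chunks(chunks, path):
    """Write each chunk to `path` as its own pickle record."""
    with open(path, 'wb') as handle:
        for chunk in chunks:
            pickle.dump(chunk, handle, protocol=pickle.HIGHEST_PROTOCOL)


def read_chunks(path):
    """Yield the records back one at a time, in the order they were written."""
    with open(path, 'rb') as handle:
        while True:
            try:
                yield pickle.load(handle)
            except EOFError:  # raised once the file has no more records
                return


# Hypothetical stand-in data: two "chunks" of rows.
example_chunks = [[{'id': 1, 'value': 0.0}], [{'id': 2, 'value': 3.5}]]
example_path = os.path.join(tempfile.mkdtemp(), 'example.obj')
write_chunks(example_chunks, example_path)
restored = list(read_chunks(example_path))
```

Catching only `EOFError` rather than using a bare `except` means that a corrupted record or a missing import surfaces as an error instead of silently ending the loop.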


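    def FileToMatrix(local_filename):
        import pickle
        import pandas as pd  # was missing: pd.DataFrame and pd.concat are used below

        filename = local_filename + '.obj'
        chunks = []
        with open(filename, 'rb') as pickle_filehandler:  # read binary mode
            while True:
                try:
                    chunk = pickle.load(pickle_filehandler)
                except EOFError:  # no more records in the file
                    break
                # Skip anything that is not a DataFrame (e.g. an empty log list).
                if isinstance(chunk, pd.DataFrame):
                    chunks.append(chunk)
        # Concatenate once instead of calling df.append inside the loop.
        df = pd.concat(chunks, ignore_index=True) if chunks else pd.DataFrame()
        # Note: DataFrame.to_sparse() exists only in pandas < 1.0.
        sdf = df.to_sparse()
        return sdf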
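
Since the goal is to avoid holding one large dense frame in memory, one possible alternative (an assumption on my part, not something the code above does) is to convert each chunk to a SciPy sparse matrix as it is read and then stack the sparse pieces. The sketch below assumes every column is numeric and that SciPy is installed; `chunks_to_sparse` and the example frames are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def chunks_to_sparse(chunks):
    """Convert numeric DataFrame chunks to one CSR matrix without
    first concatenating them into a single dense frame."""
    pieces = [sparse.csr_matrix(chunk.values.astype(np.float64))
              for chunk in chunks]
    return sparse.vstack(pieces, format='csr')


# Hypothetical stand-in for the chunks read back from the pickle file.
example_frames = [
    pd.DataFrame({'a': [0, 1], 'b': [0, 0]}),
    pd.DataFrame({'a': [0, 0], 'b': [2, 0]}),
]
matrix = chunks_to_sparse(example_frames)
```

Only one chunk is dense at any time here, so peak memory is bounded by the chunk size plus the accumulated sparse pieces. Column names are lost in the conversion and would have to be kept separately if they are needed later.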

0 answers:

No answers