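I want to import some large data into Python. I tried using ordinary DataFrames, but because I have to perform some resource-intensive operations, I frequently run into memory errors.
My new approach is to download the data from a SQL database as pickled chunks and load it into Python as a sparse matrix. To do this I wrote two functions, but they don't work.
Please excuse my code; I'm new to Python. I'd be glad to receive any suggestions. Thanks!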
def DatabaseToFile(query, local_filename):
    import pickle
    import pandas as pd
    import pyodbc
    filename = local_filename + '.obj'
    li_log = list()  # returned unchanged; nothing is logged to it yet
    connection = pyodbc.connect(
        r"Driver={SQL Server Native Client 11.0};"
        r"Server=my_server;"
        r"Database=my_database;"
        r"uid=my_user;"
        r"pwd=my_password;"
    )
    try:
        with open(filename, 'wb') as pickle_filehandler:  # write binary mode
            # Pickle each chunk as a separate object in the same file.
            for chunk in pd.read_sql(query, connection, chunksize=50000):
                pickle.dump(chunk, pickle_filehandler)
    finally:
        connection.close()  # close the connection even if the query fails
    return li_log
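
The writer above depends on calling pickle.dump several times on one file handle and reading the objects back one at a time. Here is a minimal, standard-library-only sketch of that round trip. The names dump_all and load_all are hypothetical helpers introduced for illustration, and plain dictionaries stand in for the DataFrame chunks.

```python
import pickle
import tempfile
from pathlib import Path


def dump_all(objects, path):
    """Pickle every object in `objects` to `path`, one after another."""
    with open(path, "wb") as handle:
        for obj in objects:
            pickle.dump(obj, handle)


def load_all(path):
    """Yield the pickled objects stored in `path` until the file ends."""
    with open(path, "rb") as handle:
        while True:
            try:
                yield pickle.load(handle)
            except EOFError:
                # pickle.load raises EOFError once no data is left to read.
                return


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "chunks.obj"
        dump_all([{"rows": 1}, {"rows": 2}, {"rows": 3}], target)
        for item in load_all(target):
            print(item)
```

Catching only EOFError, rather than using a bare except, means that a corrupt file or a real bug is still reported instead of silently ending the loop.
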
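def FileToMatrix(local_filename):
    import pickle
    import pandas as pd
    filename = local_filename + '.obj'
    chunks = []
    with open(filename, 'rb') as pickle_filehandler:  # read binary mode
        while True:
            try:
                chunk = pickle.load(pickle_filehandler)
            except EOFError:  # end of file: every pickled object has been read
                break
            # Keep only DataFrame chunks; any other pickled object is skipped.
            if isinstance(chunk, pd.DataFrame):
                chunks.append(chunk)
    # Concatenate once; calling df.append in a loop copies the frame each time.
    df = pd.concat(chunks, ignore_index=True) if chunks else pd.DataFrame()
    # Note: DataFrame.to_sparse() was removed in pandas 1.0, so this call
    # only works on older pandas versions.
    sdf = df.to_sparse()
    return sdf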
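
Because FileToMatrix builds the full dense DataFrame before converting it, peak memory is no lower than with an ordinary DataFrame. One possible alternative, sketched here under assumptions, is to convert each chunk to sparse columns first and only then concatenate. The sketch assumes every column is numeric and that 0.0 is the value to leave unstored; the name chunks_to_sparse and its fill_value parameter are hypothetical. It uses pd.SparseDtype, which is available from pandas 0.24 onward.

```python
import pandas as pd


def chunks_to_sparse(chunks, fill_value=0.0):
    """Convert each DataFrame chunk to sparse columns, then concatenate.

    Assumes all columns are numeric and `chunks` is not empty.
    """
    sparse_dtype = pd.SparseDtype("float64", fill_value)
    # Each chunk is made sparse on its own, so the dense data is only
    # ever held one chunk at a time.
    sparse_chunks = [chunk.astype(sparse_dtype) for chunk in chunks]
    return pd.concat(sparse_chunks, ignore_index=True)


if __name__ == "__main__":
    demo_chunks = [
        pd.DataFrame({"a": [0.0, 0.0], "b": [1.0, 0.0]}),
        pd.DataFrame({"a": [2.0, 0.0], "b": [0.0, 0.0]}),
    ]
    sdf = chunks_to_sparse(demo_chunks)
    print(sdf.dtypes)
    print(sdf.sparse.density)  # fraction of values that are actually stored
```

The chunks argument can be any iterable of DataFrames, for example the objects read back from the pickle file. Whether this saves memory depends on how many values in the real data equal the chosen fill value.
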