I have a huge table with 440 million rows and 30 columns that I need to save locally.
I have been using the SQLAlchemy package to do this, but it takes a very long time.
How can I do this faster? Should I use Dask? Here is my code:
import pandas as pd
from pyhive import hive  # provides the Hive dialect for SQLAlchemy
from sqlalchemy import MetaData, Table, create_engine
################################################################################
# Using SQLAlchemy:
# Creating engine:
engine = create_engine('hive://er123.company.test:10000/db')
# Connecting to the engine:
con = engine.connect()
# Reflect the table from the database into a SQLAlchemy Table object
# (autoload=True is deprecated in SQLAlchemy 1.4+; autoload_with suffices):
metadata = MetaData()
tbl_name = "tbl"
Table(tbl_name, metadata, autoload_with=engine)
# Create statement:
stmt = "SELECT * FROM tbl"
# Fetch the data into a DataFrame:
data = pd.read_sql(stmt, con=con)
# Save it to a CSV file (reuse the DataFrame instead of re-running the query):
data.to_csv('data.csv', index=False)
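One likely bottleneck in the code above is that `pd.read_sql` materializes all 440 million rows in memory before anything is written. Passing `chunksize` makes pandas stream the result set and lets you append each chunk to the CSV as it arrives. Below is a minimal sketch of that pattern; it uses an in-memory SQLite table with made-up columns `a` and `b` as a stand-in, since I can't run against your Hive server — with your setup you would pass the Hive `con` and your `stmt` instead.

```python
import sqlite3

import pandas as pd

# Stand-in for the remote table (assumption: your real `con` is the
# SQLAlchemy Hive connection from the question, not SQLite).
con = sqlite3.connect(":memory:")
pd.DataFrame({"a": range(10), "b": range(10)}).to_sql("tbl", con, index=False)

# Stream the query result in chunks instead of loading it all at once.
# Only the first chunk writes the CSV header; later chunks append.
first = True
for chunk in pd.read_sql("SELECT * FROM tbl", con, chunksize=4):
    chunk.to_csv("data.csv", mode="w" if first else "a",
                 header=first, index=False)
    first = False
```

For a table this size you would pick a much larger `chunksize` (e.g. 100,000 or more), trading memory use against per-round-trip overhead. This keeps peak memory bounded by one chunk rather than the whole table.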