How do I populate a pandas DataFrame with the results of a Snowflake SQL query?

Time: 2018-11-02 07:58:13

Tags: pandas dataframe snowflake

Using the Python Connector, I can query Snowflake:

import snowflake.connector

# Open a connection to Snowflake (Okta SSO via the authenticator URL)
ctx = snowflake.connector.connect(
    user=USER,
    password=PASSWORD,
    account=ACCOUNT,
    authenticator='https://XXXX.okta.com',
    )
ctx.cursor().execute('USE warehouse MY_WH')
ctx.cursor().execute('USE MYDB.MYSCHEMA')


query = '''
select * from MYDB.MYSCHEMA.MYTABLE
LIMIT 10;
'''

cur = ctx.cursor().execute(query)

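The result is a snowflake.connector.cursor.SnowflakeCursor. How do I convert it to a pandas DataFrame?

2 answers:

Answer 0: (score: 3)

You can use either DataFrame.from_records(), or pandas.read_sql() together with snowflake-sqlalchemy. The snowflake-sqlalchemy option has the simpler API.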

pd.DataFrame.from_records(iter(cur), columns=[x[0] for x in cur.description])

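This returns a DataFrame whose column names are taken from the SQL result. iter(cur) turns the cursor into an iterator over the result rows, and cur.description describes each column (name, type, and so on); the first element of each entry is the column name.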
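To make the two inputs to from_records() concrete, here is a minimal sketch of what iterating a cursor and reading its description produce. It assumes the standard-library sqlite3 module as a stand-in for Snowflake (both cursors follow the Python DB-API 2.0 shape); the table name and rows are made up for illustration.

```python
import sqlite3

# sqlite3 is only a stand-in here: its cursor exposes the same DB-API 2.0
# surface (iteration over rows, .description) that the answer relies on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")  # hypothetical table
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "a"), (2, "b")])

cur = conn.cursor().execute("SELECT id, name FROM mytable ORDER BY id")

# Each description entry is a tuple; element 0 is the column name.
columns = [col[0] for col in cur.description]

# Iterating the cursor yields one tuple per result row.
rows = list(iter(cur))

conn.close()

print(columns)
print(rows)
```

Passing `rows` and `columns` to pd.DataFrame.from_records(rows, columns=columns) would then build the same kind of DataFrame as the one-liner above.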

So the complete code would be:

import snowflake.connector
import pandas as pd

# Open a connection to Snowflake (Okta SSO via the authenticator URL)
ctx = snowflake.connector.connect(
    user=USER,
    password=PASSWORD,
    account=ACCOUNT,
    authenticator='https://XXXX.okta.com',
    )
ctx.cursor().execute('USE warehouse MY_WH')
ctx.cursor().execute('USE MYDB.MYSCHEMA')


query = '''
select * from MYDB.MYSCHEMA.MYTABLE
LIMIT 10;
'''

cur = ctx.cursor().execute(query)
df = pd.DataFrame.from_records(iter(cur), columns=[x[0] for x in cur.description])

If you prefer to use pandas.read_sql, you can do this instead:

import pandas as pd

from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL


url = URL(
    account = 'xxxx',
    user = 'xxxx',
    password = 'xxxx',
    database = 'xxx',
    schema = 'xxxx',
    warehouse = 'xxx',
    role='xxxxx',
    authenticator='https://xxxxx.okta.com',
)
engine = create_engine(url)


connection = engine.connect()

query = '''
select * from MYDB.MYSCHEMA.MYTABLE
LIMIT 10;
'''

df = pd.read_sql(query, connection)

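Answer 1: (score: 3)

There is now a method, .fetch_pandas_all(), so SQLAlchemy is no longer needed.

Note that you need to install the pandas-enabled variant of the Snowflake connector by running: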

pip install "snowflake-connector-python[pandas]"

The full documentation is available in the Snowflake Connector for Python docs.

import pandas as pd
import snowflake.connector

conn = snowflake.connector.connect(
            user="xxx",
            password="xxx",
            account="xxx",
            warehouse="xxx",
            database="MYDB",
            schema="MYSCHEMA"
            )

cur = conn.cursor()

# Execute a statement that will generate a result set.
sql = "select * from MYTABLE limit 10"
cur.execute(sql)
# Fetch the result set from the cursor and deliver it as the Pandas DataFrame.
df = cur.fetch_pandas_all()
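If you run several queries this way, the steps above can be wrapped in a small helper. This is only a sketch: query_to_dataframe is a hypothetical name, not part of the connector, and it assumes conn is an already-open connection created as shown above with the pandas extra installed.

```python
def query_to_dataframe(conn, sql):
    """Run sql on an open Snowflake connection and return a pandas DataFrame.

    Hypothetical helper: it only uses cursor(), execute(),
    fetch_pandas_all() and close(), as in the answer above.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql)
        # Deliver the whole result set as a single DataFrame.
        return cur.fetch_pandas_all()
    finally:
        # Release the cursor even if the query raises.
        cur.close()
```

Usage would then be df = query_to_dataframe(conn, "select * from MYTABLE limit 10"). Closing the cursor in a finally block is a design choice of this sketch, not something the original answer does.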