Converting a SQLAlchemy ORM query to a pandas DataFrame

Time: 2015-04-08 21:36:34

Tags: python pandas sqlalchemy flask-sqlalchemy

This topic has not been addressed here or elsewhere. Is there a solution for converting a SQLAlchemy <Query object> to a pandas DataFrame?

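pandas can do this with pandas.read_sql, but that requires raw SQL. I have two reasons for wanting to avoid it: 1) I already have everything built on the ORM (a good reason in and of itself), and 2) I use Python lists as part of the query (e.g. db.session.query(Item).filter(Item.symbol.in_(add_symbols)), where Item is my model class and add_symbols is a list). This is the equivalent of SQL SELECT ... FROM ... WHERE ... IN.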

Is this possible?

7 answers:

Answer 0 (score: 135)

The following should work in most cases:

df = pd.read_sql(query.statement, query.session.bind)

See the pandas.read_sql documentation for details on the parameters.
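As a minimal, self-contained sketch of this approach, the example below builds the IN-list query from the question and hands its statement to pandas. The in-memory SQLite database, the Item model's columns and the sample rows are assumptions made for illustration, and the declarative_base import location assumes SQLAlchemy 1.4 or later.

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Item(Base):
    """Hypothetical stand-in for the Item model in the question."""
    __tablename__ = "item"

    id = Column(Integer, primary_key=True)
    symbol = Column(String(16), nullable=False)


# Assumption: an in-memory SQLite database instead of the real one
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([Item(symbol=s) for s in ("AAA", "BBB", "CCC")])
session.commit()

# Build the ORM query with a Python list feeding the IN clause
add_symbols = ["AAA", "CCC"]
query = session.query(Item).filter(Item.symbol.in_(add_symbols)).order_by(Item.id)

# Let pandas execute the underlying SELECT on the session's engine
df = pd.read_sql(query.statement, query.session.bind)
print(df)
```

The query is never executed through the ORM here; pandas runs the SELECT itself, so the result contains plain column values rather than Item instances.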

Answer 1 (score: 51)

To make this clearer for novice pandas programmers, here is a concrete example:

pd.read_sql(session.query(Complaint).filter(Complaint.id == 2).statement, session.bind)

Here we select a complaint from the complaints table (the SQLAlchemy model is Complaint) with id = 2.

Answer 2 (score: 5)

The accepted solution did not work for me; I kept getting this error:

AttributeError: 'AnnotatedSelect' object has no attribute 'lower'

I found that the following works:

df = pd.read_sql_query(query.statement, engine)

Answer 3 (score: 4)

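For completeness: as an alternative to the pandas function read_sql_query(), you can also use the pandas DataFrame function from_records() to convert a structured or record ndarray to a DataFrame.
This is useful if, for example, you have already executed the query in SQLAlchemy and the results are already available: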

import pandas as pd 
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker


SQLALCHEMY_DATABASE_URI = 'postgresql://postgres:postgres@localhost:5432/my_database'
engine = create_engine(SQLALCHEMY_DATABASE_URI, pool_pre_ping=True, echo=False)
db = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine))
Base = declarative_base(bind=engine)


class Currency(Base):
    """The `Currency`-table"""
    __tablename__ = "currency"
    __table_args__ = {"schema": "data"}

    id = Column(Integer, primary_key=True, nullable=False)
    name = Column(String(64), nullable=False)


# Defining the SQLAlchemy-query
currency_query = db.query(Currency).with_entities(Currency.id, Currency.name)

# Getting all the entries via SQLAlchemy
currencies = currency_query.all()

# We provide also the (alternate) column names and set the index here,
# renaming the column `id` to `currency__id`
df_from_records = pd.DataFrame.from_records(
    currencies,
    index='currency__id',
    columns=['currency__id', 'name'],
)
print(df_from_records.head(5))

# Or getting the entries via Pandas instead of SQLAlchemy using the
# aforementioned function `read_sql_query()`. We can set the index-columns here as well
df_from_query = pd.read_sql_query(currency_query.statement, db.bind, index_col='id')
# Renaming the index-column(s) from `id` to `currency__id` needs another statement
df_from_query.index.rename(name='currency__id', inplace=True)
print(df_from_query.head(5))

Answer 4 (score: 2)

If you want to compile the query with its parameters and dialect-specific arguments, use something like this:

c = query.statement.compile(query.session.bind)
df = pandas.read_sql(c.string, query.session.bind, params=c.params)
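To see what this answer passes to pandas, the sketch below compiles a query against an engine and inspects the resulting SQL string and bound parameters. The Complaint model's columns and the in-memory SQLite engine are assumptions for illustration. The placeholder style in the string follows the dialect, and how the string and the parameter dict reach the driver depends on the pandas version and the DBAPI driver, so this sketch stops at inspection.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Complaint(Base):
    """Hypothetical model; only the id column matters for this sketch."""
    __tablename__ = "complaint"

    id = Column(Integer, primary_key=True)
    text = Column(String(200))


# Assumption: SQLite in memory; no tables are needed just to compile a statement
engine = create_engine("sqlite://")
session = sessionmaker(bind=engine)()

query = session.query(Complaint).filter(Complaint.id == 2)

# Compile the SELECT for the dialect of the session's engine
compiled = query.statement.compile(query.session.bind)

sql_text = compiled.string             # SQL with dialect-specific placeholders
bound_params = dict(compiled.params)   # placeholder name -> bound value

print(sql_text)
print(bound_params)
```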

Answer 5 (score: 1)

import pandas as pd
from sqlalchemy import Column, Date, Integer, create_engine, select
from sqlalchemy.dialects.postgresql import DOUBLE_PRECISION
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

engine = create_engine('postgresql://postgres:postgres@localhost:5432/DB', echo=False)
Base = declarative_base(bind=engine)
Session = sessionmaker(bind=engine)
session = Session()

# The engine the session is bound to; pandas executes the query on it
conn = session.bind

class DailyTrendsTable(Base):

    __tablename__ = 'trends'
    __table_args__ = ({"schema": 'mf_analysis'})

    company_code = Column(DOUBLE_PRECISION, primary_key=True)
    rt_bullish_trending = Column(Integer)
    rt_bearish_trending = Column(Integer)
    rt_bullish_non_trending = Column(Integer)
    rt_bearish_non_trending = Column(Integer)
    gen_date = Column(Date, primary_key=True)

# Select every column of the mapped table (legacy list form, as in the
# original; SQLAlchemy 1.4+ spells this select(DailyTrendsTable))
df_query = select([DailyTrendsTable])

# Pass the statement built above to pandas
df_data = pd.read_sql(df_query, con=conn)

Answer 6 (score: 1)

I have a similar case, but apparently the solutions in this post do not work for me. I would appreciate any hints.

I have a custom class for the SQLAlchemy connection:

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class RevenueForecast(db.Model):
    __tablename__ = 'revenue_forecast'
    id = db.Column(db.Integer(), primary_key=True)
    manager = db.Column(db.String(100), nullable=False)
    client_name = db.Column(db.String(200), nullable=False)
    job_description = db.Column(db.String(200), nullable=False)
    jul = db.Column(db.Integer)
    aug = db.Column(db.Integer)
    sep = db.Column(db.Integer)
    oct = db.Column(db.Integer)
    nov = db.Column(db.Integer)
    dec = db.Column(db.Integer)
    jan = db.Column(db.Integer)
    feb = db.Column(db.Integer)
    mar = db.Column(db.Integer)
    apr = db.Column(db.Integer)
    may = db.Column(db.Integer)
    jun = db.Column(db.Integer)
    is_active = db.Column(db.Boolean)

I read the data in a very simple way, without creating an engine and so on:

forecasts = RevenueForecast.query.all()

I would like to convert it to a DataFrame, using from_records as suggested in this thread:

forecast_data = pd.DataFrame.from_records(
    forecasts,
    index = 'id'
)
print(forecast_data.head(5))

However, this produces an error:

TypeError: 'RevenueForecast' object is not iterable

The following works (columns is just a list of the column names in the class):

forecast_data = pd.DataFrame()
for i in columns:
    # Read attribute i from every model instance (same result, without exec)
    forecast_data[i] = [getattr(k, i) for k in forecasts]

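Still, I would like to use pandas to convert the list of class instances from SQLAlchemy into a DataFrame (more error-proof). Where am I making a mistake? Should I make the RevenueForecast class iterable?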
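One possible reading of the error: from_records() expects each row to be a tuple or a dict, whereas .all() on a whole-entity query returns model instances. A minimal sketch of one way around this, without making the class iterable, is to turn each instance into a dict keyed by the mapped column names. The Forecast model below is a shortened, hypothetical stand-in for RevenueForecast using plain SQLAlchemy (1.4 or later) on in-memory SQLite, and the sample rows are invented for illustration.

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Forecast(Base):
    """Hypothetical, shortened stand-in for RevenueForecast."""
    __tablename__ = "forecast"

    id = Column(Integer, primary_key=True)
    manager = Column(String(100), nullable=False)
    jul = Column(Integer)


# Assumption: in-memory SQLite with two invented rows
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([Forecast(manager="Ann", jul=10), Forecast(manager="Bob", jul=20)])
session.commit()

# A whole-entity query returns model instances, not tuples
forecasts = session.query(Forecast).order_by(Forecast.id).all()

# Take the column names from the mapped table instead of hard-coding them
column_names = [c.name for c in Forecast.__table__.columns]

# One dict per instance, then a DataFrame indexed by id
rows = [{name: getattr(f, name) for name in column_names} for f in forecasts]
forecast_data = pd.DataFrame(rows, columns=column_names).set_index("id")
print(forecast_data.head(5))
```

This relies on the attribute names matching the column names, which holds for the model shown in the question. The read_sql approach from the earlier answers avoids the instances altogether by passing the query's statement to pandas.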