Question

I want to build a DataFrame from my SQLAlchemy query using pandas read_sql, with the attributes of a PostgreSQL jsonb column expanded into DataFrame columns.

The following already gives me the result I want:

import pandas as pd

# Fetch the id and the raw jsonb value, then unpack the jsonb dict in Python
query = session.query(
    cls.id,
    cls._my_jsonb_column
).all()
pd.DataFrame.from_dict([dict(id=id_, **i) for id_, i in query])

But I would prefer to unpack the jsonb in PostgreSQL rather than in the application.

Here is my attempt:

query = session.query(
    cls.id,
    func.jsonb_to_record(cls._my_jsonb_column)
)
pd.read_sql(query.statement, query.session.bind)

It fails with: (psycopg2.NotSupportedError) function returning record called in context that cannot accept type record

Answer 1

jsonb_to_record (and jsonb_to_recordset) returns a record as if it were the result of a SELECT query. In the SQLAlchemy context, it provides a selectable that can be used like a table.

So you should treat the result of func.jsonb_to_record(cls._my_jsonb_column) as a kind of table that you can join to your original table.

Your query should then look like this:

jsonb_data = func.jsonb_to_record(cls._my_jsonb_column)
query = session.query(
    select(
        [cls.id, <other columns>]
    ).select_from(
        cls.join(jsonb_data, <on_clause>)
    )
)
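
The idea of using jsonb_to_record as a table joined to the original one can also be sketched in plain SQL. PostgreSQL requires a column definition list (an AS alias with names and types) when a record-returning function is used in the FROM clause, so the helper below builds that statement as a string. This is only a minimal sketch: the helper names (quote_ident, build_jsonb_record_query), the table name my_table, and the JSON keys name and score are assumptions for illustration and do not come from the question.

```python
import re

# Deliberately strict pattern for PostgreSQL type names such as "text",
# "int", "double precision", "numeric(10, 2)" or "text[]".
_TYPE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_ ]*(\(\d+(,\s*\d+)?\))?(\[\])?$")


def quote_ident(name):
    """Quote an SQL identifier with double quotes, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_jsonb_record_query(table, id_column, jsonb_column, fields):
    """Build a SELECT that expands a jsonb column into real columns.

    fields maps each JSON key to the PostgreSQL type it should be cast to.
    (Hypothetical helper, not part of SQLAlchemy or pandas.)
    """
    if not fields:
        raise ValueError("jsonb_to_record needs at least one column definition")
    for pg_type in fields.values():
        if not _TYPE_RE.match(pg_type):
            raise ValueError(f"suspicious type name: {pg_type!r}")
    # Column definition list, e.g. "name" text, "score" int
    col_defs = ", ".join(f"{quote_ident(k)} {t}" for k, t in fields.items())
    return (
        f"SELECT t.{quote_ident(id_column)}, r.* "
        f"FROM {quote_ident(table)} AS t "
        f"CROSS JOIN LATERAL jsonb_to_record(t.{quote_ident(jsonb_column)}) "
        f"AS r({col_defs})"
    )


# Assumed table name and JSON keys, for illustration only
sql = build_jsonb_record_query(
    "my_table", "id", "_my_jsonb_column", {"name": "text", "score": "int"}
)
print(sql)
```

The resulting string could then be passed to pd.read_sql(sql, query.session.bind), as in the question. The JSON keys and their types still have to be known in advance, because jsonb_to_record cannot infer them.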

You can even flatten your JSON data using the JSON processing functions, but it is not possible to be more precise without knowing the structure of the JSON data.

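Alternatively, I recently released a package that makes it easy to flatten JSONB fields from a description of the JSON data. I would be happy to get some feedback: pg_jsonb_flattener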
