PostgreSQL ts_stat from SQLAlchemy

Time: 2018-04-08 22:52:40

Tags: python postgresql sqlalchemy full-text-search tsvector

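Postgres uses an odd syntax for ts_stat queries: the function takes a string literal containing the statement whose tsvector values you want statistics for, e.g.: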

SELECT * FROM ts_stat('SELECT content_ts FROM document_contents')
ORDER BY nentry DESC, ndoc DESC, word;

I'd like to use a SQLAlchemy Query object to build the inner query, which is complex and has many optional filters:

SELECT content_ts 
FROM document_contents
JOIN fact_api ON document_contents.id = fact_api.content_id 
WHERE fact_api.day >= %(day_1)s
AND fact_api.day <= %(day_2)s
AND fact_api.unit IN (%(unit_1)s)
AND fact_api.term IN (%(term_1)s, %(term_2)s)

I already have SQLAlchemy code that generates the inner query. Is there a nice way to generate the ts_stat query around it?

2 answers:

Answer 0: (score: 1)

You can hide the actual compilation in a custom FunctionElement:

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import FunctionElement, column
from sqlalchemy.sql.base import ColumnCollection
from sqlalchemy.types import TEXT, INTEGER


class ts_stat(FunctionElement):
    name = "ts_stat"

    @property
    def columns(self):
        # Using (undocumented) `_selectable=self` would allow
        # omitting the explicit `select_from(ts_stat_obj)` in
        # every query using `ts_stat`.
        return ColumnCollection(
            column("word", TEXT),
            column("ndoc", INTEGER),
            column("nentry", INTEGER))

@compiles(ts_stat, 'postgresql')
def pg_ts_stat(element, compiler, **kw):
    kw.pop("asfrom", None)  # Ignore and set explicitly
    arg1, = element.clauses
    # arg1 is a FromGrouping, which would force parens around the SELECT.
    stmt = compiler.process(
        arg1.element, asfrom=False, literal_binds=True, **kw)
    # TODO: Choose a random tag for dollar quoting. Another option
    # would be to wrap the stmt in `literal()`, compiling that, and
    # letting the driver worry about quoting.
    return f"ts_stat($${stmt}$$)"

Usage is simple: you pass a Select or a Query as the sole argument:

from sqlalchemy import select, column, literal
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import sessionmaker

d = postgresql.dialect()

s = select([1])
f = ts_stat(s)
stmt = select([f.c.word, f.c.ndoc, f.c.nentry]).\
    select_from(f).\
    order_by(f.c.nentry.desc(),
             f.c.ndoc.desc(),
             f.c.word).\
    compile(dialect=d)
print(stmt)
# SELECT word, ndoc, nentry 
# FROM ts_stat($$SELECT 1$$) ORDER BY nentry DESC, ndoc DESC, word

Session = sessionmaker()
session = Session()

q = session.query(literal(1))
f2 = ts_stat(q)
stmt2 = select(['*']).\
    select_from(f2).\
    order_by(f2.c.nentry.desc(),
             f2.c.ndoc.desc(),
             f2.c.word).\
    compile(dialect=d)
print(stmt2)
# SELECT * 
# FROM ts_stat($$SELECT 1 AS param_1$$) ORDER BY nentry DESC, ndoc DESC, word

Note that using literal_binds=True limits what you can pass as parameters to the inner select, as described in "How do I render SQL expressions as strings, possibly with bound parameters inlined?".

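Of course, for other readers such a construct makes it less obvious that the DB function ts_stat() in reality accepts a string argument, but in this case the convenience may win out.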

Answer 1: (score: 0)

This seems to work:

from sqlalchemy import text

query = session.query( ... lots of joins ... )
literal_query = str(query.statement.compile(engine, compile_kwargs={"literal_binds": True}))
ts_stat = text('SELECT * FROM ts_stat($$' + 
               literal_query + 
               '$$) ORDER BY nentry DESC, ndoc DESC, word')
for row in session.execute(ts_stat):
    print(row)

See this for how to render the query with its parameters inlined: http://docs.sqlalchemy.org/en/latest/faq/sqlexpressions.html

And this for the $$ (dollar quoting) syntax: https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING
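
Both answers wrap the inner statement in a bare $$ ... $$ pair, which would break if the rendered SQL itself contained $$ (the first answer's TODO mentions picking a random tag for this reason). Below is a minimal sketch of that idea in plain Python. The helper name dollar_quote and the "q" + hex tag format are assumptions made for illustration; they are not part of SQLAlchemy or of either answer.

```python
import secrets


def dollar_quote(sql: str) -> str:
    """Wrap sql in a PostgreSQL dollar-quoted string.

    Hypothetical helper: picks a random tag and retries until the
    resulting delimiter does not occur anywhere in sql.
    """
    while True:
        # Tags follow unquoted-identifier rules, so start with a letter.
        tag = "q" + secrets.token_hex(4)
        delimiter = f"${tag}$"
        if delimiter not in sql:
            return f"{delimiter}{sql}{delimiter}"


# The inner statement already uses $$, so a bare $$ wrapper would end early.
inner = "SELECT to_tsvector($$some text$$) AS content_ts"
print(f"SELECT * FROM ts_stat({dollar_quote(inner)})")
```

In the first answer the return value of pg_ts_stat could then be built as f"ts_stat({dollar_quote(stmt)})", and in the second the two '$$' string pieces could be replaced by a single dollar_quote(literal_query) call. The other option named in that TODO, passing the statement as a bound parameter and letting the driver quote it, avoids the problem altogether.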