Question

How can I select one (or a few) random rows from a table using SQLAlchemy?

Answer 1

This is very much a database-specific issue.

I know that PostgreSQL, SQLite, MySQL, and Oracle can all order by a random function, so you can use that in SQLAlchemy:

from sqlalchemy.sql.expression import func, select, text

stmt = select(Table)  # Table is your mapped class or table

stmt.order_by(func.random())  # for PostgreSQL, SQLite

stmt.order_by(func.rand())  # for MySQL

stmt.order_by(text('dbms_random.value'))  # for Oracle

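Next, limit the query to the number of records you need (for example, using .limit()).

Bear in mind that, at least in PostgreSQL, selecting random records this way has severe performance problems; here is a good article about it.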
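
To make the two steps above concrete, here is a minimal sketch that runs against an in-memory SQLite database. The Item model and its columns are hypothetical and exist only for this illustration; it assumes SQLAlchemy 1.4 or later.

```python
from sqlalchemy import Column, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Item(Base):  # hypothetical model, used only for this sketch
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    name = Column(String)


engine = create_engine("sqlite://")  # in-memory SQLite database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Item(name=f"item-{i}") for i in range(20)])
    session.commit()

    # ORDER BY random() shuffles the rows; LIMIT keeps the first three.
    rows = session.query(Item).order_by(func.random()).limit(3).all()
    picked_names = [row.name for row in rows]

print(picked_names)
```

On MySQL you would swap func.random() for func.rand(), as listed above.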

Answer 2

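If you are using the ORM, the table is not big (or you have its row count cached), and you want it to be database-independent, the really simple approach is: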

import random
rand = random.randrange(0, session.query(Table).count()) 
row = session.query(Table)[rand]

It is cheating slightly, but that is why you use an ORM.

Answer 3

There is a simple way to pull a random row that IS database-independent. Just use .offset(). There is no need to pull all rows:

import random
query = DBSession.query(Table)
rowCount = int(query.count())
randomRow = query.offset(int(rowCount*random.random())).first()

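Table is your table (or you could put any query there). If you want several rows, you can run this multiple times and make sure each row is different from the previous ones.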
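
For the several-rows case, one way to avoid checking for duplicates by hand is to draw distinct offsets up front. This is only a sketch under assumptions: the Item model and the random_rows helper are hypothetical names, and it issues one query per sampled row, so it suits small sample sizes. It assumes SQLAlchemy 1.4 or later.

```python
import random

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Item(Base):  # hypothetical model, used only for this sketch
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    name = Column(String)


def random_rows(session, model, k):
    """Return up to k distinct rows, each fetched by a random offset."""
    row_count = session.query(model).count()
    # A stable ordering makes every offset refer to one well-defined row.
    ordered = session.query(model).order_by(model.id)
    # random.sample never repeats an offset, so no row is returned twice.
    offsets = random.sample(range(row_count), min(k, row_count))
    return [ordered.offset(offset).limit(1).one() for offset in offsets]


engine = create_engine("sqlite://")  # in-memory SQLite database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Item(name=f"item-{i}") for i in range(50)])
    session.commit()
    sampled_ids = [row.id for row in random_rows(session, Item, 5)]

print(sampled_ids)
```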

Answer 4

Here are four different variations, ordered from slowest to fastest. The timeit results are at the bottom:

import random

from sqlalchemy.sql import func
from sqlalchemy.orm import load_only

def simple_random():
    return random.choice(model_name.query.all())

def load_only_random():
    return random.choice(model_name.query.options(load_only('id')).all())

def order_by_random():
    return model_name.query.order_by(func.random()).first()

def optimized_random():
    return model_name.query.options(load_only('id')).offset(
            func.floor(
                func.random() *
                db.session.query(func.count(model_name.id))
            )
        ).limit(1).all()

timeit results for 10,000 runs on my MacBook against a PostgreSQL table with 300 rows:

simple_random(): 
    90.09954111799925
load_only_random():
    65.94714171699889
order_by_random():
    23.17819356000109
optimized_random():
    19.87806927999918

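You can easily see that using func.random() is far faster than returning all results to Python's random.choice().

In addition, as the table grows, the performance of order_by_random() will degrade significantly, because ORDER BY requires a full table scan, whereas the COUNT in optimized_random() can use an index.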

Answer 5

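Some SQL DBMSs, namely Microsoft SQL Server, DB2, and PostgreSQL, have implemented the SQL:2003 TABLESAMPLE clause. Support was added to SQLAlchemy in version 1.1. It allows returning a sample of a table using different sampling methods; the standard requires SYSTEM and BERNOULLI, which return a desired approximate percentage of a table.

In SQLAlchemy, FromClause.tablesample() and tablesample() are used to produce a TableSample construct: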

# Approx. 1%, using SYSTEM method
sample1 = mytable.tablesample(1)

# Approx. 1%, using BERNOULLI method
sample2 = mytable.tablesample(func.bernoulli(1))

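There is a slight gotcha when used with mapped classes: the produced TableSample object must be aliased in order to be used to query model objects:

from sqlalchemy import tablesample
from sqlalchemy.orm import aliased

sample = aliased(MyModel, tablesample(MyModel, 1))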
res = session.query(sample).all()

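Since many of the answers contain performance benchmarks, I'll include some simple tests here as well. Using a simple table in PostgreSQL with about a million rows and a single integer column, select an (approximately) 1% sample: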

In [24]: %%timeit
    ...: foo.select().\
    ...:     order_by(func.random()).\
    ...:     limit(select([func.round(func.count() * 0.01)]).
    ...:           select_from(foo).
    ...:           as_scalar()).\
    ...:     execute().\
    ...:     fetchall()
    ...: 
307 ms ± 5.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [25]: %timeit foo.tablesample(1).select().execute().fetchall()
6.36 ms ± 188 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [26]: %timeit foo.tablesample(func.bernoulli(1)).select().execute().fetchall()
19.8 ms ± 381 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

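Before rushing off to use the SYSTEM sampling method, you should know that it samples pages, not individual tuples, so it might not be suitable for small tables, for example.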
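
If you want to check what SQL the TableSample constructs produce without connecting to a server, you can compile them against the PostgreSQL dialect. The sketch below assumes SQLAlchemy 1.4 or later; the foo table definition is a hypothetical stand-in for the benchmark table, and the alias names "s" and "b" are arbitrary.

```python
from sqlalchemy import Column, Integer, MetaData, Table, func, select
from sqlalchemy.dialects import postgresql

metadata = MetaData()
# Hypothetical stand-in for the one-column benchmark table.
foo = Table("foo", metadata, Column("x", Integer))

# A plain number selects the SYSTEM method; a function names the method.
system_sample = foo.tablesample(1, name="s")
bernoulli_sample = foo.tablesample(func.bernoulli(1), name="b")

# Compiling needs no database connection or driver.
pg = postgresql.dialect()
system_sql = str(select(system_sample.c.x).compile(dialect=pg))
bernoulli_sql = str(select(bernoulli_sample.c.x).compile(dialect=pg))

print(system_sql)
print(bernoulli_sql)
```

Both statements should contain a TABLESAMPLE clause naming the chosen method.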

Answer 6

Here is the solution I use:

from random import randint

rows_query = session.query(Table)                # get all rows
if rows_query.count() > 0:                       # make sure there's at least 1 row
    rand_index = randint(0,rows_query.count()-1) # get random index to rows 
    rand_row   = rows_query.all()[rand_index]    # use random index to get random row

Answer 7

This is my function for selecting random row(s) from a table:

from sqlalchemy.sql.expression import func

def random_find_rows(sample_num):
    if not sample_num:
        return []

    session = DBSession()
    return session.query(Table).order_by(func.random()).limit(sample_num).all()

Answer 8

Use this, the simplest method. This example picks a random question from the database:

# first import the random module
import random

# then pass the rows of whatever model you want to random.choice()
get_questions = random.choice(Question.query.all())

Answer 9

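This solution will select a single random row.

This solution requires the primary key to be named id; if it is not already, it should be:

import random
# Highest id currently in the table
max_model_id = YourModel.query.order_by(YourModel.id.desc())[0].id
# randrange excludes its upper bound, so add 1; ids are assumed to start at 1
random_id = random.randrange(1, max_model_id + 1)
# Returns None if no row has this id (for example, after deletions)
random_row = YourModel.query.get(random_id)
print(random_row)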
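
One caveat with picking an id at random is that ids may have gaps (deleted rows), in which case the lookup returns nothing. A possible workaround, sketched here under assumptions, is to retry until an existing id is hit. The random_row_by_id helper and the YourModel definition below are hypothetical, and session.get() assumes SQLAlchemy 1.4 or later.

```python
import random

from sqlalchemy import Column, Integer, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class YourModel(Base):  # hypothetical stand-in for your own model
    __tablename__ = "your_model"
    id = Column(Integer, primary_key=True)


def random_row_by_id(session, model, max_attempts=100):
    """Guess ids between the smallest and largest id until one exists."""
    min_id, max_id = session.query(func.min(model.id), func.max(model.id)).one()
    if max_id is None:  # the table is empty
        return None
    for _ in range(max_attempts):
        # randint includes both endpoints, so max_id can be chosen too.
        row = session.get(model, random.randint(min_id, max_id))
        if row is not None:
            return row
    return None  # gave up; the id range is probably very sparse


engine = create_engine("sqlite://")  # in-memory SQLite database
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Ids 3, 4, 6, 7 and 8 are missing on purpose to simulate deletions.
    session.add_all([YourModel(id=i) for i in (1, 2, 5, 9, 10)])
    session.commit()
    picked = random_row_by_id(session, YourModel)
    picked_id = picked.id

print(picked_id)
```

Because failed guesses are simply discarded, every existing row remains equally likely; the cost is extra lookups when the id range is sparse.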

Answer 10

There are a few ways to do this in SQL, depending on which database is being used.

(I think SQLAlchemy can use all of these anyway)

MySQL:

SELECT column FROM table
ORDER BY RAND()
LIMIT 1

PostgreSQL:

SELECT column FROM table
ORDER BY RANDOM()
LIMIT 1

MSSQL：

SELECT TOP 1 column FROM table
ORDER BY NEWID()

IBM DB2：

SELECT column, RAND() as IDX
FROM table
ORDER BY IDX FETCH FIRST 1 ROWS ONLY

Oracle:

SELECT column FROM
(SELECT column FROM table
ORDER BY dbms_random.value)
WHERE rownum = 1

However, I don't know of any standard way.
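
In the absence of a standard, one option is to choose the ordering expression from the dialect name of the engine in use. The following is a rough sketch, not an official SQLAlchemy feature: random_order_clause and the Question model are hypothetical, DB2 is left out because I am not sure of its dialect name, and only the SQLite branch is actually exercised here. It assumes SQLAlchemy 1.4 or later.

```python
from sqlalchemy import Column, Integer, String, create_engine, func, text
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Question(Base):  # hypothetical model, used only for this sketch
    __tablename__ = "questions"
    id = Column(Integer, primary_key=True)
    body = Column(String)


def random_order_clause(dialect_name):
    """Map a SQLAlchemy dialect name to a random ORDER BY expression."""
    if dialect_name in ("postgresql", "sqlite"):
        return func.random()
    if dialect_name in ("mysql", "mariadb"):
        return func.rand()
    if dialect_name == "mssql":
        return func.newid()
    if dialect_name == "oracle":
        return text("dbms_random.value")
    raise NotImplementedError(f"no random ordering known for {dialect_name!r}")


engine = create_engine("sqlite://")  # in-memory SQLite database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Question(body=f"question {i}") for i in range(10)])
    session.commit()

    order_clause = random_order_clause(engine.dialect.name)
    row = session.query(Question).order_by(order_clause).first()
    picked_body = row.body

print(picked_body)
```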
