Question

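I've come up with a way of selecting (more or less) @numRows-2 evenly spaced rows from @totalCount rows (pseudocode is in {} braces; the same description means the same code):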

declare @numRows int
set @numRows = {how many rows I want}

declare @totalCount bigint
select @totalCount = count(1) from {table}
where {filter condition on indexed rows}

declare @mult float
set @mult = 0.999999999 * (@numRows - 2) / (@totalCount - 1)

select ts.rownum, --debug only
        ts.{other data},
        ...
from
(
    select  {other data},
            ...,
            ROW_NUMBER() OVER (ORDER BY {ordering}) as rownum
    from {table}
    where {filter condition on indexed rows}
) as ts
where round(@mult * ts.rownum, 0) <> round((ts.rownum + 1) * @mult, 0)
order by {ordering}

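(The 0.99999... factor is there to work around what I assume are floating-point rounding errors that cause an extra row to be returned. That is also one of the reasons I'm looking for a better algorithm.)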
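One way to sidestep the rounding concern above is to do the comparison in integer arithmetic instead of float. The following is only a sketch of that idea, not the method used in the question: it keeps a row whenever the integer quotient rownum * @numRows / @totalCount changes from the previous row. The demo values (4 rows from 10) are hypothetical, and sys.all_objects is used purely as a convenient row source so the snippet runs on its own. Note the behaviour differs slightly from the query above: it returns exactly @numRows rows when @numRows <= @totalCount, always includes the last row, and does not necessarily include the first.

```sql
-- Hypothetical demo values: pick 4 evenly spaced rows out of 10.
declare @numRows bigint = 4;
declare @totalCount bigint = 10;

with demo as (
    -- Stand-in for the real filtered/ordered table: row numbers 1..@totalCount.
    select top (@totalCount)
           ROW_NUMBER() OVER (ORDER BY (select null)) as rownum
    from sys.all_objects
)
select rownum
from demo
-- bigint / bigint is integer division in T-SQL, so no float rounding is involved.
where (rownum * @numRows) / @totalCount
   <> ((rownum - 1) * @numRows) / @totalCount
order by rownum;
-- returns rownum 3, 5, 8, 10
```

When @numRows is greater than or equal to @totalCount the quotient changes on every row, so everything is returned, which matches the behaviour described for the original query.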

Example results:

100 from 200: every 2nd row

200 from 500: skips 2, then 3, then 2, and so on

100 from 100: everything is returned

100 rows from 50 rows: everything is returned
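To check spacing patterns like the ones listed above without touching real data, the predicate can be run against generated row numbers. This is a minimal sketch under a few assumptions: sys.all_objects serves only as a row source (it is assumed to contain at least 500 rows, which is normally the case), the values 200 and 500 come from the second example, and the gap column is added here for inspection only.

```sql
declare @numRows int = 200;        -- demo value from the "200 from 500" example
declare @totalCount bigint = 500;  -- demo value; stands in for the count(1) query
declare @mult float = 0.999999999 * (@numRows - 2) / (@totalCount - 1);

with ts as (
    -- Stand-in for the real subquery: row numbers 1..@totalCount.
    select top (@totalCount)
           ROW_NUMBER() OVER (ORDER BY (select null)) as rownum
    from sys.all_objects
)
select rownum,
       -- Distance to the previously selected row (NULL for the first one).
       rownum - lag(rownum) over (order by rownum) as gap
from ts
where round(@mult * rownum, 0) <> round((rownum + 1) * @mult, 0)
order by rownum;
```

Because the multiplier is roughly 0.397, the gap column should contain only 2s and 3s, and counting the returned rows shows how close the result is to @numRows-2.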

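I'm not convinced this is the most efficient way to do this kind of filtering, but I couldn't come up with a better one. I've found SQL answers that do something similar but not the same, and non-SQL answers that do exactly this but are not set-based.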

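Assuming there is nothing in the table data itself (e.g. an identity column) that might help the query, I'd like to know whether there is generally a more efficient way to do this. I'm open to temporary result tables.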

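In case it is interesting/relevant: this runs on a fanless embedded PC with an SSD, a 4-core Celeron at 1.8 GHz, and SQL Server 2014 Express, with a maximum of 500 MB of RAM allocated to SQL. Typical values are @numRows = 2,000 and @totalCount = 100,000 to 1,000,000. This is for time-series resampling.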

Efficiently select every n-th row from a SQL table, where n is not an integer

0 answers: