sqlite: get the average of the top X% for each item

Asked: 2013-04-08 11:42:04

Tags: sql sqlite aggregate-functions

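Is it possible to get the average of the top X% of the items in a group?

For example:
I have a table with the columns item_id, timestamp and price. The output should be grouped by item_id and timestamp, and the price column should be averaged. Only the lowest X% of the prices in each group should be used for the average.

I found a similar question (How to select top x records for every group), but that approach does not work with sqlite.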

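2 answers:

Answer 0 (score: 3)

Getting the top n records in each group requires counting. Assuming there are no duplicates, the following query returns, for every row, the number of records for its item: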

select t.*,
       (select count(*) from t t2 where t2.item_id = t.item_id
       ) as NumPrices
from t

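This is called a correlated subquery. Now let's extend the idea to include a ranking, and then compute the average over the right rows of each group: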

select item_id, avg(price)
from (select t.*,
             (select count(*) from t t2 where t2.item_id = t.item_id
             ) as NumPrices,
             (select count(*) from t t2 where t2.item_id = t.item_id and t2.price <= t.price
             ) as PriceRank
      from t
     ) t
where (100.0*PriceRank / NumPrices) <= X
group by item_id

For performance, you want an index on (item_id, price).
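To try the query on a few rows, here is a minimal sketch using Python's sqlite3 module. The sample data, the index name t_item_price and the value X = 50 are made up for illustration; X is passed as a bound parameter instead of being written into the SQL. Note that, like the query above, it groups by item_id only and ignores timestamp.

```python
import sqlite3

X = 50  # percentage of lowest prices to keep (assumed value)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (item_id INTEGER, timestamp INTEGER, price REAL)")
# Made-up sample rows with distinct prices per item (the query assumes no duplicates).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 100, 10.0), (1, 100, 20.0), (1, 100, 30.0), (1, 100, 40.0),
     (2, 100, 5.0), (2, 100, 15.0)],
)
conn.execute("CREATE INDEX t_item_price ON t (item_id, price)")

query = """
select item_id, avg(price)
from (select t.*,
             (select count(*) from t t2 where t2.item_id = t.item_id
             ) as NumPrices,
             (select count(*) from t t2 where t2.item_id = t.item_id and t2.price <= t.price
             ) as PriceRank
      from t
     ) t
where (100.0*PriceRank / NumPrices) <= ?
group by item_id
"""

# Map each item_id to the average of its lowest X% of prices.
lowest_avg = dict(conn.execute(query, (X,)).fetchall())
print(lowest_avg)
```

With these rows, item 1 keeps the prices 10 and 20 (ranks 1 and 2 of 4) and item 2 keeps only 5 (rank 1 of 2).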

Answer 1 (score: 1)

To get the number of records in the group with ID I and timestamp T, use this query:

SELECT COUNT(*)
FROM MyTable
WHERE item_id = I
  AND timestamp = T

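To get the limit, multiply the count by X and convert the result to an integer with ROUND / CAST. (If X is an integer, SQLite evaluates COUNT(*) * X / 100 with integer division, so the value is truncated before ROUND sees it; writing 100.0 avoids that.)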

SELECT CAST(ROUND(COUNT(*) * X / 100) AS INTEGER)
FROM MyTable
WHERE item_id = I
  AND timestamp = T
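A small sketch of how the limit comes out for one group, using Python's sqlite3 module. The helper group_limit, the sample rows and X = 25 are assumptions for illustration only. It also shows the effect of dividing by 100 versus 100.0 when X is bound as an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (item_id INTEGER, timestamp INTEGER, price REAL)")
# One made-up group (item_id = 1, timestamp = 100) with 7 rows.
conn.executemany(
    "INSERT INTO MyTable VALUES (1, 100, ?)",
    [(float(p),) for p in range(1, 8)],
)

def group_limit(divisor_sql, x):
    """Hypothetical helper: number of rows that make up x percent of the group."""
    sql = f"""
        SELECT CAST(ROUND(COUNT(*) * ? / {divisor_sql}) AS INTEGER)
        FROM MyTable
        WHERE item_id = ?
          AND timestamp = ?
    """
    return conn.execute(sql, (x, 1, 100)).fetchone()[0]

# 7 * 25 / 100 is integer division in SQLite, so the fraction is lost before ROUND.
limit_int = group_limit("100", 25)
# 7 * 25 / 100.0 is 1.75, which ROUND turns into 2.
limit_real = group_limit("100.0", 25)
print(limit_int, limit_real)
```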

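To get all the records of a specific group that fall within that limit, order the group's records by price and limit the number of rows returned: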

SELECT *
FROM MyTable
WHERE item_id = I
  AND timestamp = T
ORDER BY price
LIMIT (SELECT CAST(ROUND(COUNT(*) * X / 100) AS INTEGER)
       FROM MyTable
       WHERE item_id = I
         AND timestamp = T)

In theory, to get the averages for all groups, you would wrap this in a query with GROUP BY:

SELECT item_id,
       timestamp,
       (SELECT AVG(price)
        FROM (SELECT price
              FROM MyTable T2
              WHERE T2.item_id = T1.item_id
                AND T2.timestamp = T1.timestamp
              ORDER BY price
              LIMIT (SELECT CAST(ROUND(COUNT(*) * X / 100) AS INTEGER)
                     FROM MyTable T3
                     WHERE T3.item_id = T1.item_id
                       AND T3.timestamp = T1.timestamp)
             )
       ) AS AvgPriceLowestX
FROM MyTable T1
GROUP BY item_id,
         timestamp

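However, it appears that SQLite does not allow access to correlated variables from within the LIMIT clause, so this does not work in practice. You have to get the IDs of all groups (SELECT DISTINCT item_id, timestamp FROM MyTable) and execute the third query above for each group.

In any case, make sure you have a single index on the three columns item_id, timestamp and price to get good performance.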
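The per-group workaround could be sketched like this with Python's sqlite3 module. The sample rows and X = 40 are made up; the inner query is the third query from above wrapped in AVG, with I, T and X turned into named parameters and the division written as 100.0 so that an integer X is not truncated. Treat it as one possible way to drive the loop from application code, not the only one.

```python
import sqlite3

X = 40  # percentage of lowest prices to keep (assumed value)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (item_id INTEGER, timestamp INTEGER, price REAL)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    (1, 100, 10.0), (1, 100, 20.0), (1, 100, 30.0), (1, 100, 40.0),
    (1, 200, 8.0), (1, 200, 12.0),
    (2, 100, 5.0), (2, 100, 15.0), (2, 100, 25.0),
])

# The LIMIT subquery only uses bound parameters, so it is not correlated.
per_group = """
SELECT AVG(price)
FROM (SELECT price
      FROM MyTable
      WHERE item_id = :item
        AND timestamp = :ts
      ORDER BY price
      LIMIT (SELECT CAST(ROUND(COUNT(*) * :x / 100.0) AS INTEGER)
             FROM MyTable
             WHERE item_id = :item
               AND timestamp = :ts))
"""

averages = {}
groups = conn.execute("SELECT DISTINCT item_id, timestamp FROM MyTable").fetchall()
for item_id, ts in groups:
    row = conn.execute(per_group, {"item": item_id, "ts": ts, "x": X}).fetchone()
    averages[(item_id, ts)] = row[0]  # None if the limit rounds down to 0

print(averages)
```

This runs one query per group, so it is only practical when the number of distinct (item_id, timestamp) pairs is moderate.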
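As a rough check that such an index is picked up, you could create it and look at the query plan. This is a sketch using Python's sqlite3 module; the index name idx_item_ts_price is made up, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (item_id INTEGER, timestamp INTEGER, price REAL)")
# One index covering the two filter columns followed by the sort column.
conn.execute("CREATE INDEX idx_item_ts_price ON MyTable (item_id, timestamp, price)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT price FROM MyTable WHERE item_id = ? AND timestamp = ? ORDER BY price",
    (1, 100),
).fetchall()

# The human-readable description is the last (fourth) column of each plan row.
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

If the plan text mentions the index name, the per-group lookup and the ORDER BY price are both served from the index.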