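Is it possible to get the average of the top X% of items in a group?
For example:
I have a table with the columns item_id, timestamp, and price. The output should be grouped by item_id and timestamp, and the price column should be averaged. Only the lowest X% of prices in each group should be used for the average.
I found a similar question (How to select top x records for every group), but its solution does not work in SQLite.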
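Answer 0 (score: 3)
Getting the top n records in each group requires counting them. Assuming there are no duplicate prices, the following query returns, for every row, the number of records for its item: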
select t.*,
       (select count(*) from t t2 where t2.item_id = t.item_id
       ) as NumPrices
from t
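This is called a correlated subquery. Now let's extend the idea to include a rank, and then compute the average over the rows that fall within the desired share of each group: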
select item_id, avg(price)
from (select t.*,
             (select count(*) from t t2 where t2.item_id = t.item_id
             ) as NumPrices,
             (select count(*) from t t2 where t2.item_id = t.item_id and t2.price <= t.price
             ) as PriceRank
      from t
     ) t
where (100.0*PriceRank / NumPrices) <= X
group by item_id
To improve performance, you want an index on (item_id, price).
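The ranking query above groups by item_id only, while the question also groups by timestamp. As a rough sketch, assuming Python's sqlite3 module, an in-memory table named t, made-up sample rows, and a hypothetical X of 50, the same idea might be extended to both grouping columns like this:

```python
import sqlite3

X = 50  # hypothetical percentage: keep the lowest 50% of prices per group

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (item_id INTEGER, timestamp INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [  # made-up sample data, no duplicate prices within a group
        (1, 100, 10.0), (1, 100, 20.0), (1, 100, 30.0), (1, 100, 40.0),
        (2, 100, 5.0), (2, 100, 15.0),
    ],
)

# Same correlated-subquery ranking, with timestamp added to every
# correlation condition and to the GROUP BY.
query = """
select item_id, timestamp, avg(price)
from (select t.*,
             (select count(*) from t t2
              where t2.item_id = t.item_id and t2.timestamp = t.timestamp
             ) as NumPrices,
             (select count(*) from t t2
              where t2.item_id = t.item_id and t2.timestamp = t.timestamp
                and t2.price <= t.price
             ) as PriceRank
      from t
     ) t
where (100.0 * PriceRank / NumPrices) <= ?
group by item_id, timestamp
order by item_id, timestamp
"""
rows = conn.execute(query, (X,)).fetchall()
print(rows)  # [(1, 100, 15.0), (2, 100, 5.0)]
```

With duplicate prices in a group, tied rows all receive the highest rank among them, so a tie at the cutoff may be excluded as a whole. That is why the answer assumes no duplicates.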
Answer 1 (score: 1)
To get the number of records in the group with ID I and timestamp T, use the following query:
SELECT COUNT(*)
FROM MyTable
WHERE item_id = I
AND timestamp = T
To get the limit, multiply the count by X percent and convert the result to an integer with ROUND/CAST:
-- 100.0 forces floating-point division even when X is an integer
SELECT CAST(ROUND(COUNT(*) * X / 100.0) AS INTEGER)
FROM MyTable
WHERE item_id = I
AND timestamp = T
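One detail worth checking in this computation is the division. SQLite divides two integers with integer division, so the result is truncated before ROUND ever sees it. The sketch below, which assumes Python's sqlite3 module, a made-up group of seven rows, and a hypothetical X of 25, compares dividing by 100 with dividing by 100.0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (item_id INTEGER, timestamp INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO MyTable VALUES (1, 100, ?)",  # one made-up group with 7 rows
    [(p,) for p in (10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0)],
)

X = 25  # hypothetical percentage, bound as an integer

# Integer division: 7 * 25 / 100 is truncated to 1 before ROUND is applied.
int_limit = conn.execute(
    """SELECT CAST(ROUND(COUNT(*) * ? / 100) AS INTEGER)
       FROM MyTable WHERE item_id = ? AND timestamp = ?""",
    (X, 1, 100),
).fetchone()[0]

# Floating-point division: 7 * 25 / 100.0 is 1.75, which ROUND turns into 2.
real_limit = conn.execute(
    """SELECT CAST(ROUND(COUNT(*) * ? / 100.0) AS INTEGER)
       FROM MyTable WHERE item_id = ? AND timestamp = ?""",
    (X, 1, 100),
).fetchone()[0]

print(int_limit, real_limit)  # 1 2
```

If X is always supplied as a floating-point value, both forms give the same result.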
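To get all records of a specific group that fall within that limit, order the group's records by price and limit the number of rows returned: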
SELECT *
FROM MyTable
WHERE item_id = I
  AND timestamp = T
ORDER BY price
LIMIT (SELECT CAST(ROUND(COUNT(*) * X / 100.0) AS INTEGER)
       FROM MyTable
       WHERE item_id = I
         AND timestamp = T)
In theory, to get the averages for all groups, wrap that query in an outer query with GROUP BY:
SELECT item_id,
       timestamp,
       (SELECT AVG(price)
        FROM (SELECT price
              FROM MyTable T2
              WHERE T2.item_id = T1.item_id
                AND T2.timestamp = T1.timestamp
              ORDER BY price
              LIMIT (SELECT CAST(ROUND(COUNT(*) * X / 100.0) AS INTEGER)
                     FROM MyTable T3
                     WHERE T3.item_id = T1.item_id
                       AND T3.timestamp = T1.timestamp)
             )
       ) AS AvgPriceLowestX
FROM MyTable T1
GROUP BY item_id,
         timestamp
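However, it appears that SQLite does not allow correlated variables to be accessed from within a LIMIT clause, so this does not work in practice.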
You have to fetch the IDs of all groups (SELECT DISTINCT item_id, timestamp FROM MyTable) and execute the third query above once for each group.
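The per-group loop described here could look roughly like the following. This is only a sketch: it assumes Python's sqlite3 module, made-up sample rows, and a hypothetical helper named lowest_x_percent_averages. It also computes the limit in a separate step and binds it as a parameter, and it divides by 100.0 so that an integer X is not truncated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (item_id INTEGER, timestamp INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [  # made-up sample data
        (1, 100, 10.0), (1, 100, 20.0), (1, 100, 30.0), (1, 100, 40.0),
        (1, 200, 7.0), (1, 200, 9.0),
        (2, 100, 5.0), (2, 100, 15.0),
    ],
)

def lowest_x_percent_averages(conn, x):
    """Hypothetical helper: average of the lowest x% of prices per group."""
    averages = {}
    groups = conn.execute(
        "SELECT DISTINCT item_id, timestamp FROM MyTable"
    ).fetchall()
    for item_id, ts in groups:
        # Number of rows that make up x% of this group.
        limit = conn.execute(
            """SELECT CAST(ROUND(COUNT(*) * ? / 100.0) AS INTEGER)
               FROM MyTable WHERE item_id = ? AND timestamp = ?""",
            (x, item_id, ts),
        ).fetchone()[0]
        # Average over the cheapest `limit` rows; NULL (None) if limit is 0.
        avg = conn.execute(
            """SELECT AVG(price)
               FROM (SELECT price FROM MyTable
                     WHERE item_id = ? AND timestamp = ?
                     ORDER BY price
                     LIMIT ?)""",
            (item_id, ts, limit),
        ).fetchone()[0]
        averages[(item_id, ts)] = avg
    return averages

result = lowest_x_percent_averages(conn, 50)
print(result[(1, 100)], result[(1, 200)], result[(2, 100)])  # 15.0 7.0 5.0
```

This issues two queries per group, so it is only practical when the number of groups is moderate.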
In any case, make sure you have a single index on the three columns item_id, timestamp, and price to get good performance.
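As an illustration, such an index could be created and checked as below. The sketch assumes Python's sqlite3 module and a hypothetical index name, idx_item_ts_price. It prints the query plan for the per-group query without asserting its exact wording, because that text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (item_id INTEGER, timestamp INTEGER, price REAL)")

# One composite index: the two equality columns first, then the sort column.
conn.execute(
    "CREATE INDEX idx_item_ts_price ON MyTable (item_id, timestamp, price)"
)

# Ask SQLite how it would run the per-group query.
plan = conn.execute(
    """EXPLAIN QUERY PLAN
       SELECT price FROM MyTable
       WHERE item_id = ? AND timestamp = ?
       ORDER BY price
       LIMIT ?""",
    (1, 100, 2),
).fetchall()

# The last column of each plan row holds the human-readable description.
plan_text = " | ".join(str(row[-1]) for row in plan)
print(plan_text)
```

If the index name shows up in the printed plan, SQLite is using it for the lookup.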