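I have a situation where I am performing a join between two tables, and I need a value from one table to serve as the LIMIT factor for a subquery in the join. Assume I have the following [extremely simplified] tables: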
data:
experiment_id | value
--------------|--------
1 | 2.5
1 | 2.6
1 | 4.5
1 | 2.3
1 | 3.5
1 | 2.8
2 | 2.3
2 | 1.2
2 | 1.1
2 | 3.6
2 | 3.8
2 | 4.1
2 | 7.9
2 | 4.2
2 | 1.0
data_clip:
experiment_id | clip_index
--------------|------------
1 | 3
2 | 5
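I need to sum the sorted values of each experiment up to a given clip_index, which varies between experiments. So my result table would ideally look like this: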
results:
experiment_id | sum
--------------|-------
1 | 7.4 # => 2.3 + 2.5 + 2.6
2 | 9.2 # => 1.0 + 1.1 + 1.2 + 2.3 + 3.6
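To pin down the target computation, here is a minimal client-side sketch in Python. The function name reference_sums is made up for illustration, and clip_index is read as the number of smallest values to include, which is what LIMIT clip_index would return.

```python
from collections import defaultdict

# Sample rows from the tables above: (experiment_id, value)
DATA = [
    (1, 2.5), (1, 2.6), (1, 4.5), (1, 2.3), (1, 3.5), (1, 2.8),
    (2, 2.3), (2, 1.2), (2, 1.1), (2, 3.6), (2, 3.8), (2, 4.1),
    (2, 7.9), (2, 4.2), (2, 1.0),
]
# experiment_id -> clip_index
CLIPS = {1: 3, 2: 5}


def reference_sums(data, clips):
    """Sum the clip_index smallest values of each experiment."""
    values = defaultdict(list)
    for experiment_id, value in data:
        values[experiment_id].append(value)
    return {
        # Rounded to one decimal only to hide floating-point noise.
        experiment_id: round(sum(sorted(values[experiment_id])[:clip_index]), 1)
        for experiment_id, clip_index in clips.items()
    }


print(reference_sums(DATA, CLIPS))
```

This is the result any database-level query should reproduce.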
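Normally I would do this calculation in a client-side script (ruby, python, etc.), but I wanted to try doing it at the database level. Some imaginary SQL might look like this (there are all sorts of things wrong with this query, I know, but hopefully you get the idea):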
SELECT
  t0.experiment_id AS `id`,
  (SELECT SUM(x.value) FROM
    (SELECT value
     FROM data
     WHERE experiment_id = t0.experiment_id
     ORDER BY value
     LIMIT t0.clip_index) AS x) AS `sum`
FROM data_clip AS t0
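Among the problems: the WHERE condition fails because it does not recognize the t0 table, which is external to the subquery.

My question is basically how to accomplish the variable limit and sum across the two tables using SQL. I thought about using group_concat and substring_index to isolate the values up to clip_index for each row, but then there is the issue of summing the numbers inside the resulting string ("1.2,2.3,3.2") and the server limit on the size of the group_concat buffer (configurable, but there can be around 100k values per experiment). Any thoughts? Thanks.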
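Answer 0 (score: 1)
I guess you just need to include a row number with each selected value and limit the result by row count. Something like this (untested):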
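SELECT t0.experiment_id AS `id`,
  (SELECT SUM(x.value) FROM
    (SELECT value,
            @rownum := @rownum + 1 AS rownum   -- intended as the rank in sorted order
     FROM data
     JOIN (SELECT @rownum := 0) r              -- restart the counter for each experiment
     WHERE experiment_id = t0.experiment_id
     ORDER BY value
    ) AS x
   WHERE x.rownum <= t0.clip_index             -- rownum starts at 1, so <= keeps clip_index rows
  ) AS `sum`
FROM data_clip AS t0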
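The same row-numbering idea can also be written with the standard ROW_NUMBER() window function instead of user variables, which avoids referencing t0 from inside a nested derived table. The following is only a sketch under stated assumptions: it needs an engine with window functions (MySQL 8.0 or later; SQLite 3.25 or later), and it runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for the real server. The function name window_sums and the aliases r, rn and clipped_sum are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (experiment_id, value)
DATA = [
    (1, 2.5), (1, 2.6), (1, 4.5), (1, 2.3), (1, 3.5), (1, 2.8),
    (2, 2.3), (2, 1.2), (2, 1.1), (2, 3.6), (2, 3.8), (2, 4.1),
    (2, 7.9), (2, 4.2), (2, 1.0),
]
# (experiment_id, clip_index)
CLIPS = [(1, 3), (2, 5)]

# Rank the values inside each experiment, then join on rank <= clip_index.
QUERY = """
SELECT c.experiment_id, SUM(r.value) AS clipped_sum
FROM data_clip AS c
JOIN (
    SELECT experiment_id, value,
           ROW_NUMBER() OVER (PARTITION BY experiment_id ORDER BY value) AS rn
    FROM data
) AS r
  ON r.experiment_id = c.experiment_id
 AND r.rn <= c.clip_index
GROUP BY c.experiment_id
ORDER BY c.experiment_id
"""


def window_sums(data, clips):
    """Load the sample tables into SQLite and run the window-function query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (experiment_id INTEGER, value REAL)")
    conn.execute("CREATE TABLE data_clip (experiment_id INTEGER, clip_index INTEGER)")
    conn.executemany("INSERT INTO data VALUES (?, ?)", data)
    conn.executemany("INSERT INTO data_clip VALUES (?, ?)", clips)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    # Rounded to one decimal only to hide floating-point noise.
    return {experiment_id: round(total, 1) for experiment_id, total in rows}


print(window_sums(DATA, CLIPS))
```

Because the ranking happens once over the whole data table, the variable limit becomes an ordinary join condition rather than a LIMIT clause.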
Answer 1 (score: 0)
I think this will work when all the values are positive. If there are negative values, one more level of nesting is needed.
SELECT experiment_id
     , MIN(sumValue) - (MIN(cnt) - MIN(clip_index)) * MIN(maxValue)
         AS sumValue
FROM
  ( SELECT e.experiment_id
         , e.clip_index
         , COUNT(*) AS cnt               -- rows with value <= d.value
         , SUM(d2.value) AS sumValue     -- their running total
         , d.value AS maxValue
    FROM data_clip AS e
      JOIN data AS d
        ON d.experiment_id = e.experiment_id
      JOIN data AS d2
        ON d2.experiment_id = e.experiment_id
        AND d2.value <= d.value
    GROUP BY e.experiment_id
           , d.id                        -- primary key of the data table (not shown in the question's schema)
    HAVING COUNT(*) >= e.clip_index
  ) AS grp
GROUP BY experiment_id
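To check the self-join idea on the sample data, here is a small sketch that runs an equivalent query against an in-memory SQLite database through Python's sqlite3 module. It is an illustration under assumptions rather than a tested MySQL script: the id primary key on data, the function name self_join_sums, and the snake_case aliases are all made up here, and every non-aggregated column is listed in GROUP BY so the query does not depend on lenient grouping rules.

```python
import sqlite3

# Sample rows from the question: (experiment_id, value)
DATA = [
    (1, 2.5), (1, 2.6), (1, 4.5), (1, 2.3), (1, 3.5), (1, 2.8),
    (2, 2.3), (2, 1.2), (2, 1.1), (2, 3.6), (2, 3.8), (2, 4.1),
    (2, 7.9), (2, 4.2), (2, 1.0),
]
# (experiment_id, clip_index)
CLIPS = [(1, 3), (2, 5)]

# For each row d, count and sum the rows whose value is <= d.value, keep the
# groups that reach clip_index rows, then subtract the surplus copies of the
# largest value that ties may have pulled in.
QUERY = """
SELECT experiment_id,
       MIN(sum_value) - (MIN(cnt) - clip_index) * MIN(max_value) AS clipped_sum
FROM (
    SELECT c.experiment_id, c.clip_index,
           COUNT(*)      AS cnt,
           SUM(d2.value) AS sum_value,
           d.value       AS max_value
    FROM data_clip AS c
    JOIN data AS d  ON d.experiment_id = c.experiment_id
    JOIN data AS d2 ON d2.experiment_id = c.experiment_id
                   AND d2.value <= d.value
    GROUP BY c.experiment_id, c.clip_index, d.id, d.value
    HAVING COUNT(*) >= c.clip_index
) AS grp
GROUP BY experiment_id, clip_index
ORDER BY experiment_id
"""


def self_join_sums(data, clips):
    """Load the tables into SQLite (with an assumed id key) and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, experiment_id INTEGER, value REAL)"
    )
    conn.execute("CREATE TABLE data_clip (experiment_id INTEGER, clip_index INTEGER)")
    conn.executemany("INSERT INTO data (experiment_id, value) VALUES (?, ?)", data)
    conn.executemany("INSERT INTO data_clip VALUES (?, ?)", clips)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    # Rounded to one decimal only to hide floating-point noise.
    return {experiment_id: round(total, 1) for experiment_id, total in rows}


print(self_join_sums(DATA, CLIPS))
```

The self-join compares every row with every other row of the same experiment, so its cost grows quadratically with the number of values per experiment; with around 100k values per experiment that is likely to be slow.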