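I am using MySQL to store data from a large number of simulations that I run on an HPC cluster. Each simulation has its own entry in one table, and a second table holds the per-timestep result data. The timestep results table is very large (tens to hundreds of millions of rows). The tables look like this:
Table: simulations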
id descriptor notes
1 SIM1 notes here...
2 SIM2 SIM2 Notes...
... ... ...
8643 SIM8643 SIM8643 Notes...
Table: simulations_ts
id simulation_id step data_value
1 1 1 0.05
2 1 2 0.051
... ... ... ...
1983 1 1983 0.253
1984 2 1 0.043
... ... ... ...
59345435 8643 2832 0.067
I would like to be able to efficiently return the following table:
simulation_id first_ts_id last_ts_id num_steps
1 1 1983 1983
2 1984 2938434 2052
... ... ... ...
8643 12835283 59345435 2832
I know I can run the following query:
SELECT
simulation_id,
MIN(step) AS first_step,
MAX(step) AS last_step,
COUNT(id) AS num_steps
FROM
simulations_ts
GROUP BY
simulation_id
ORDER BY
simulation_id ASC
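There are ways to use a subquery to pull the corresponding id for one aggregate, but I have not found any examples that pull the corresponding ids for two aggregate functions. Can this be done efficiently in a single query, or am I better off running the min lookup and the max lookup separately?
Answer 0: (score: 2)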
SELECT g.simulation_id, first.id as first_ts_id, last.id as last_ts_id, g.num_steps
FROM (SELECT simulation_id, MIN(step) minstep, MAX(step) maxstep, COUNT(*) num_steps
FROM simulations_ts
GROUP BY simulation_id) AS g
JOIN simulations_ts first ON first.simulation_id = g.simulation_id AND first.step = g.minstep
JOIN simulations_ts last ON last.simulation_id = g.simulation_id AND last.step = g.maxstep
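For anyone who wants to try the join-back pattern on a toy data set first, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows are made up, and the aliases f and l are my own naming. It has not been checked against the real table.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE simulations_ts ("
    " id INTEGER PRIMARY KEY,"
    " simulation_id INTEGER,"
    " step INTEGER,"
    " data_value REAL)"
)

# Hypothetical sample rows: simulation 1 has 3 steps, simulation 2 has 2.
rows = [
    (1, 1, 1, 0.05),
    (2, 1, 2, 0.051),
    (3, 1, 3, 0.253),
    (4, 2, 1, 0.043),
    (5, 2, 2, 0.067),
]
conn.executemany("INSERT INTO simulations_ts VALUES (?, ?, ?, ?)", rows)

# Aggregate once per simulation, then join back twice to pick up the ids
# of the rows holding the minimum and maximum step.
QUERY = """
SELECT g.simulation_id, f.id AS first_ts_id, l.id AS last_ts_id, g.num_steps
FROM (SELECT simulation_id,
             MIN(step) AS minstep,
             MAX(step) AS maxstep,
             COUNT(*)  AS num_steps
      FROM simulations_ts
      GROUP BY simulation_id) AS g
JOIN simulations_ts AS f
  ON f.simulation_id = g.simulation_id AND f.step = g.minstep
JOIN simulations_ts AS l
  ON l.simulation_id = g.simulation_id AND l.step = g.maxstep
ORDER BY g.simulation_id
"""
result = conn.execute(QUERY).fetchall()
print(result)  # [(1, 1, 3, 3), (2, 4, 5, 2)]
```

Each tuple is (simulation_id, first_ts_id, last_ts_id, num_steps), which matches the shape of the table asked for in the question.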
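Answer 1: (score: 1)
I think this is what you are after. Note that I only show the id column of simulations_ts, under the first_sim_id and last_sim_id aliases, but you can of course show other columns from that table as well.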
SELECT
main.simulation_id,
first_step,
first_sim.id as first_sim_id,
last_step,
last_sim.id as last_sim_id
FROM
(SELECT
simulation_id,
MIN(step) AS first_step,
MAX(step) AS last_step,
COUNT(id) AS num_steps
FROM
simulations_ts
GROUP BY
simulation_id) as main
JOIN simulations_ts first_sim
ON main.simulation_id = first_sim.simulation_id
AND main.first_step = first_sim.step
JOIN simulations_ts last_sim
ON main.simulation_id = last_sim.simulation_id
AND main.last_step = last_sim.step
I started from your original query and then joined it back to simulations_ts on the simulation ID and the min/max step.
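One caveat with joining back on simulation ID and step: it returns exactly one row per simulation only if each (simulation_id, step) pair occurs once. The sketch below shows a way to check for that. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are made up, including a deliberately duplicated step.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE simulations_ts ("
    " id INTEGER PRIMARY KEY,"
    " simulation_id INTEGER,"
    " step INTEGER,"
    " data_value REAL)"
)

# Hypothetical rows: step 2 of simulation 1 is stored twice (ids 2 and 3).
conn.executemany(
    "INSERT INTO simulations_ts VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 0.05), (2, 1, 2, 0.051), (3, 1, 2, 0.052), (4, 2, 1, 0.043)],
)

# List every (simulation_id, step) pair that appears more than once.
DUPLICATES = """
SELECT simulation_id, step, COUNT(*) AS n
FROM simulations_ts
GROUP BY simulation_id, step
HAVING COUNT(*) > 1
"""
dupes = conn.execute(DUPLICATES).fetchall()

# The join-back on (simulation_id, step) then yields one row per match,
# so the duplicated max step gives simulation 1 two output rows.
JOIN_BACK = """
SELECT main.simulation_id, first_sim.id, last_sim.id
FROM (SELECT simulation_id, MIN(step) AS first_step, MAX(step) AS last_step
      FROM simulations_ts
      GROUP BY simulation_id) AS main
JOIN simulations_ts AS first_sim
  ON main.simulation_id = first_sim.simulation_id
 AND main.first_step = first_sim.step
JOIN simulations_ts AS last_sim
  ON main.simulation_id = last_sim.simulation_id
 AND main.last_step = last_sim.step
ORDER BY main.simulation_id, last_sim.id
"""
joined = conn.execute(JOIN_BACK).fetchall()
print(dupes)
print(joined)
```

If the duplicates query comes back empty on the real table, the join-back should give one row per simulation.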