Question

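I am using MySQL to store data from a large number of simulations that I run on an HPC cluster. Each simulation has its own entry in one table, and a second table holds the per-timestep result data. The timestep table is very large (tens to hundreds of millions of rows). The tables look like this: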

Table: simulations

id      descriptor  notes 
1       SIM1        notes here...
2       SIM2        SIM2 Notes...
...     ...         ...
8643    SIM8643     SIM8643 Notes...

Table: simulations_ts

id         simulation_id    step        data_value
1          1                1           0.05
2          1                2           0.051
...        ...              ...         ...
1983       1                1983        0.253
1984       2                1           0.043
...        ...              ...         ...
59345435   8643             2832        0.067

I would like to efficiently return the following table:

simulation_id    first_ts_id     last_ts_id  num_steps
1                1               1983        1983
2                1984            2938434     2052
...              ...             ...         ...
8643             12835283        59345435    2832

I know I can run the following query:

SELECT
   simulation_id,
   MIN(step) AS first_step,
   MAX(step) AS last_step,
   COUNT(id) AS num_steps
FROM
   simulations_ts
GROUP BY
   simulation_id
ORDER BY
   simulation_id ASC

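There are also ways to use a subquery to pull the corresponding id for one aggregate, but I have not found any examples that pull the corresponding ids for two aggregate functions. Can this be done efficiently in a single query, or am I better off running separate min and max lookups?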

Answer 1

SELECT g.simulation_id, first.id as first_ts_id, last.id as last_ts_id, num_steps
FROM (SELECT simulation_id, MIN(step) minstep, MAX(step) maxstep, COUNT(*) num_steps
      FROM simulations_ts
      GROUP BY simulation_id) AS g
JOIN simulations_ts first ON first.simulation_id = g.simulation_id AND first.step = g.minstep
JOIN simulations_ts last ON last.simulation_id = g.simulation_id AND last.step = g.maxstep

Answer 2

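I think this is what you are after. Note that I only show the id column from the first_sim and last_sim aliases of simulations_ts, but you can of course show other columns from that table as well.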

SELECT
   main.simulation_id,
   first_step,
   first_sim.id as first_sim_id,
   last_step,
   last_sim.id as last_sim_id
FROM
   (SELECT
       simulation_id,
       MIN(step) AS first_step,
       MAX(step) AS last_step,
       COUNT(id) AS num_steps
    FROM
       simulations_ts
    GROUP BY
       simulation_id) as main
    JOIN simulations_ts first_sim
         ON main.simulation_id = first_sim.simulation_id
            AND main.first_step = first_sim.step
    JOIN simulations_ts last_sim
         ON main.simulation_id = last_sim.simulation_id
            AND main.last_step = last_sim.step

I started with your original query and then joined it back to simulations_ts on the simulation id and the min/max step.
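
To make the join-back idea concrete, here is a minimal sketch that runs the same shape of query on a handful of made-up rows. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL, and the `summarize` helper, the sample rows, and the UNIQUE constraint on (simulation_id, step) are assumptions for illustration, not part of the original schema. The output columns are renamed to match the table asked for in the question.

```python
import sqlite3

# The join-back query: aggregate per simulation, then join the aggregate
# back to simulations_ts twice to pick up the ids of the first and last step.
QUERY = """
SELECT
    main.simulation_id,
    first_sim.id AS first_ts_id,
    last_sim.id  AS last_ts_id,
    main.num_steps
FROM
    (SELECT
         simulation_id,
         MIN(step) AS first_step,
         MAX(step) AS last_step,
         COUNT(id) AS num_steps
     FROM simulations_ts
     GROUP BY simulation_id) AS main
    JOIN simulations_ts first_sim
         ON main.simulation_id = first_sim.simulation_id
            AND main.first_step = first_sim.step
    JOIN simulations_ts last_sim
         ON main.simulation_id = last_sim.simulation_id
            AND main.last_step = last_sim.step
ORDER BY main.simulation_id
"""


def summarize(rows):
    """Load (id, simulation_id, step, data_value) rows into an in-memory
    SQLite database and return one summary tuple per simulation."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE simulations_ts ("
            " id INTEGER PRIMARY KEY,"
            " simulation_id INTEGER NOT NULL,"
            " step INTEGER NOT NULL,"
            " data_value REAL,"
            # Assumption: a step occurs at most once per simulation.
            " UNIQUE (simulation_id, step))"
        )
        conn.executemany("INSERT INTO simulations_ts VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Made-up sample rows: simulation 1 has 3 steps, simulation 2 has 2.
sample = [
    (1, 1, 1, 0.05),
    (2, 1, 2, 0.051),
    (3, 1, 3, 0.253),
    (4, 2, 1, 0.043),
    (5, 2, 2, 0.067),
]
result = summarize(sample)
print(result)  # [(1, 1, 3, 3), (2, 4, 5, 2)]
```

Note that the joins return exactly one row per simulation only if each (simulation_id, step) pair is unique. If a step could repeat within a simulation, the result would contain duplicate rows. On the MySQL side, a composite index on (simulation_id, step) would likely help both the grouping and the two joins, though that is worth confirming with EXPLAIN on the real table.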

How can I get the corresponding max and min ids in MySQL?
