Question

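I have no experience with relational databases. Before I write C++ code to implement a solution to my problem, I would like to check whether using a database would give me a simple solution. Here is my problem: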

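I have a set of physical samples and a simple measurement that produces a real-valued result for each sample. The measurement is run many times on all available samples (new samples are added periodically), and the results are stored in the database as SAMPLE_ID and RESULT columns. Each measurement is stored as a new table holding its results (the table name identifies the particular measurement). Alternatively, if it makes more sense, each measurement adds a column to a global table holding the current results (the column name identifies the particular measurement). I will create the tables through a C++ API and retrieve the reports (query results) the same way. I need at least two reports (plain ASCII text is fine):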

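A list of the best (highest) result for every sample.
For a small subset of measurements, a list of the samples whose most recent result is worse (lower) than any previous result (within the selected subset).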

What database query would generate each report?

Answer 1

Yes, a database would work well for this.

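You need a column that stores a date or timestamp so you can tell a sample's results apart. Without such a column, "most recent measurement" has no meaning. (The order of rows in a table is essentially meaningless.)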
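To make the point concrete, here is a minimal C++ sketch (not from the original answer) that finds each sample's most recent measurement by comparing timestamps, so the order in which rows happen to arrive does not matter. The `Measurement` struct and `latest_per_sample` function are hypothetical, and timestamps are assumed to be fixed-width `"YYYY-MM-DD HH:MM"` strings, which sort chronologically when compared as plain text.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical row type mirroring the measurements table below.
struct Measurement {
    int sample_id;
    std::string measurement_time;  // assumed fixed-width "YYYY-MM-DD HH:MM"
    double measurement;
};

// Returns the most recent measurement for each sample, keyed by sample_id.
// Input order is irrelevant: recency comes only from measurement_time.
std::map<int, Measurement> latest_per_sample(const std::vector<Measurement>& rows) {
    std::map<int, Measurement> latest;
    for (const Measurement& m : rows) {
        auto it = latest.find(m.sample_id);
        // Fixed-width ISO-style timestamps compare correctly as strings.
        if (it == latest.end() || it->second.measurement_time < m.measurement_time) {
            latest[m.sample_id] = m;
        }
    }
    return latest;
}
```

Given rows for sample 1 arriving in the order 08:03, 00:25, 05:47, the function still returns the 08:03 row as the latest one.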

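You probably don't need anyone to develop a front end; try entering the data by hand, or load CSV files with your DBMS's bulk loader. (Every modern DBMS has one; the names vary.)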
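If the measurements come out of a C++ program, one option is to have it emit CSV in the column order of the `measurements` table shown below and hand the file to the bulk loader (in PostgreSQL, for example, `COPY measurements FROM '/path/to/file.csv' WITH (FORMAT csv)` or psql's `\copy`). A minimal sketch; the `Measurement` struct and `measurements_to_csv` helper are hypothetical:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical row type matching the columns of the measurements table.
struct Measurement {
    int sample_id;
    std::string measurement_time;  // e.g. "2012-02-02 08:03"
    double measurement;
};

// Formats rows as CSV in table column order: sample_id,measurement_time,measurement.
// No header line and no quoting, since none of these fields contain commas or quotes.
std::string measurements_to_csv(const std::vector<Measurement>& rows) {
    std::ostringstream out;
    for (const Measurement& m : rows) {
        out << m.sample_id << ',' << m.measurement_time << ',' << m.measurement << '\n';
    }
    return out.str();
}
```

The returned string can be written to a file with `std::ofstream` before loading. Default stream formatting prints at most six significant digits, so raise the stream's precision inside the function if your measurements need more.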

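You probably don't need a reporting specialist to build the reports, either. In research, the raw query output is often all you need.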

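Some of the queries are simple; others may not be simple, but they are at least straightforward. The code below was tested in PostgreSQL, but it should work in any DBMS that supports common table expressions and row constructors.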

create table measurements (
  sample_id integer not null,
  measurement_time timestamp not null,
  measurement real not null check(measurement >= 0 and measurement <= 30),
  primary key (sample_id, measurement_time)
);

insert into measurements values 
(1, '2012-02-02 08:03', 13.89),
(2, '2012-02-02 00:00', 13.86),
(1, '2012-02-02 00:25', 25.07),
(1, '2012-02-02 03:32', 25.38),
(1, '2012-02-02 05:47', 16.64),
(2, '2012-02-02 08:03', 16.16),
(2, '2012-02-02 07:25', 25.85),
(3, '2012-02-02 08:03', 14.78),
(3, '2012-02-02 09:29', 17.08),
(3, '2012-02-02 10:31', 13.41),
(4, '2012-02-02 12:38', 20.98),
(5, '2012-02-02 08:03', 25.00),
(5, '2012-02-02 14:02', 16.27),
(5, '2012-02-02 03:32', 12.10),
(5, '2012-02-02 17:47', 21.34),
(6, '2012-02-02 18:32', 17.16),
(6, '2012-02-02 18:33', 21.59),
(7, '2012-02-02 20:07', 21.47),
(8, '2012-02-02 21:58', 11.50),
(8, '2012-02-02 22:53', 21.01);

-- All samples with their highest measurement.
select sample_id, max(measurement)
from measurements
group by sample_id
order by sample_id;

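-- Samples whose most recent measurement is lower than some preceding measurement.
-- Another way of saying this is that the max() measurement isn't the
-- latest measurement. (If a sample's maximum value occurs at more than one
-- time, including the latest, the earlier row is still reported.)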
with max_measurements as (
  select m.*
  from measurements m
  inner join (select sample_id, max(measurement) measurement
              from measurements
              group by sample_id) max_m 
      on max_m.sample_id = m.sample_id 
     and max_m.measurement = m.measurement
),
latest_measurement as (
  select m.*
  from measurements m
  inner join (select sample_id, max(measurement_time) measurement_time
              from measurements
              group by sample_id) max_m 
      on max_m.sample_id = m.sample_id 
     and max_m.measurement_time = m.measurement_time
)
select m.* 
from max_measurements m
where row(m.sample_id, m.measurement_time) not in (select sample_id, measurement_time 
                                                   from latest_measurement);
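For comparison with writing this directly in C++, here is a minimal in-memory sketch of the same two reports. It is not from the original answer: the `Measurement` struct and the function names are hypothetical, and timestamps are assumed to be fixed-width `"YYYY-MM-DD HH:MM"` strings. Unlike the SQL above, the second function compares the latest value directly against every earlier value, so a latest measurement that merely ties an earlier maximum is not reported.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical row type mirroring the measurements table.
struct Measurement {
    int sample_id;
    std::string measurement_time;  // assumed fixed-width "YYYY-MM-DD HH:MM"
    double measurement;
};

// Report 1: highest measurement per sample (like GROUP BY sample_id with max()).
std::map<int, double> best_per_sample(const std::vector<Measurement>& rows) {
    std::map<int, double> best;
    for (const Measurement& m : rows) {
        auto it = best.find(m.sample_id);
        if (it == best.end() || m.measurement > it->second) {
            best[m.sample_id] = m.measurement;
        }
    }
    return best;
}

// Report 2: samples whose most recent measurement is lower than at least one
// earlier measurement of the same sample, in ascending sample_id order.
std::vector<int> samples_that_got_worse(const std::vector<Measurement>& rows) {
    // Pass 1: locate each sample's most recent row (pointers into rows).
    std::map<int, const Measurement*> latest;
    for (const Measurement& m : rows) {
        auto it = latest.find(m.sample_id);
        if (it == latest.end() || it->second->measurement_time < m.measurement_time) {
            latest[m.sample_id] = &m;
        }
    }
    // Pass 2: any earlier row with a strictly higher value marks the sample.
    std::set<int> worse;
    for (const Measurement& m : rows) {
        const Measurement* last = latest.at(m.sample_id);
        if (m.measurement_time < last->measurement_time &&
            m.measurement > last->measurement) {
            worse.insert(m.sample_id);
        }
    }
    return std::vector<int>(worse.begin(), worse.end());
}
```

On the sample data inserted above, `samples_that_got_worse` should return samples 1, 2, 3 and 5, which matches the samples the second query reports.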

Tracking the evolution of experimental data with a relational database
