Selecting the preferred row from partially duplicated data

Time: 2011-08-12 15:56:18

Tags: mysql sql having

I have the following query:

SELECT
  mb.id AS meter_id
  ,ds.mydate AS mydate
  ,mb.name AS metergroup
  ,SUM(ms.stand) AS measured_cum_value
  ,me.name AS energy_medium
  ,e.name AS unit_of_measure
  ,MIN(ms.source) AS source
  ,COUNT(*) AS debugcount
FROM datumselect ds                                -- memory table with the dates to query
INNER JOIN metergroup mb ON (mb.building_id = 1)
INNER JOIN meter m ON (m.metergroup_id = mb.id)    -- meters are grouped
INNER JOIN medium me ON (me.id = mb.medium_id)     -- lookup table for normalization
INNER JOIN unit e ON (e.id = mb.unit_id)           -- ditto
INNER JOIN meterstand ms ON (ms.meter_id = m.id AND ms.mydate = ds.mydate)
GROUP BY ds.mydate, mb.id, ms.source   -- this is probably broken
HAVING source = MIN(ms.source)         -- this HAVING does not work!
ORDER BY mb.id, ds.mydate

I am selecting from the following table:

CREATE TABLE meterstand(
  id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
  meter_id INT(11) UNSIGNED NOT NULL,
  mydate DATETIME NOT NULL,
  stand DECIMAL(16, 5) NOT NULL,
  source ENUM('calculated', 'read', 'manual') NOT NULL DEFAULT 'read',
  PRIMARY KEY (id),
  INDEX FK_meterstand_meter_id (meter_id),
  UNIQUE INDEX UK_meterstand (mydate, meter_id, source),
  CONSTRAINT FK_meterstand_meter_id FOREIGN KEY (meter_id)
  REFERENCES vaanstermeters.meter (id) ON DELETE RESTRICT ON UPDATE CASCADE
)
ENGINE = INNODB
AUTO_INCREMENT = 181
AVG_ROW_LENGTH = 105
CHARACTER SET latin1
COLLATE latin1_swedish_ci;

A simplified query that would give the data I am after looks like this:

SELECT 
  meter_id
  , mydate
  , sum(stand)
  , count(*) as debugcount
FROM meterstand
WHERE mydate IN (list_of_dates_im_interested_in)
GROUP BY meter_id, mydate
HAVING the_best(source)   -- pseudocode: keep only the most preferred source per group

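Given the current data, debugcount should always be 1, but if a group in the query above contains more than one meter, debugcount should be the number of meters in that group.

I have values from different sources to choose from:
- the manual source, which is golden;
- the read source, which comes from a data feed of a meter somewhere in a building;
- calculated data, which is interpolated to fill in missing data.

A single data point with the same meter_id + mydate can have multiple sources.
The query should prefer the manual source over the read source, and select calculated data only if no other data is available.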

Here is a sample of the data in meterstand:

id   meter_id  mydate     stand      source
---------------------------------------------------
179  6         1-12-2010  94,75886   calculated
180  7         1-12-2010  256,02618  calculated
164  7         1-1-2011   285,41800  manual      <<-- Query should only consider this row.
183  7         1-1-2011   0,00000    read        <<-- and forget about this one

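What is the correct query syntax to select the best data point?

1 Answer:

Answer 0: (score: 1)

From the look of it, MySQL defines the sort order of an enum as the order in which its values are listed in the definition. Since your definition lists the sources from least to most preferred, I believe taking the maximum, as below, will work as expected (although I have no instance to test it on):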

SELECT * 
FROM meterstand as a
JOIN (SELECT meter_id, mydate, MAX(source) as source
      FROM meterstand
      GROUP BY meter_id, mydate) as b
ON b.meter_id = a.meter_id
AND b.mydate = a.mydate
AND b.source = a.source

(Assuming [meter_id, mydate, source] is unique, of course.)

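There does appear to be a bug, listed among MySQL's known issues, that makes MIN() and MAX() compare enums by their string value rather than by their position in the definition (and with your strings, alphabetical order will not help you at all).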
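If the string comparison does bite, one possible workaround is to aggregate on the enum's numeric index instead of its label. In a numeric context such as source + 0, MySQL yields the 1-based position of the value in the ENUM definition. The following is an untested sketch along the lines of the query above; the alias best_idx is made up for illustration.

```sql
-- Per meter and date, keep the row whose source has the highest ENUM index.
-- With ENUM('calculated', 'read', 'manual') the indexes are 1, 2 and 3,
-- so the highest index is the most preferred source.
SELECT a.*
FROM meterstand AS a
JOIN (SELECT meter_id, mydate,
             MAX(source + 0) AS best_idx   -- numeric index, not the label
      FROM meterstand
      GROUP BY meter_id, mydate) AS b
  ON  b.meter_id = a.meter_id
  AND b.mydate   = a.mydate
  AND b.best_idx = a.source + 0;
```

For the sample data, this should keep the manual row (id 164) for meter 7 on 1-1-2011 and drop the read row (id 183), since manual has index 3 and read has index 2.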
If that issue affects you (or if you want more control over the ordering used), you may want to define a lookup table:

Meter_Reading_Type
========================
Id   Description   Priority
1    Manual        10
2    Calculated    30
3    Read          20

Then reference it as a foreign key and order by (minimum) priority.
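A rough sketch of how the lookup-table approach might look, again untested. It assumes that meterstand gets a hypothetical column source_id referencing Meter_Reading_Type in place of the enum, and that priorities are unique; the column types and the alias best_priority are likewise assumptions.

```sql
-- Lookup table with the rows listed above; a lower priority value is preferred.
CREATE TABLE Meter_Reading_Type (
  Id          INT UNSIGNED NOT NULL PRIMARY KEY,
  Description VARCHAR(20)  NOT NULL,
  Priority    INT          NOT NULL UNIQUE
);

INSERT INTO Meter_Reading_Type (Id, Description, Priority) VALUES
  (1, 'Manual',     10),
  (2, 'Calculated', 30),
  (3, 'Read',       20);

-- Assumes meterstand.source_id (hypothetical) is a foreign key to Meter_Reading_Type.Id.
-- Per meter and date, keep the row whose reading type has the lowest priority value.
SELECT ms.*
FROM meterstand AS ms
JOIN Meter_Reading_Type AS t ON t.Id = ms.source_id
JOIN (SELECT s.meter_id, s.mydate, MIN(t2.Priority) AS best_priority
      FROM meterstand AS s
      JOIN Meter_Reading_Type AS t2 ON t2.Id = s.source_id
      GROUP BY s.meter_id, s.mydate) AS best
  ON  best.meter_id      = ms.meter_id
  AND best.mydate        = ms.mydate
  AND best.best_priority = t.Priority;
```

Because the preference lives in ordinary data rather than in the column definition, reordering the sources later only requires updating the Priority values.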