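I wrote a SQL routine that calculates the average value for each minute of the day from 5 days of data (the dates are parameters of the routine) and inserts the results into another table. It takes quite a long time, and I would like to know whether there is any way to optimize it.
The values I need for the averages are all in the same table, SiteReading. To get the 5 values for the same minute but from different dates, I join subsets of the table, one per day, on matching hour and minute, so that the values end up in the same row. I then add up the 5 values in each row and insert the result set into the Baseline table, which stores these averages.
Here is the routine: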
DELIMITER //
CREATE PROCEDURE `calc_baseline` (IN `input_site_id` int, IN `day1` varchar(12), IN `day2` varchar(12), IN `day3` varchar(12), IN `day4` varchar(12), IN `day5` varchar(12))
BEGIN
insert into Baseline
SELECT
site_id,
contract_id,
temp_time as timestamp,
(sr1value + sr2value + sr3value + sr4value + sr5value) / 5 as value,
programme
FROM
(SELECT
distinct concat(cast(hour(temp_time) as char), ':', cast(minute(temp_time) as char)) as hourminute,
SR.site_id as site_id,
value as sr1value,
temp_time,
S.contract_id as contract_id,
programme
FROM
SiteReading SR
join Site S ON SR.site_id = S.site_id
join Contract C ON S.contract_id = C.contract_id
where
temp_time like concat(day1, '%')
and SR.site_id = input_site_id) sr1
join
(SELECT
concat(cast(hour(temp_time) as char), ':', cast(minute(temp_time) as char)) as hourminute,
value as sr2value
FROM
SiteReading
where
temp_time like concat(day2, '%')
and site_id = input_site_id) sr2 ON sr1.hourminute = sr2.hourminute
join
(SELECT
concat(cast(hour(temp_time) as char), ':', cast(minute(temp_time) as char)) as hourminute,
value as sr3value
FROM
SiteReading
where
temp_time like concat(day3, '%')
and site_id = input_site_id) sr3 ON sr1.hourminute = sr3.hourminute
join
(SELECT
concat(cast(hour(temp_time) as char), ':', cast(minute(temp_time) as char)) as hourminute,
value as sr4value
FROM
SiteReading
where
temp_time like concat(day4, '%')
and site_id = input_site_id) sr4 ON sr1.hourminute = sr4.hourminute
join
(SELECT
concat(cast(hour(temp_time) as char), ':', cast(minute(temp_time) as char)) as hourminute,
value as sr5value
FROM
SiteReading
where
temp_time like concat(day5, '%')
and site_id = input_site_id) sr5 ON sr1.hourminute = sr5.hourminute
limit 1440;
END//
DELIMITER ;
The relevant tables it reads from and writes to are:
- SiteReading:
CREATE TABLE `SiteReading` (
`site_id` int(11) NOT NULL,
`contract_id` int(11) DEFAULT NULL,
`temp_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`value` int(11) NOT NULL,
PRIMARY KEY (`site_id`,`temp_time`),
KEY `site_id` (`site_id`),
KEY `contract_id` (`contract_id`),
CONSTRAINT `SiteReading_ibfk_1` FOREIGN KEY (`site_id`) REFERENCES `Site` (`site_id`),
CONSTRAINT `SiteReading_ibfk_3` FOREIGN KEY (`contract_id`) REFERENCES `Contract` (`contract_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8$$
- Baseline:
CREATE TABLE `Baseline` (
`site_id` int(11) NOT NULL,
`contract_id` int(11) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`value` int(11) NOT NULL,
`programme` int(11) NOT NULL,
PRIMARY KEY (`site_id`,`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8$$
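Because I need some extra values (site_id, contract_id, programme) to store in Baseline, and they are the same for every row, I wonder whether I should perform the insert statement some other way? The problem is that none of the Baseline table's columns can be NULL.
Maybe someone has other comments on this procedure: do I need to define anything else for this routine, such as ON DUPLICATE KEY UPDATE, or handle any other routine-related matters?
Thanks.
Answer 0 (score: 1)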
SELECT
t1.site_id,
t1.contract_id,
t1.temp_time,
AVG(t2.value)
FROM
SiteReading AS t1
LEFT JOIN
SiteReading AS t2
ON
t1.site_id = t2.site_id
AND t2.temp_time BETWEEN startdate AND enddate
AND HOUR(t1.temp_time) = HOUR(t2.temp_time)
AND MINUTE(t1.temp_time) = MINUTE(t2.temp_time)
WHERE
t1.temp_time BETWEEN startdate AND enddate
GROUP BY
t1.site_id,
t1.contract_id,
t1.temp_time
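This is not tested at all, but something like it might help you. Optimizations I made:
1. Because I need some extra values (site_id, contract_id, programme) to store in Baseline, and they are the same for every row, I wonder whether I should perform the insert statement some other way? The problem is that none of the Baseline table's columns can be NULL.
See #4
2. Maybe someone has other comments on this procedure: do I need to define anything else for this routine, such as ON DUPLICATE KEY UPDATE, or handle any other routine-related matters?
I'm not sure I fully understand what you are asking. Are you collecting multiple 5-day baselines over a longer period? If so, I don't see why you would need to update anything. If some temp_time values overlap (i.e. you run your procedure more often than once every 5 days), then you could make a unique ID, or a timestamp of when the procedure was run, part of the Baseline primary key to prevent duplicate keys on temp_time.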
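If overwriting an existing baseline row is what you want instead, here is a minimal sketch of ON DUPLICATE KEY UPDATE. It assumes a stand-in table, BaselineDemo (a hypothetical name with the same columns as Baseline, but using DATETIME to keep the example simple), and made-up values, so treat it as an illustration rather than a drop-in change:

```sql
-- Hypothetical stand-in for Baseline so the sketch can run on its own.
CREATE TABLE BaselineDemo (
  site_id     INT NOT NULL,
  contract_id INT NOT NULL,
  `timestamp` DATETIME NOT NULL,
  value       INT NOT NULL,
  programme   INT NOT NULL,
  PRIMARY KEY (site_id, `timestamp`)
) ENGINE=InnoDB;

-- First run: no row exists yet for (site 1, 00:01), so it is inserted.
INSERT INTO BaselineDemo (site_id, contract_id, `timestamp`, value, programme)
VALUES (1, 10, '2013-01-06 00:01:00', 40, 3)
ON DUPLICATE KEY UPDATE value = VALUES(value), programme = VALUES(programme);

-- Second run with the same primary key: the existing row is updated
-- in place instead of raising a duplicate-key error.
INSERT INTO BaselineDemo (site_id, contract_id, `timestamp`, value, programme)
VALUES (1, 10, '2013-01-06 00:01:00', 55, 3)
ON DUPLICATE KEY UPDATE value = VALUES(value), programme = VALUES(programme);

-- Still a single row for that key, now holding value 55.
SELECT site_id, `timestamp`, value FROM BaselineDemo;
```

The same clause can be appended to an INSERT ... SELECT. Note that VALUES() inside ON DUPLICATE KEY UPDATE is deprecated as of MySQL 8.0.20 in favour of a row alias, so check which form your server version expects.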
I have only just realized that your days may not be consecutive. In that case, change these lines:
AND t2.temp_time BETWEEN startdate AND enddate
t1.temp_time BETWEEN startdate AND enddate
to:
AND DATE(t2.temp_time) IN (day1, day2, day3, day4, day5)
DATE(t1.temp_time) IN (day1, day2, day3, day4, day5)
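However, this creates a problem: wrapping temp_time in DATE() keeps MySQL from using an index on that column, so you now have to do a full table scan on SiteReading for both the WHERE clause and the ON condition. To avoid this, you might consider normalizing the time intervals of your data set before storing it. For example, if there are 24 * 60 readings per day, then each temp_time interval can be represented by an int from 1 to 1440, and each day by an int from 1 to 365 (366 in leap years). Then use these values in the WHERE and JOIN clauses.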
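A rough sketch of that normalization follows. The column names day_of_year and minute_of_day, the index name, and the stand-in table SiteReadingDemo are all hypothetical (they are not part of your schema), and the sample rows are invented purely for illustration:

```sql
-- Hypothetical stand-in for SiteReading with two precomputed integer columns.
CREATE TABLE SiteReadingDemo (
  site_id       INT NOT NULL,
  temp_time     DATETIME NOT NULL,
  value         INT NOT NULL,
  day_of_year   SMALLINT UNSIGNED NOT NULL,  -- 1..366
  minute_of_day SMALLINT UNSIGNED NOT NULL,  -- 1..1440
  PRIMARY KEY (site_id, temp_time),
  KEY idx_site_day_minute (site_id, day_of_year, minute_of_day)
) ENGINE=InnoDB;

-- Compute the two integers once, when the reading is stored.
INSERT INTO SiteReadingDemo (site_id, temp_time, value, day_of_year, minute_of_day)
SELECT 1, t, v, DAYOFYEAR(t), HOUR(t) * 60 + MINUTE(t) + 1
FROM (
  SELECT '2013-01-01 00:00:00' AS t, 10 AS v
  UNION ALL SELECT '2013-01-02 00:00:00', 20
  UNION ALL SELECT '2013-01-03 00:00:00', 30
) AS sample;

-- The filter and grouping now compare plain integer columns, with no
-- function applied to temp_time, so the composite index can be used.
SELECT minute_of_day, AVG(value) AS avg_value
FROM SiteReadingDemo
WHERE site_id = 1
  AND day_of_year IN (1, 2, 3)
GROUP BY minute_of_day;
```

With the three sample rows, the query returns a single row for minute_of_day 1 with an average of 20. If the table holds more than one year of data, a day number alone is ambiguous, so you would likely need a year column (or a DATE column) as well.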
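Answer 1 (score: 0)
Thanks for your help. Now that I think about it, the query I originally wrote had a lot of problems. First, I made some changes to the application that writes to SiteReading, so that there is only one row per minute and the timestamp always has 00 in its seconds field. That makes comparing timestamps easier (just by taking the time part). I have rewritten the query in a better, more efficient way: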
CREATE PROCEDURE `calc_baseline2`(IN `input_site_id` int, IN `day1` char(12), IN `day2` char(12), IN `day3` char(12), IN `day4` char(12), IN `day5` char(12))
BEGIN
DECLARE curr_date char(10) DEFAULT cast(date(CURRENT_DATE()) as char(10));
insert into Baseline
SELECT distinct
SR.site_id as site_id,
S.contract_id as contract_id,
concat(cast(date(CURRENT_DATE()) as char(10)), ' ',cast(time(temp_time) as char(8))) as timestamp,
avg(value) as value,
programme
FROM
SiteReading SR
join
Site S ON SR.site_id = S.site_id
join
Contract C ON S.contract_id = C.contract_id
where
(temp_time like concat(day1,'%')
or temp_time like concat(day2,'%')
or temp_time like concat(day3,'%')
or temp_time like concat(day4,'%')
or temp_time like concat(day5,'%'))
and SR.site_id = input_site_id
group by time(temp_time)
limit 1440;
END
Again, thank you all for your help :).