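I'm trying to do numerical analysis with Oracle 11g (11.1 in development, 11.2 in production), specifically linear interpolation on a table with three columns of interest: a timestamp, a device ID, and a value.
The value column holds data from the device (identified by deviceid), taken at the time given in the timestamp. For example, this is made-up data, but it gives the idea: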
time            | deviceid   | value
----------------|------------|-----------
01:00:00.000    | 001        | 1.000
01:00:01.000    | 001        | 1.030
01:00:02.000    | 001        | 1.063
01:00:00.050    | 002        | 553.10
01:00:01.355    | 002        | 552.30
01:00:02.155    | 002        | 552.43
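The timestamps from device 001 do not match those from device 002, but I need the values from devices 001 and 002 on a single row, with one timestamp that matches device 001's timestamps. What I want to end up with is something like this: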
time            | device 001   | device 002
----------------|--------------|------------
01:00:00.000    | 1.000        | null
01:00:01.000    | 1.030        | 552.520
01:00:02.000    | 1.063        | 552.405
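Here the values for device 002 are linearly interpolated from the device 002 values collected at the two closest timestamps on either side of each device 001 timestamp. The null appears because I don't have two device 002 timestamps on either side of 01:00:00.000, and I don't want to extrapolate the value.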
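Spelled out, the interpolation I have in mind is roughly the following. The symbols are my own shorthand, not anything from Oracle: t is a device 001 timestamp, and (t_prev, v_prev) and (t_next, v_next) are the device 002 readings immediately before and after it.

```latex
v(t) = v_{\text{prev}} + (t - t_{\text{prev}}) \cdot
       \frac{v_{\text{next}} - v_{\text{prev}}}{t_{\text{next}} - t_{\text{prev}}}

% Device 002 at t = 01:00:01.000, bracketed by 01:00:00.050 and 01:00:01.355:
v = 553.10 + 0.950 \cdot \frac{552.30 - 553.10}{1.305} \approx 552.5176

% Device 002 at t = 01:00:02.000, bracketed by 01:00:01.355 and 01:00:02.155:
v = 552.30 + 0.645 \cdot \frac{552.43 - 552.30}{0.800} \approx 552.4048
```

These agree with the rounded figures in the table above to about two decimal places.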
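From what I understand, I can use percentile_cont to do this, but I don't understand the examples I've seen online. For example, where does the percentile that percentile_cont uses come from?
Thanks in advance for your help!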
Answer 0 (score: 3)
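I'm not sure how you would use PERCENTILE_CONT to do the interpolation you're asking for, but with the help of a different analytic function you can achieve what you want.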
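As for where the percentile comes from: it is simply a constant between 0 and 1 that the caller passes in. The sketch below, which assumes the sample rows from the question are in a table called devices, may help show why this is a poor fit for the problem: PERCENTILE_CONT interpolates between neighbouring sorted values by row position and never looks at the timestamps.

```sql
-- The percentile (0.5 here) is a literal supplied by the caller.
-- Rows are ordered by value, not by time, and the result is
-- interpolated by row position within that ordering.
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) AS median_value
FROM devices
WHERE deviceid = 2;
-- With the three sample rows for device 002 (552.30, 552.43, 553.10)
-- this returns the middle value, 552.43.
```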
First, we'll create the following function, which converts INTERVAL DAY TO SECOND values to seconds:
CREATE OR REPLACE FUNCTION intvl_to_seconds(
    p_interval INTERVAL DAY TO SECOND
) RETURN NUMBER DETERMINISTIC
AS
BEGIN
    -- Sum each component of the interval, converted to seconds.
    RETURN EXTRACT(DAY FROM p_interval) * 24*60*60
         + EXTRACT(HOUR FROM p_interval) * 60*60
         + EXTRACT(MINUTE FROM p_interval) * 60
         + EXTRACT(SECOND FROM p_interval);
END;
/
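As a quick sanity check of the function (a sketch only; the timestamp literals are taken from the sample data with an arbitrary date attached), subtracting two TIMESTAMP values yields an INTERVAL DAY TO SECOND, which the function turns into a plain number of seconds:

```sql
-- TIMESTAMP - TIMESTAMP gives an INTERVAL DAY TO SECOND.
SELECT intvl_to_seconds(
         TIMESTAMP '2011-09-09 01:00:01.000'
       - TIMESTAMP '2011-09-09 01:00:00.050'
       ) AS elapsed_seconds
FROM dual;
-- elapsed_seconds is 0.95 (SQL*Plus displays it as .95)
```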
With this function in place, we can use a query such as the following:
SELECT d1.time,
       d1.value AS value1,
       q2.prev_value
         + intvl_to_seconds(d1.time - q2.prev_time)
           * (q2.next_value - q2.prev_value)
           / intvl_to_seconds(q2.next_time - q2.prev_time) AS value2
FROM devices d1
LEFT OUTER JOIN (SELECT d2.time AS prev_time,
                        d2.value AS prev_value,
                        LEAD(d2.time, 1) OVER (ORDER BY d2.time) AS next_time,
                        LEAD(d2.value, 1) OVER (ORDER BY d2.time) AS next_value
                 FROM devices d2
                 WHERE d2.deviceid = 2) q2
  ON d1.time BETWEEN q2.prev_time AND q2.next_time
WHERE d1.deviceid = 1;
I took your data above and set the date component of the timestamps to today's date. When I ran the query above, I got the following results:
TO_CHAR(D1.TIME)                          VALUE1     VALUE2
------------------------------------- ---------- ----------
09-SEP-11 01.00.00.000000                      1
09-SEP-11 01.00.01.000000                   1.03 552.517625
09-SEP-11 01.00.02.000000                  1.063 552.404813
(I wrapped d1.time in TO_CHAR to reduce the excessive spacing in SQL*Plus.)
If you were using DATEs instead of TIMESTAMPs, you wouldn't need this function: you could just subtract the dates.
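A minimal sketch of that DATE variant, assuming purely for illustration that devices.time were declared as DATE: subtracting two DATEs yields a NUMBER of days, and because the interpolation only uses a ratio of two such differences, the units cancel and no conversion to seconds is needed. Bear in mind that DATE only stores whole seconds, so the millisecond timestamps in your sample data would not survive.

```sql
-- Assumption: devices.time is a DATE here, not a TIMESTAMP.
-- DATE - DATE is a NUMBER of days, so the differences divide directly.
SELECT d1.time,
       d1.value AS value1,
       q2.prev_value
         + (d1.time - q2.prev_time)
           * (q2.next_value - q2.prev_value)
           / (q2.next_time - q2.prev_time) AS value2
FROM devices d1
LEFT OUTER JOIN (SELECT d2.time AS prev_time,
                        d2.value AS prev_value,
                        LEAD(d2.time, 1) OVER (ORDER BY d2.time) AS next_time,
                        LEAD(d2.value, 1) OVER (ORDER BY d2.time) AS next_value
                 FROM devices d2
                 WHERE d2.deviceid = 2) q2
  ON d1.time BETWEEN q2.prev_time AND q2.next_time
WHERE d1.deviceid = 1;
```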
Answer 1 (score: 0)
I'm using a modified version of @Luke Woodward's query:
SELECT d1.time,
d1.value AS value1,
q2.prev_value +
(EXTRACT( SECOND FROM (d1.time - q2.prev_time)) +
EXTRACT( MINUTE FROM (d1.time - q2.prev_time)) * 60 )
* (q2.next_value - q2.prev_value)/
(EXTRACT ( SECOND FROM (q2.next_time - q2.prev_time)) +
EXTRACT ( MINUTE FROM (q2.next_time - q2.prev_time)) * 60) AS value2
FROM devices d1
LEFT OUTER JOIN (SELECT d2.time AS prev_time,
d2.value AS prev_value,
LEAD(d2.time, 1) OVER (ORDER BY d2.time) AS next_time,
LEAD(d2.value, 1) OVER (ORDER BY d2.time) AS next_value
FROM devices d2
WHERE d2.deviceid = 2
and time between '20100914 000000' and '20100915 000000'
) q2
ON d1.time BETWEEN q2.prev_time AND q2.next_time
WHERE d1.deviceid = 1;
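But the interpolated value always comes back null, even though there is data for device 2 within the date range.
Note that I had to add a date range to the subquery q2, which is probably why a normal join loses the data outside that range.
If I use a normal join, I don't get nulls for the interpolated data, but I then lose the device 1 rows that fall outside the endpoints of device 2 (the interpolated device in q2). Suggestions?
Answer 2 (score: 0)
Final solution with the date range (the same range is now also applied to device 1 in the outer WHERE clause):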
SELECT
d1.time,
d1.value AS value1,
q2.prev_value +
(EXTRACT( SECOND FROM (d1.time - q2.prev_time)) +
EXTRACT( MINUTE FROM (d1.time - q2.prev_time)) * 60 )
* (q2.next_value - q2.prev_value)/
(EXTRACT ( SECOND FROM (q2.next_time - q2.prev_time)) +
EXTRACT ( MINUTE FROM (q2.next_time - q2.prev_time)) * 60
) AS value2
FROM devices d1
LEFT OUTER JOIN (
SELECT d2.time AS prev_time,
d2.value AS prev_value,
LEAD(d2.time, 1) OVER (ORDER BY d2.time) AS next_time,
LEAD(d2.value, 1) OVER (ORDER BY d2.time) AS next_value
FROM devices d2
WHERE d2.deviceid = 2
AND time BETWEEN '20100914 000000' AND '20100915 000000'
) q2
ON d1.time BETWEEN q2.prev_time AND q2.next_time
WHERE d1.deviceid = 1
AND time BETWEEN '20100914 000000' AND '20100915 000000';
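One possible refinement, offered as a sketch rather than a tested replacement. It rests on three assumptions: that the string literals are meant in the format 'YYYYMMDD HH24MISS', that the intvl_to_seconds function from Answer 0 has been created, and that gaps between device 2 readings may exceed an hour. The inline EXTRACT arithmetic above only adds up minutes and seconds, so it drops the hour and day parts of longer gaps, and comparing a TIMESTAMP with a bare string relies on the session's NLS settings for the implicit conversion.

```sql
SELECT d1.time,
       d1.value AS value1,
       q2.prev_value
         + intvl_to_seconds(d1.time - q2.prev_time)
           * (q2.next_value - q2.prev_value)
           -- NULLIF guards against two device 2 rows sharing a timestamp
           / NULLIF(intvl_to_seconds(q2.next_time - q2.prev_time), 0) AS value2
FROM devices d1
LEFT OUTER JOIN (
       SELECT d2.time AS prev_time,
              d2.value AS prev_value,
              LEAD(d2.time, 1) OVER (ORDER BY d2.time) AS next_time,
              LEAD(d2.value, 1) OVER (ORDER BY d2.time) AS next_value
       FROM devices d2
       WHERE d2.deviceid = 2
         -- Explicit conversion; the format mask is an assumption
         AND d2.time BETWEEN TO_TIMESTAMP('20100914 000000', 'YYYYMMDD HH24MISS')
                         AND TO_TIMESTAMP('20100915 000000', 'YYYYMMDD HH24MISS')
     ) q2
  ON d1.time BETWEEN q2.prev_time AND q2.next_time
WHERE d1.deviceid = 1
  AND d1.time BETWEEN TO_TIMESTAMP('20100914 000000', 'YYYYMMDD HH24MISS')
                  AND TO_TIMESTAMP('20100915 000000', 'YYYYMMDD HH24MISS');
```

Because BETWEEN is inclusive at both ends, a device 1 timestamp that coincides exactly with an interior device 2 timestamp matches two adjacent intervals and so appears twice in the output (with the same interpolated value); whether that matters depends on your data.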