I have three tables and need to join their data on common fields.
Example pseudo table definitions:
barometer_log (device int, pressure float, sampleTime timestamp)
temperature_log (device int,temperature float,sampleTime timestamp)
magnitude_log (device int,magnitude float,utcTime timestamp)
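Each table will eventually hold billions of rows, but currently each holds about 500,000 rows.
I need to combine the data from the tables (FULL JOIN) so that sampleTime is merged into a single column (COALESCE), giving rows of the form: device, sampleTime, pressure, temperature, magnitude
I need to be able to query the data by specifying a device and a start and end date, e.g. SELECT ... WHERE device = 1000 AND sampleTime BETWEEN '2011-10-11' AND '2011-10-17'
I have tried different UNION ALL techniques using RIGHT and LEFT joins, as suggested in MySql full join (union) and ordering on multiple date columns, but the query takes far too long: I have had to stop it after it ran for hours, or it throws an error about the temporary file size. What is the best way to query the three tables and merge their output within an acceptable time frame?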
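For reference, the FULL JOIN emulation described above can be sketched for two of the tables. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for MySQL and a few made-up rows; the RIGHT JOIN half is written as a LEFT JOIN from the other side plus an `IS NULL` anti-join filter, which is equivalent here. It illustrates the logic only, not MySQL's behavior at billions of rows.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption: only the join logic matters here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE barometer_log (device INTEGER, pressure REAL, sampleTime TEXT);
CREATE TABLE temperature_log (device INTEGER, temperature REAL, sampleTime TEXT);
INSERT INTO barometer_log VALUES
  (1000, 1012.5, '2011-10-11 10:00:00'),
  (1000, 1013.0, '2011-10-11 11:00:00');
INSERT INTO temperature_log VALUES
  (1000, 21.5, '2011-10-11 11:00:00'),
  (1000, 22.0, '2011-10-11 12:00:00');
""")

# FULL JOIN = every barometer row (LEFT JOIN) UNION ALL the temperature rows
# that have no barometer match (anti-join), so matched rows appear only once.
FULL_JOIN_SQL = """
SELECT b.device, b.sampleTime, b.pressure, t.temperature
FROM barometer_log AS b
LEFT JOIN temperature_log AS t
  ON t.device = b.device AND t.sampleTime = b.sampleTime
UNION ALL
SELECT t.device, t.sampleTime, NULL, t.temperature
FROM temperature_log AS t
LEFT JOIN barometer_log AS b
  ON b.device = t.device AND b.sampleTime = t.sampleTime
WHERE b.device IS NULL
ORDER BY 2
"""
rows = conn.execute(FULL_JOIN_SQL).fetchall()
for row in rows:
    print(row)
```

With three tables this pattern has to be repeated for each additional table, which is part of why it grows expensive.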
Here are the proposed full table definitions. Note: the device table is not included yet.
magnitude_log
CREATE TABLE magnitude_log (
device int(11) NOT NULL,
magnitude float not NULL,
sampleTime timestamp NOT NULL,
PRIMARY KEY (device,sampleTime),
CONSTRAINT magnitudeLog_device
FOREIGN KEY (device)
REFERENCES device (id)
ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
barometer_log
CREATE TABLE barometer_log (
device int(11) NOT NULL,
pressure float not NULL,
sampleTime timestamp NOT NULL,
PRIMARY KEY (device,sampleTime),
CONSTRAINT barometerLog_device
FOREIGN KEY (device)
REFERENCES device (id)
ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
temperature_log
CREATE TABLE temperature_log (
device int(11) NOT NULL,
sampleTime timestamp NOT NULL,
temperature float default NULL,
PRIMARY KEY (device,sampleTime),
CONSTRAINT temperatureLog_device
FOREIGN KEY (device)
REFERENCES device (id)
ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Answer 0 (score: 1)
First, get every (device, sampleTime) combination that appears in any of the 3 tables within the required time period:
-------- Q --------
SELECT device, sampleTime
FROM magnitude_log
WHERE device = 1000
AND sampleTime >= '2011-10-11'
AND sampleTime < '2011-10-18'
UNION
SELECT device, sampleTime
FROM barometer_log
WHERE device = 1000
AND sampleTime >= '2011-10-11'
AND sampleTime < '2011-10-18'
UNION
SELECT device, sampleTime
FROM temperature_log
WHERE device = 1000
AND sampleTime >= '2011-10-11'
AND sampleTime < '2011-10-18'
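Note that Q filters with a half-open range (`>= '2011-10-11'` and `< '2011-10-18'`) rather than the question's `BETWEEN '2011-10-11' AND '2011-10-17'`. Compared with a timestamp, `'2011-10-17'` means midnight, so `BETWEEN` drops almost the whole last day. A minimal sketch of the difference, assuming SQLite with timestamps stored as `'YYYY-MM-DD HH:MM:SS'` text and three made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE magnitude_log (device INTEGER, magnitude REAL, sampleTime TEXT)")
conn.executemany(
    "INSERT INTO magnitude_log VALUES (?, ?, ?)",
    [
        (1000, 0.5, "2011-10-11 00:00:00"),  # first instant of the range
        (1000, 0.7, "2011-10-17 12:00:00"),  # midday on the last requested day
        (1000, 0.9, "2011-10-18 00:00:00"),  # first instant after the range
    ],
)

# Closed range: the upper bound '2011-10-17' excludes everything after midnight.
closed = conn.execute(
    "SELECT COUNT(*) FROM magnitude_log WHERE device = 1000 "
    "AND sampleTime BETWEEN '2011-10-11' AND '2011-10-17'"
).fetchone()[0]

# Half-open range: includes all of 2011-10-17, excludes 2011-10-18.
half_open = conn.execute(
    "SELECT COUNT(*) FROM magnitude_log WHERE device = 1000 "
    "AND sampleTime >= '2011-10-11' AND sampleTime < '2011-10-18'"
).fetchone()[0]

print(closed, half_open)
```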
Then LEFT JOIN the 3 tables to it:
SELECT
q.device
, q.sampleTime
, b.pressure
, t.temperature
, m.magnitude
FROM
( Q ) AS q
LEFT JOIN
( SELECT *
FROM magnitude_log
WHERE device = 1000
AND sampleTime >= '2011-10-11'
AND sampleTime < '2011-10-18'
) AS m
ON (m.device, m.sampleTime) = (q.device, q.sampleTime)
LEFT JOIN
( SELECT *
FROM barometer_log
WHERE device = 1000
AND sampleTime >= '2011-10-11'
AND sampleTime < '2011-10-18'
) AS b
ON (b.device, b.sampleTime) = (q.device, q.sampleTime)
LEFT JOIN
( SELECT *
FROM temperature_log
WHERE device = 1000
AND sampleTime >= '2011-10-11'
AND sampleTime < '2011-10-18'
) AS t
ON (t.device, t.sampleTime) = (q.device, q.sampleTime)
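The two steps can be run end to end as a minimal sketch, assuming SQLite as a stand-in for MySQL and a few made-up rows. Two deviations from the answer, both assumptions: Q is written as a `WITH` clause (MySQL supports CTEs from 8.0; on older versions inline it as the derived table shown above), and the base tables are joined directly on their primary key, since q already carries the device and date filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE magnitude_log (device INTEGER NOT NULL, magnitude REAL NOT NULL,
  sampleTime TEXT NOT NULL, PRIMARY KEY (device, sampleTime));
CREATE TABLE barometer_log (device INTEGER NOT NULL, pressure REAL NOT NULL,
  sampleTime TEXT NOT NULL, PRIMARY KEY (device, sampleTime));
CREATE TABLE temperature_log (device INTEGER NOT NULL, sampleTime TEXT NOT NULL,
  temperature REAL, PRIMARY KEY (device, sampleTime));
-- Made-up rows: one other device and one out-of-range sample to be filtered out.
INSERT INTO magnitude_log VALUES (1000, 0.5, '2011-10-11 10:00:00');
INSERT INTO barometer_log VALUES (1000, 1012.5, '2011-10-11 10:00:00'),
                                 (1000, 1013.0, '2011-10-12 09:00:00');
INSERT INTO temperature_log VALUES (1000, '2011-10-12 09:00:00', 21.5),
                                   (1001, '2011-10-12 09:00:00', 19.0),
                                   (1000, '2011-10-20 08:00:00', 25.0);
""")

QUERY = """
WITH q AS (  -- step 1: distinct (device, sampleTime) keys; UNION removes duplicates
  SELECT device, sampleTime FROM magnitude_log
   WHERE device = :dev AND sampleTime >= :start AND sampleTime < :end
  UNION
  SELECT device, sampleTime FROM barometer_log
   WHERE device = :dev AND sampleTime >= :start AND sampleTime < :end
  UNION
  SELECT device, sampleTime FROM temperature_log
   WHERE device = :dev AND sampleTime >= :start AND sampleTime < :end
)
-- step 2: LEFT JOIN each table on the full key; missing readings come back as NULL
SELECT q.device, q.sampleTime, b.pressure, t.temperature, m.magnitude
FROM q
LEFT JOIN magnitude_log   AS m ON m.device = q.device AND m.sampleTime = q.sampleTime
LEFT JOIN barometer_log   AS b ON b.device = q.device AND b.sampleTime = q.sampleTime
LEFT JOIN temperature_log AS t ON t.device = q.device AND t.sampleTime = q.sampleTime
ORDER BY q.sampleTime
"""
rows = conn.execute(QUERY, {"dev": 1000, "start": "2011-10-11", "end": "2011-10-18"}).fetchall()
for row in rows:
    print(row)
```

Because every join matches on the full (device, sampleTime) primary key, each key produces at most one output row.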
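The longer the time range, the more the query will struggle with the UNION subquery. You could consider keeping Q as a separate table, perhaps populated by triggers with the unique (device, sampleTime) combinations from the other three tables.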
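A minimal sketch of that trigger idea, assuming SQLite as a stand-in and a hypothetical key table named `sample_key`. SQLite's `INSERT OR IGNORE` skips keys already recorded by another table; in MySQL the analogue would be an `AFTER INSERT ... FOR EACH ROW` trigger using `INSERT IGNORE`. Deletes (the tables use `ON DELETE CASCADE`) are not handled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE magnitude_log (device INTEGER NOT NULL, magnitude REAL NOT NULL,
  sampleTime TEXT NOT NULL, PRIMARY KEY (device, sampleTime));
CREATE TABLE barometer_log (device INTEGER NOT NULL, pressure REAL NOT NULL,
  sampleTime TEXT NOT NULL, PRIMARY KEY (device, sampleTime));
CREATE TABLE temperature_log (device INTEGER NOT NULL, sampleTime TEXT NOT NULL,
  temperature REAL, PRIMARY KEY (device, sampleTime));
-- Hypothetical table playing the role of Q.
CREATE TABLE sample_key (device INTEGER NOT NULL, sampleTime TEXT NOT NULL,
  PRIMARY KEY (device, sampleTime));
""")

# One AFTER INSERT trigger per log table copies the new row's key into sample_key.
for table in ("magnitude_log", "barometer_log", "temperature_log"):
    conn.execute(f"""
        CREATE TRIGGER {table}_key AFTER INSERT ON {table}
        BEGIN
          INSERT OR IGNORE INTO sample_key (device, sampleTime)
          VALUES (NEW.device, NEW.sampleTime);
        END""")

conn.execute("INSERT INTO magnitude_log VALUES (1000, 0.5, '2011-10-11 10:00:00')")
conn.execute("INSERT INTO barometer_log VALUES (1000, 1012.5, '2011-10-11 10:00:00')")
conn.execute("INSERT INTO temperature_log VALUES (1000, '2011-10-12 09:00:00', 21.5)")

keys = conn.execute(
    "SELECT device, sampleTime FROM sample_key ORDER BY sampleTime"
).fetchall()
print(keys)
```

The main query can then read its keys from `sample_key` with a range filter on its primary key instead of computing the UNION on every call.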
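Answer 1 (score: 0)
Assuming the device table contains all devices, you don't really need a proper full join. You only need to join the other tables on device and group by sampleTime, like this: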
SELECT
d.id AS device,
COALESCE(m.sampleTime, b.sampleTime, t.sampleTime) AS sampleTime,
m.magnitude,
b.pressure,
t.temperature
FROM device AS d
LEFT JOIN magnitude_log AS m ON d.id = m.device
LEFT JOIN barometer_log AS b ON d.id = b.device
LEFT JOIN temperature_log AS t ON d.id = t.device
WHERE d.id = 1000
GROUP BY device, sampleTime
HAVING sampleTime BETWEEN '2011-10-11' AND '2011-10-17'
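However, this may be slow, because it groups before matching on the time span. If a single device does not have millions of rows on its own, that should not be a problem; if it does, I suggest putting the sampleTime condition on each join: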
SELECT
d.id AS device,
COALESCE(m.sampleTime, b.sampleTime, t.sampleTime) AS sampleTime,
m.magnitude,
b.pressure,
t.temperature
FROM device AS d
LEFT JOIN magnitude_log AS m ON d.id = m.device AND m.sampleTime BETWEEN '2011-10-11' AND '2011-10-17'
LEFT JOIN barometer_log AS b ON d.id = b.device AND b.sampleTime BETWEEN '2011-10-11' AND '2011-10-17'
LEFT JOIN temperature_log AS t ON d.id = t.device AND t.sampleTime BETWEEN '2011-10-11' AND '2011-10-17'
WHERE d.id = 1000
GROUP BY device, sampleTime
HAVING sampleTime IS NOT NULL
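Hope this helps!
Answer 2 (score: 0)
If you query a short time range across a large number of devices, you may want to consider reversing the PK index to (timeRange, device), i.e. putting the time column first.
You may then need a secondary index on device, or on (device, timeRange).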
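The effect of column order can be checked with a query plan. A minimal sketch, assuming SQLite's `EXPLAIN QUERY PLAN` as a stand-in for MySQL's `EXPLAIN` (planners differ between engines, and SQLite's plan wording varies by version), with sampleTime in the role of timeRange and a hypothetical index name `idx_device_time`. SQLite names the index backing a composite primary key `sqlite_autoindex_<table>_1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- PK reversed so the time column leads.
CREATE TABLE barometer_log (device INTEGER NOT NULL, pressure REAL NOT NULL,
  sampleTime TEXT NOT NULL, PRIMARY KEY (sampleTime, device));
-- Secondary index for per-device lookups.
CREATE INDEX idx_device_time ON barometer_log (device, sampleTime);
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# One device over a date range: equality on device, then a range on sampleTime.
per_device = plan("SELECT pressure FROM barometer_log WHERE device = 1000 "
                  "AND sampleTime >= '2011-10-11' AND sampleTime < '2011-10-18'")
# All devices over a short range: only an index led by sampleTime helps.
all_devices = plan("SELECT device, pressure FROM barometer_log "
                   "WHERE sampleTime >= '2011-10-11' AND sampleTime < '2011-10-18'")
print(per_device)
print(all_devices)
```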