I have a table in my database that stores readings from several sensors, structured like this:
CREATE TABLE [test].[readings] (
    [timestamp_utc] DATETIME2(0) NOT NULL, -- 48 bits
    [sensor_id] INT NOT NULL, -- 32 bits
    [site_id] INT NOT NULL, -- 32 bits
    [reading] REAL NOT NULL, -- 32 bits (REAL is a 4-byte float)
    PRIMARY KEY([timestamp_utc], [sensor_id], [site_id])
)
CREATE TABLE [test].[sensors] (
    [sensor_id] int NOT NULL,
    [measurement_type_id] int NOT NULL,
    [site_id] int NOT NULL,
    [description] varchar(255) NULL,
    PRIMARY KEY ([sensor_id], [site_id])
)
I'd like to be able to derive statistics from all of these readings easily.
Some of the queries I want to run:
Get me all readings for site_id = X between date_hour1 and date_hour2
Get me all readings for site_id = X and sensor_id in <list> between date_hour1 and date_hour2
Get me all readings for site_id = X and sensor measurement type = Z between date_hour1 and date_hour2
Get me all readings for site_id = X, aggregated (average) by DAY between date_hour1 and date_hour2
Get me all readings for site_id = X, aggregated (average) by DAY between date_hour1 and date_hour2 but in UTC+3
(This should give different results from the previous query, because the start and end of each day are now shifted by 3 hours.)
Get me min, max, std, mean for all readings for site_id = X between date_hour1 and date_hour2
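The two queries where the UTC offset and the statistics matter most are the daily average and the min/max/std/mean summary. Below is a minimal Java sketch of both, assuming the rows have already been fetched into a hypothetical Reading class (UTC timestamp plus value). It is meant to show why the UTC+3 daily average produces different buckets than the UTC one, not to argue for keeping the aggregation in Java.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ReadingStats {

    // Hypothetical in-memory row: a reading value and its UTC timestamp.
    public static final class Reading {
        final Instant timestampUtc;
        final double value;

        public Reading(Instant timestampUtc, double value) {
            this.timestampUtc = timestampUtc;
            this.value = value;
        }
    }

    // Averages readings per calendar day as seen at the given UTC offset.
    // With UTC a day is 00:00-24:00 UTC; with +03:00 it is 21:00-21:00 UTC.
    public static Map<LocalDate, Double> dailyAverage(List<Reading> readings, ZoneOffset offset) {
        return readings.stream().collect(Collectors.groupingBy(
                (Reading r) -> r.timestampUtc.atOffset(offset).toLocalDate(),
                TreeMap::new,
                Collectors.averagingDouble((Reading r) -> r.value)));
    }

    // Returns {min, max, sample standard deviation, mean}, analogous to
    // T-SQL MIN, MAX, STDEV and AVG over the same rows (NaN std for fewer than 2 rows).
    public static double[] summary(List<Reading> readings) {
        DoubleSummaryStatistics s = readings.stream()
                .mapToDouble((Reading r) -> r.value)
                .summaryStatistics();
        double mean = s.getAverage();
        double sumSq = readings.stream()
                .mapToDouble((Reading r) -> (r.value - mean) * (r.value - mean))
                .sum();
        long n = s.getCount();
        double std = n > 1 ? Math.sqrt(sumSq / (n - 1)) : Double.NaN;
        return new double[] {s.getMin(), s.getMax(), std, mean};
    }
}
```

A reading at 22:00 UTC on January 1 lands on January 1 in the UTC grouping but on January 2 in the +03:00 grouping, which is exactly the shift described for the UTC+3 query.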
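So far I have been querying the database from Java and doing all of this processing locally. But this ends up being fairly slow, and the code is messy to write and maintain (too many loops, generic functions that perform repetitive tasks, a large and verbose codebase, etc.)...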
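To make matters worse, the readings table is huge (hence the importance of the primary key, which also serves as the performance index). Maybe I should use a time-series database (are there any good ones?). I'm using SQL Server.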
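What's the best approach? I feel like I'm reinventing the wheel, since all of this is basically an analytics application...
I know these queries sound simple, but when you try to parameterize all of them, you end up with a monster like this: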
-- Sums all device readings, returns timestamps in localtime according to utcOffset (if utcOffset = 00:00, then timestamps are in UTC)
CREATE PROCEDURE upranking.getSumOfReadingsForDevices
    @facilityId int,
    @deviceIds varchar(MAX),
    @beginTS datetime2,
    @endTS datetime2,
    @utcOffset varchar(6),
    @resolution varchar(6) -- NO, HOURS, DAYS, MONTHS, YEARS
AS BEGIN
    SET NOCOUNT ON -- http://stackoverflow.com/questions/24428928/jdbc-sql-error-statement-did-not-return-a-result-set
    DECLARE @deviceIdsList TABLE (
        id int NOT NULL
    );
    DECLARE @beginBoundary datetime2,
            @endBoundary datetime2;
    SELECT @beginBoundary = DATEADD(day, -1, @beginTS);
    SELECT @endBoundary = DATEADD(day, 1, @endTS);
    -- We shift sign from the offset because we are going to convert the zone for the entire table and not beginTS endTS themselves
    SELECT @utcOffset = CASE WHEN LEFT(@utcOffset, 1) = '+' THEN STUFF(@utcOffset, 1, 1, '-') ELSE STUFF(@utcOffset, 1, 1, '+') END
    INSERT INTO @deviceIdsList
        SELECT convert(int, value) FROM string_split(@deviceIds, ',');
    SELECT SUM(reading) as reading,
           timestamp_local
    FROM (
        SELECT reading,
               upranking.add_timeoffset_to_datetime2(timestamp_utc, @utcOffset, @resolution) as timestamp_local
        FROM upranking.readings
        WHERE
            device_id IN (SELECT id FROM @deviceIdsList)
            AND facility_id = @facilityId
            AND timestamp_utc BETWEEN @beginBoundary AND @endBoundary
    ) as innertbl
    WHERE timestamp_local BETWEEN @beginTS AND @endTS
    GROUP BY timestamp_local
    ORDER BY timestamp_local
END
GO
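This procedure takes a site ID (facilityId here), a list of sensor IDs (deviceIds here), begin and end timestamps, then the UTC offset as a string such as "+XX:XX" or "-XX:XX", and finally a resolution that determines how SUM aggregates the results (taking the UTC offset into account).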
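The procedure's logic (parse the offset, filter the time window, truncate each local timestamp to the resolution, sum per bucket) can be sketched in Java to make each step explicit. This is a minimal sketch under two assumptions: the offset string means local = UTC + offset, and the begin/end bounds are inclusive local times, as with BETWEEN in the procedure. The class and method names are hypothetical; the helper upranking.add_timeoffset_to_datetime2 is not shown in the question, so this does not claim to reproduce it exactly.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.util.Map;
import java.util.TreeMap;

public class ResolutionSum {

    // Mirrors the procedure's @resolution values.
    public enum Resolution { NO, HOURS, DAYS, MONTHS, YEARS }

    // Truncates a local timestamp to the start of its bucket.
    public static LocalDateTime bucket(LocalDateTime t, Resolution res) {
        switch (res) {
            case NO:     return t;
            case HOURS:  return t.truncatedTo(ChronoUnit.HOURS);
            case DAYS:   return t.truncatedTo(ChronoUnit.DAYS);
            case MONTHS: return t.truncatedTo(ChronoUnit.DAYS).withDayOfMonth(1);
            case YEARS:  return t.truncatedTo(ChronoUnit.DAYS).withDayOfYear(1);
            default:     throw new IllegalArgumentException("unknown resolution " + res);
        }
    }

    // Sums readings per local bucket; timestampsUtc[i] is the UTC time of values[i].
    public static Map<LocalDateTime, Double> sumByBucket(Instant[] timestampsUtc, double[] values,
            LocalDateTime beginLocal, LocalDateTime endLocal, String utcOffset, Resolution res) {
        ZoneOffset offset = ZoneOffset.of(utcOffset); // accepts e.g. "+03:00" or "-05:30"
        // Converting the local bounds to UTC once gives the exact UTC window,
        // so no +/- 1 day widening followed by a second filter is needed.
        Instant beginUtc = beginLocal.toInstant(offset);
        Instant endUtc = endLocal.toInstant(offset);
        Map<LocalDateTime, Double> sums = new TreeMap<>();
        for (int i = 0; i < values.length; i++) {
            Instant ts = timestampsUtc[i];
            if (ts.isBefore(beginUtc) || ts.isAfter(endUtc)) {
                continue; // outside the requested local window
            }
            LocalDateTime local = LocalDateTime.ofInstant(ts, offset);
            sums.merge(bucket(local, res), values[i], Double::sum);
        }
        return sums;
    }
}
```

The same idea carries over to the SQL side: shifting @beginTS and @endTS into UTC once and filtering on the raw timestamp_utc column keeps the range predicate able to use the primary key, whereas filtering on the computed timestamp_local cannot use any index.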
Since I'm using Java, my first thought was Hibernate or something similar, but I feel Hibernate isn't a good fit for this kind of query.
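An ORM is not required to call a parameterized stored procedure from Java; plain JDBC handles it directly. Below is a minimal sketch using CallableStatement, assuming the procedure signature shown above and that its result columns are named reading and timestamp_local as in the final SELECT. The class and method names are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ReadingsDao {

    // Builds the comma-separated list that the procedure splits with string_split.
    public static String joinIds(List<Integer> ids) {
        return ids.stream().map(String::valueOf).collect(Collectors.joining(","));
    }

    // Calls upranking.getSumOfReadingsForDevices and collects timestamp_local -> reading.
    public static Map<LocalDateTime, Double> sumOfReadings(Connection conn, int facilityId,
            List<Integer> deviceIds, LocalDateTime begin, LocalDateTime end,
            String utcOffset, String resolution) throws SQLException {
        String call = "{call upranking.getSumOfReadingsForDevices(?, ?, ?, ?, ?, ?)}";
        Map<LocalDateTime, Double> result = new TreeMap<>();
        try (CallableStatement cs = conn.prepareCall(call)) {
            cs.setInt(1, facilityId);
            cs.setString(2, joinIds(deviceIds));
            cs.setTimestamp(3, Timestamp.valueOf(begin));
            cs.setTimestamp(4, Timestamp.valueOf(end));
            cs.setString(5, utcOffset);   // e.g. "+03:00"
            cs.setString(6, resolution);  // NO, HOURS, DAYS, MONTHS or YEARS
            try (ResultSet rs = cs.executeQuery()) {
                while (rs.next()) {
                    result.put(rs.getTimestamp("timestamp_local").toLocalDateTime(),
                               rs.getDouble("reading"));
                }
            }
        }
        return result;
    }
}
```

Keeping the aggregation in the procedure and only mapping rows in Java avoids shipping every raw reading to the application, which is where the slowness described above tends to come from.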
Answer (score: 1)
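Your structure looks fine at first glance, but looking at your queries makes me think you may need to do some tuning. Performance is never an easy topic, and there is no "one size fits all" answer. Here are some considerations: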
- Create an index on [sensor_id], [site_id].
- If readings is very large, consider some kind of partitioning strategy. See the MSSQL documentation.
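One possible partitioning strategy for readings is one partition per month on timestamp_utc, which lines up with the time-range filters in the queries above; timestamp_utc is already part of the primary key, which SQL Server requires for a partition-aligned unique index. Below is a minimal sketch, assuming monthly granularity suits the data volume, that generates the CREATE PARTITION FUNCTION statement. The name pf_readings_monthly is hypothetical, and a partition scheme (CREATE PARTITION SCHEME) plus rebuilding the clustered index on that scheme would still be needed.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class MonthlyPartitions {

    // Builds a CREATE PARTITION FUNCTION statement with one boundary per month start.
    // With RANGE RIGHT each boundary value belongs to the partition on its right, so every
    // partition between two boundaries holds [first of month, first of next month).
    // Rows before the first boundary go to an extra leftmost partition.
    public static String partitionFunction(String name, LocalDate firstMonth, int months) {
        List<String> bounds = new ArrayList<>();
        LocalDate start = firstMonth.withDayOfMonth(1);
        for (int i = 0; i < months; i++) {
            bounds.add("'" + start.plusMonths(i) + "'"); // LocalDate prints as yyyy-MM-dd
        }
        return "CREATE PARTITION FUNCTION " + name
                + " (datetime2(0)) AS RANGE RIGHT FOR VALUES ("
                + String.join(", ", bounds) + ")";
    }
}
```

Month boundaries make it cheap to drop or archive old data by switching out whole partitions, and queries filtered on timestamp_utc only touch the partitions they need.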