I'm trying to learn SQL and database design, and I need some help choosing a good design for my database in this case. I'm using C# and MySQL.
My input data in this lesson consists of energy meters, each with a unique identification number, and every meter delivers one value per hour. I have data from 2013 onward, and this will continue for an unspecified time into the future; my best guess is 5 years ahead. There are roughly 25 000 meters, so there will be 25e3 * 24 = 600 000 data points a day. I get this data once a day as a file. The number of meters changes slowly: around 500 changes per year, counting both added and removed meters. As a bonus, I would like to know when each value was added to the database so that I can calculate a performance index for the collection system. So this is the input data for each meter:
Every meter delivers one type of data, so I can store the data type in a separate table, and the readings themselves will be anonymous decimal values. This is where my problem begins. I have tried some different design approaches:
All of the solutions above lead to quite slow performance when adding data to the database.
If I search Stack Overflow and elsewhere for database design with a large number of columns, I always find the answer "Normalize!", but I do not know how to do that in my case because of my limited experience. I have a unique value (valuetime) and a unique meter ID, which is why I call my data rectangular.
Can someone please lead me to the right path?
Answer 0 (score: 0)
For your input data:
Meter Table:
ID int PK IDENTITY(1, 1)
MeterName varchar
ReadingsTable:
ID int PK IDENTITY(1, 1)
MeterID int FK
Value decimal
TimeStamp datetime
DateAdded date
You should populate this with an ETL - make an SSIS package or something. Definitely better than a C# app, in my opinion.
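As a rough sketch of the two tables above plus a batched load, the following uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained). The helper name load_day and the sample meter name are hypothetical. In MySQL the ID columns would typically be declared INT AUTO_INCREMENT, since IDENTITY(1, 1) is SQL Server syntax.

```python
import sqlite3
from datetime import datetime, timedelta

# SQLite stands in for MySQL here (assumption). Column names follow the answer.
SCHEMA = """
CREATE TABLE Meter (
    ID        INTEGER PRIMARY KEY AUTOINCREMENT,
    MeterName TEXT NOT NULL UNIQUE
);
CREATE TABLE Readings (
    ID        INTEGER PRIMARY KEY AUTOINCREMENT,
    MeterID   INTEGER NOT NULL REFERENCES Meter(ID),
    Value     REAL NOT NULL,   -- DECIMAL in MySQL
    TimeStamp TEXT NOT NULL,   -- hour the value belongs to
    DateAdded TEXT NOT NULL    -- day the row was loaded
);
"""


def load_day(conn, meter_name, day, values, date_added):
    """Insert one meter's hourly values for one day in a single transaction."""
    with conn:  # one commit at the end; rolls back if anything fails
        conn.execute(
            "INSERT OR IGNORE INTO Meter (MeterName) VALUES (?)", (meter_name,)
        )
        (meter_id,) = conn.execute(
            "SELECT ID FROM Meter WHERE MeterName = ?", (meter_name,)
        ).fetchone()
        rows = [
            (meter_id, value, (day + timedelta(hours=hour)).isoformat(), date_added)
            for hour, value in enumerate(values)
        ]
        conn.executemany(
            "INSERT INTO Readings (MeterID, Value, TimeStamp, DateAdded) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
inserted = load_day(conn, "meter-0001", datetime(2013, 1, 1), [1.5] * 24, "2013-01-02")
```

Sending all rows for a batch through executemany inside one transaction avoids a commit per row, which is a common cause of slow inserts.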
Next, you can make aggregation tables:
DailyAggTable:
ID int PK IDENTITY(1, 1)
MeterID int FK
SumOfValue decimal
Date date
You can populate this after your ETL. You can make weekly, monthly, quarterly, yearly, etc. agg tables and schedule their population accordingly. This will improve reporting performance.
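A minimal sketch of populating the daily aggregation table after the load, again using Python's sqlite3 as a stand-in for MySQL (an assumption). The function name aggregate_day and the sample rows are hypothetical, and the surrogate ID column is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Readings (MeterID INTEGER, Value REAL, TimeStamp TEXT);
CREATE TABLE DailyAgg (MeterID INTEGER, SumOfValue REAL, Date TEXT);
""")

# Sample data: meter 1 has 24 readings of 0.5; meter 2 has 24 readings of 1.0
# on 2013-01-01 plus one reading on the following day.
sample = [(1, 0.5, f"2013-01-01T{h:02d}:00:00") for h in range(24)]
sample += [(2, 1.0, f"2013-01-01T{h:02d}:00:00") for h in range(24)]
sample += [(2, 7.0, "2013-01-02T00:00:00")]
conn.executemany("INSERT INTO Readings VALUES (?, ?, ?)", sample)


def aggregate_day(conn, day):
    """Rebuild the DailyAgg rows for one calendar day ('YYYY-MM-DD')."""
    with conn:
        # Delete first so that re-running the job does not duplicate rows.
        conn.execute("DELETE FROM DailyAgg WHERE Date = ?", (day,))
        conn.execute(
            """
            INSERT INTO DailyAgg (MeterID, SumOfValue, Date)
            SELECT MeterID, SUM(Value), date(TimeStamp)
            FROM Readings
            WHERE date(TimeStamp) = ?
            GROUP BY MeterID, date(TimeStamp)
            """,
            (day,),
        )


aggregate_day(conn, "2013-01-01")
aggregate_day(conn, "2013-01-01")  # safe to re-run
daily = conn.execute(
    "SELECT MeterID, SumOfValue FROM DailyAgg ORDER BY MeterID"
).fetchall()
```

Reports can then read one row per meter per day from DailyAgg instead of scanning 24 rows per meter per day in the readings table.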
Answer 1 (score: 0)
Building on Stan Shaw's answer...
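If the data arrives as a CSV file, use LOAD DATA every night. You should load it into a temporary table, massage the data, and then copy it into the real table. Possibly no C# code is needed at all.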
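The load, massage, and copy flow can be sketched as follows. This is only an illustration under stated assumptions: Python's sqlite3 and csv modules stand in for MySQL's LOAD DATA, and the file layout (meter_id, timestamp, value), the Staging table, and the nightly_load helper are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical file layout: meter_id,timestamp,value
SAMPLE_CSV = """meter_id,timestamp,value
1001,2013-01-01 00:00:00, 1.25
1001,2013-01-01 01:00:00,1.50
1002,2013-01-01 00:00:00,
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staging  (MeterID TEXT, TimeStamp TEXT, Value TEXT);
CREATE TABLE Readings (
    MeterID   INTEGER NOT NULL,
    TimeStamp TEXT    NOT NULL,
    Value     REAL    NOT NULL
);
""")


def nightly_load(conn, csv_file):
    """Load raw text into Staging, clean it up, then copy it into Readings."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    with conn:
        # Step 1: raw load, everything as text (the LOAD DATA step in MySQL).
        conn.execute("DELETE FROM Staging")
        conn.executemany("INSERT INTO Staging VALUES (?, ?, ?)", reader)
        # Steps 2 and 3: trim and cast, skip rows with no value, copy over.
        conn.execute("""
            INSERT INTO Readings (MeterID, TimeStamp, Value)
            SELECT CAST(MeterID AS INTEGER), TRIM(TimeStamp), CAST(TRIM(Value) AS REAL)
            FROM Staging
            WHERE TRIM(Value) <> ''
        """)


nightly_load(conn, io.StringIO(SAMPLE_CSV))
loaded = conn.execute(
    "SELECT MeterID, TimeStamp, Value FROM Readings ORDER BY MeterID, TimeStamp"
).fetchall()
```

Keeping the raw rows in a staging table means bad input can be inspected or rejected before it reaches the real table.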
DateAdded seems rather useless and clutters the table. Either remove it entirely, or build another table to log the uploads.
Don't bother with an ID on the main table; (MeterID, Timestamp) is the 'natural' PRIMARY KEY. Again, this saves space.
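A short sketch of the composite key, using Python's sqlite3 as a stand-in for MySQL (an assumption). WITHOUT ROWID is a SQLite-specific clause used here so that the table is stored by its primary key, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Readings (
    MeterID   INTEGER NOT NULL,
    TimeStamp TEXT    NOT NULL,
    Value     REAL    NOT NULL,
    PRIMARY KEY (MeterID, TimeStamp)  -- no surrogate ID column
) WITHOUT ROWID
""")

conn.execute("INSERT INTO Readings VALUES (1001, '2013-01-01 00:00:00', 1.25)")

# A second value for the same meter and hour violates the primary key.
try:
    conn.execute("INSERT INTO Readings VALUES (1001, '2013-01-01 00:00:00', 9.99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Readings").fetchone()[0]
```

Besides dropping one column, the key also guards against loading the same file twice, because a repeated (MeterID, TimeStamp) pair is rejected.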
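I would build only daily summary rows, in a single summary table. That table will probably be fast enough to handle weekly and monthly queries. Only if it is not fast enough should you consider summaries of summaries.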