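Adding missing date records to a time series table

Time: 2018-10-26 17:25:17

Tags: mysql sql

I am working with a table that only contains data for working days. The data is essentially the end-of-day balance for each day. It looks like this: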

ID  Name        Some Val    Other Val   Date

10  Somebody    33001.93    33001.93    2018-10-01
10  Somebody    33481.93    33481.93    2018-10-02
10  Somebody    33001.93    33001.93    2018-10-03
10  Somebody    33582.76    33582.76    2018-10-04
10  Somebody    33582.73    33582.79    2018-10-05
------- Missing row for 2018-10-06 ---------------
------- Missing row for 2018-10-07 ---------------
10  Somebody    33582.76    33582.76    2018-10-08
------- Missing row for 2018-10-09 ---------------
10  Somebody    33462.76    33462.76    2018-10-10

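My task is to calculate the average daily balance (sum of the end-of-day balances / total number of days). To do that, I need a row for every calendar day, so each missing day has to be filled in with the most recent preceding row.

What I need is: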

ID  Name        Some Val    Other Val   Date

10  Somebody    33001.93    33001.93    2018-10-01
10  Somebody    33481.93    33481.93    2018-10-02
10  Somebody    33001.93    33001.93    2018-10-03
10  Somebody    33582.76    33582.76    2018-10-04
10  Somebody    33582.73    33582.79    2018-10-05    
10  Somebody    33582.73    33582.79    2018-10-06
10  Somebody    33582.73    33582.79    2018-10-07    
10  Somebody    33582.76    33582.76    2018-10-08
10  Somebody    33582.76    33582.76    2018-10-09
10  Somebody    33462.76    33462.76    2018-10-10

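Essentially, the row for the 5th is copied into the missing 6th and 7th, and the row for the 8th is copied into the missing 9th.

I have a partial solution that works when only weekend records are missing.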

select ID, Name, val1, val2, date from t
union all
select id, name, val1, val2, date + interval 1 day from t where dayofweek(date) = 6
union all
select id, name, val1, val2, date + interval 2 day from t where dayofweek(date) = 6
;

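This partial solution assumes that only weekend records are missing. It builds two extra result sets by copying Friday's data to Saturday and to Sunday, then unions all three sets together.

It fails when data is missing within the working week (for example, on a public holiday): only the 6th and 7th are filled, and the 9th is still missing.

How can I find the missing records and fill them with the last available record, so that the time series is complete? I am new to SQL but not to programming. With the right pointers I should be able to work out a solution, so I would appreciate suggestions on how to approach this.

The MySQL version I am using is: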

mysql  Ver 14.14 Distrib 5.7.24, for Linux (x86_64) using  EditLine wrapper

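1 answer:

Answer 0: (score: 1)

If your MySQL version supports recursive CTEs (MySQL 8.0 and later), you can use one to build a calendar of every date between the first and last date in the table.

Then LEFT JOIN the calendar to your table. For calendar dates that have no matching row, use CASE WHEN with a correlated subquery to take each value from the most recent earlier row.

Schema (MySQL v8.0)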

CREATE TABLE T(
   ID int,
   Name varchar(50),
   SomeVal float,   
   OtherVal float,   
   `Date` date
);




insert into T values (10,'Somebody',33001.93,33001.93,'2018-10-01');
insert into T values (10,'Somebody',33481.93,33481.93,'2018-10-02');
insert into T values (10,'Somebody',33001.93,33001.93,'2018-10-03');
insert into T values (10,'Somebody',33582.76,33582.76,'2018-10-04');
insert into T values (10,'Somebody',33582.73,33582.79,'2018-10-05');
insert into T values (10,'Somebody',33582.76,33582.76,'2018-10-08');
insert into T values (10,'Somebody',33462.76,33462.76,'2018-10-10');

Query #1

WITH recursive CTE as(
  SELECT MIN(Date) minDt,MAX(Date) maxDt
  FROM T
  UNION ALL
  SELECT date_add(minDt,INTERVAL 1 DAY),maxDt
  FROM CTE
  WHERE minDt < maxDt
)

SELECT  
    CASE WHEN ID IS NULL THEN (SELECT ID 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1)  
    ELSE ID END ID,
    CASE WHEN Name IS NULL THEN (SELECT Name 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1) 
    ELSE Name END Name,
    CASE WHEN SomeVal IS NULL THEN (SELECT SomeVal 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1) 
    ELSE SomeVal END SomeVal,
    CASE WHEN OtherVal IS NULL THEN (SELECT OtherVal 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1) 
    ELSE OtherVal END OtherVal,
    minDt
FROM CTE t1 
LEFT JOIN T t2 ON t1.minDt = t2.Date
ORDER BY t1.minDT;

| ID  | Name     | SomeVal        | OtherVal       | minDt      |
| --- | -------- | -------------- | -------------- | ---------- |
| 10  | Somebody | 33001.9296875  | 33001.9296875  | 2018-10-01 |
| 10  | Somebody | 33481.9296875  | 33481.9296875  | 2018-10-02 |
| 10  | Somebody | 33001.9296875  | 33001.9296875  | 2018-10-03 |
| 10  | Somebody | 33582.76171875 | 33582.76171875 | 2018-10-04 |
| 10  | Somebody | 33582.73046875 | 33582.7890625  | 2018-10-05 |
| 10  | Somebody | 33582.73046875 | 33582.7890625  | 2018-10-06 |
| 10  | Somebody | 33582.73046875 | 33582.7890625  | 2018-10-07 |
| 10  | Somebody | 33582.76171875 | 33582.76171875 | 2018-10-08 |
| 10  | Somebody | 33582.76171875 | 33582.76171875 | 2018-10-09 |
| 10  | Somebody | 33462.76171875 | 33462.76171875 | 2018-10-10 |
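
Once every day has a value, the average daily balance the question asks for is a single aggregate over the filled series. The following is a minimal sketch for MySQL 8.0, assuming the same table `T` as above and that `SomeVal` is the balance to average; the names `cal`, `filled`, `total_days` and `avg_daily_balance` are made up for illustration. It uses `<=` in the subquery, so days that already have a row and days that are missing are handled by the same expression.

```sql
-- Sketch only: average daily balance over the gap-filled series (MySQL 8.0+).
WITH RECURSIVE cal AS (
  -- Anchor: first calendar day, carrying the last day along as the stop value
  SELECT MIN(`Date`) AS d, MAX(`Date`) AS maxDt
  FROM T
  UNION ALL
  -- Step: add one day until the last date in T is reached
  SELECT DATE_ADD(d, INTERVAL 1 DAY), maxDt
  FROM cal
  WHERE d < maxDt
)
SELECT COUNT(*)                        AS total_days,
       SUM(filled.SomeVal) / COUNT(*)  AS avg_daily_balance
FROM (
  SELECT cal.d,
         -- Balance of the latest row on or before this calendar day
         (SELECT tt.SomeVal
          FROM T AS tt
          WHERE tt.`Date` <= cal.d
          ORDER BY tt.`Date` DESC
          LIMIT 1) AS SomeVal
  FROM cal
) AS filled;
```

For the sample data this covers 10 days. Summing the ten filled balances by hand gives 334445.02, so the average should come out at about 33444.50. The result will not be exact to the cent here because the columns are declared as `FLOAT`, which is also why the output above shows values such as 33001.9296875; for monetary amounts a `DECIMAL` column (for example `DECIMAL(12,2)`) is usually the safer choice.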



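If your MySQL version does not support CTEs (as is the case for 5.7), you can create a permanent calendar table and use it in the LEFT JOIN instead.

Schema (MySQL v5.7)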

CREATE TABLE T(
   ID int,
   Name varchar(50),
   SomeVal float,   
   OtherVal float,   
   `Date` date
);




insert into T values (10,'Somebody',33001.93,33001.93,'2018-10-01');
insert into T values (10,'Somebody',33481.93,33481.93,'2018-10-02');
insert into T values (10,'Somebody',33001.93,33001.93,'2018-10-03');
insert into T values (10,'Somebody',33582.76,33582.76,'2018-10-04');
insert into T values (10,'Somebody',33582.73,33582.79,'2018-10-05');
insert into T values (10,'Somebody',33582.76,33582.76,'2018-10-08');
insert into T values (10,'Somebody',33462.76,33462.76,'2018-10-10');


CREATE Table calendar(
   minDt Date
);

INSERT INTO calendar values ('2018-10-01');
INSERT INTO calendar values ('2018-10-02');
INSERT INTO calendar values ('2018-10-03');
INSERT INTO calendar values ('2018-10-04');
INSERT INTO calendar values ('2018-10-05');
INSERT INTO calendar values ('2018-10-06');
INSERT INTO calendar values ('2018-10-07');
INSERT INTO calendar values ('2018-10-08');
INSERT INTO calendar values ('2018-10-09');
INSERT INTO calendar values ('2018-10-10');
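
Writing one INSERT per date does not scale beyond a small example. As a rough sketch, the calendar can instead be generated from the date range found in `T` by cross joining a small table of digits, which works on MySQL 5.7 without CTEs. The helper table `digits` and the aliases `r`, `ones`, `tens` and `hundreds` are assumptions introduced here, not part of the original answer. Run this instead of the individual INSERT statements above, not in addition to them, since `calendar` has no key to reject duplicates.

```sql
-- Helper table holding the digits 0..9 (hypothetical name)
CREATE TABLE digits (i INT);
INSERT INTO digits VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- Three copies of digits give the offsets 0..999 (about 2.7 years of days).
-- Each offset is added to the first date in T; offsets past the last date are dropped.
INSERT INTO calendar (minDt)
SELECT DATE_ADD(r.startDt,
                INTERVAL (ones.i + 10 * tens.i + 100 * hundreds.i) DAY)
FROM (SELECT MIN(`Date`) AS startDt, MAX(`Date`) AS endDt FROM T) AS r
CROSS JOIN digits AS ones
CROSS JOIN digits AS tens
CROSS JOIN digits AS hundreds
WHERE ones.i + 10 * tens.i + 100 * hundreds.i <= DATEDIFF(r.endDt, r.startDt);
```

For the sample data, `DATEDIFF` returns 9, so offsets 0 through 9 are kept and the table receives the same ten dates as the hand-written inserts. For a longer range, add another `CROSS JOIN digits` with a factor of 1000.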

Query #1

SELECT  
    CASE WHEN ID IS NULL THEN (SELECT ID 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1)  
    ELSE ID END ID,
    CASE WHEN Name IS NULL THEN (SELECT Name 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1) 
    ELSE Name END Name,
    CASE WHEN SomeVal IS NULL THEN (SELECT SomeVal 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1) 
    ELSE SomeVal END SomeVal,
    CASE WHEN OtherVal IS NULL THEN (SELECT OtherVal 
                            FROM T tt 
                            WHERE tt.Date < t1.minDt
                            ORDER BY tt.Date DESC
                            LIMIT 1) 
    ELSE OtherVal END OtherVal,
    minDt
FROM calendar t1 
LEFT JOIN T t2 ON t1.minDt = t2.Date
ORDER BY t1.minDT;

| ID  | Name     | SomeVal        | OtherVal       | minDt      |
| --- | -------- | -------------- | -------------- | ---------- |
| 10  | Somebody | 33001.9296875  | 33001.9296875  | 2018-10-01 |
| 10  | Somebody | 33481.9296875  | 33481.9296875  | 2018-10-02 |
| 10  | Somebody | 33001.9296875  | 33001.9296875  | 2018-10-03 |
| 10  | Somebody | 33582.76171875 | 33582.76171875 | 2018-10-04 |
| 10  | Somebody | 33582.73046875 | 33582.7890625  | 2018-10-05 |
| 10  | Somebody | 33582.73046875 | 33582.7890625  | 2018-10-06 |
| 10  | Somebody | 33582.73046875 | 33582.7890625  | 2018-10-07 |
| 10  | Somebody | 33582.76171875 | 33582.76171875 | 2018-10-08 |
| 10  | Somebody | 33582.76171875 | 33582.76171875 | 2018-10-09 |
| 10  | Somebody | 33462.76171875 | 33462.76171875 | 2018-10-10 |
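
The query above looks up the previous row by date only, which is fine while `T` holds a single ID. If the table held balances for several IDs, the lookup would also have to match on ID. The following is a sketch under that assumption, and under the further assumption that there is at most one row per ID and date; the aliases `c`, `a`, `t` and `tt` are illustrative. It joins each calendar day to the latest row on or before that day, so the four CASE expressions are not needed.

```sql
-- Sketch only: forward-fill per ID using the calendar table (works on MySQL 5.7).
SELECT a.ID,
       t.Name,
       t.SomeVal,
       t.OtherVal,
       c.minDt
FROM calendar AS c
-- One output row per calendar day and per ID present in T
CROSS JOIN (SELECT DISTINCT ID FROM T) AS a
-- Pick that ID's latest row dated on or before the calendar day
JOIN T AS t
  ON t.ID = a.ID
 AND t.`Date` = (SELECT MAX(tt.`Date`)
                 FROM T AS tt
                 WHERE tt.ID = a.ID
                   AND tt.`Date` <= c.minDt)
ORDER BY a.ID, c.minDt;
```

Because this is an inner join, calendar days that fall before an ID's first recorded row produce no output for that ID rather than a row of NULLs. On the single-ID sample data it returns the same ten rows as the query above.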
