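I am working with a table that holds data for working days only. The data is essentially the end-of-day balance for each day. It looks like this: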
ID Name SomeVal OtherVal Date
10 Somebody 33001.93 33001.93 2018-10-01
10 Somebody 33481.93 33481.93 2018-10-02
10 Somebody 33001.93 33001.93 2018-10-03
10 Somebody 33582.76 33582.76 2018-10-04
10 Somebody 33582.73 33582.79 2018-10-05
------- Missing row for 2018-10-06 ---------------
------- Missing row for 2018-10-07 ---------------
10 Somebody 33582.76 33582.76 2018-10-08
------- Missing row for 2018-10-09 ---------------
10 Somebody 33462.76 33462.76 2018-10-10
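My task is to compute the average daily balance (sum of the end-of-day balances / total number of days). To do that, I need a row for every day, so each missing day has to be filled in with the last available row before it.
What I need is: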
ID Name SomeVal OtherVal Date
10 Somebody 33001.93 33001.93 2018-10-01
10 Somebody 33481.93 33481.93 2018-10-02
10 Somebody 33001.93 33001.93 2018-10-03
10 Somebody 33582.76 33582.76 2018-10-04
10 Somebody 33582.73 33582.79 2018-10-05
10 Somebody 33582.73 33582.79 2018-10-06
10 Somebody 33582.73 33582.79 2018-10-07
10 Somebody 33582.76 33582.76 2018-10-08
10 Somebody 33582.76 33582.76 2018-10-09
10 Somebody 33462.76 33462.76 2018-10-10
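Essentially, row 5 is copied into the missing rows 6 and 7, and row 8 is copied into row 9.
I solved this by creating a calendar table and then using the following query: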
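The fill-forward rule and the average it feeds can be sketched outside the database. This is a minimal illustration in Python, not the query used in this thread; the function name `forward_fill` and the dictionary layout are assumptions made for the example, and the values are the SomeVal column for ID 10 from the first table.

```python
from datetime import date, timedelta

# SomeVal for ID 10, keyed by date. Rows for 10-06, 10-07 and 10-09 are missing.
rows = {
    date(2018, 10, 1): 33001.93,
    date(2018, 10, 2): 33481.93,
    date(2018, 10, 3): 33001.93,
    date(2018, 10, 4): 33582.76,
    date(2018, 10, 5): 33582.73,
    date(2018, 10, 8): 33582.76,
    date(2018, 10, 10): 33462.76,
}

def forward_fill(balances, start, end):
    """Return {day: balance} for every day in [start, end].

    A day without its own row receives the balance of the most recent
    earlier day. Days before the first known row stay None.
    """
    filled = {}
    last = None
    day = start
    while day <= end:
        last = balances.get(day, last)  # keep the previous value on a gap
        filled[day] = last
        day += timedelta(days=1)
    return filled

filled = forward_fill(rows, date(2018, 10, 1), date(2018, 10, 10))

# Average daily balance = sum of end-of-day balances / number of days.
average_daily_balance = sum(filled.values()) / len(filled)
```

Here the first calendar day has a row, so no `None` values remain; if it did not, the sum would need a rule for the leading gap.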
SELECT
    CASE WHEN ID IS NULL THEN (SELECT ID
                               FROM T tt
                               WHERE tt.Date < t1.minDt
                               ORDER BY tt.Date DESC
                               LIMIT 1)
         ELSE ID END ID,
    CASE WHEN Name IS NULL THEN (SELECT Name
                                 FROM T tt
                                 WHERE tt.Date < t1.minDt
                                 ORDER BY tt.Date DESC
                                 LIMIT 1)
         ELSE Name END Name,
    CASE WHEN SomeVal IS NULL THEN (SELECT SomeVal
                                    FROM T tt
                                    WHERE tt.Date < t1.minDt
                                    ORDER BY tt.Date DESC
                                    LIMIT 1)
         ELSE SomeVal END SomeVal,
    CASE WHEN OtherVal IS NULL THEN (SELECT OtherVal
                                     FROM T tt
                                     WHERE tt.Date < t1.minDt
                                     ORDER BY tt.Date DESC
                                     LIMIT 1)
         ELSE OtherVal END OtherVal,
    minDt
FROM calendar t1
LEFT JOIN T t2 ON t1.minDt = t2.Date
ORDER BY t1.minDt;
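This solution works when the ID value is constant. However, my dataset has many thousands of records with hundreds of unique ID values, and any ID may have missing days. The query above only fills the gaps for the first ID at the top of the data, not for the whole dataset. I need to run the same logic for each ID. I guess something like PARTITION BY would do this in MySQL, but I am not sure how to go about it.
The data actually looks like this: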
10,'Somebody',33001.93,33001.93,'2018-10-01'
10,'Somebody',33481.93,33481.93,'2018-10-02'
10,'Somebody',33001.93,33001.93,'2018-10-03'
10,'Somebody',33582.76,33582.76,'2018-10-04'
10,'Somebody',33582.73,33582.79,'2018-10-05'
10,'Somebody',33582.76,33582.76,'2018-10-08'
15,'someone else',33462.76,33462.76,'2018-10-01'
15,'someone else',33582.76,33582.76,'2018-10-04'
15,'someone else',33582.73,33582.79,'2018-10-05'
15,'someone else',33582.76,33582.76,'2018-10-08'
15,'someone else',33462.76,33462.76,'2018-10-10'
You can try the dummy data and the query above here:
The MySQL version I am using is:
mysql Ver 14.14 Distrib 5.7.24, for Linux (x86_64) using EditLine wrapper
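Answer 0 (score: 1)
You can use MySQL variables to fill in the table data. The trick is to JOIN the calendar table to the list of distinct ID values in the table, which gives you a table with one row per ID for every date in the range. That can then be LEFT JOINed to the data table to pick up the values where they exist, and MySQL variables can be used to fill the gaps: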
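-- Relies on MySQL user variables. MySQL does not guarantee the evaluation order
-- of expressions that assign them, so check the output on your own version.
-- Caveat: an id with no row on the first calendar date inherits the previous id's values.
SELECT thedate,
       @name     := COALESCE(Name, @name)         AS Name,     -- keep the last seen value on a gap
       @someval  := COALESCE(SomeVal, @someval)   AS SomeVal,
       @otherval := COALESCE(OtherVal, @otherval) AS OtherVal,
       @id       := id                            AS id
FROM (SELECT c.thedate, i.id, t.Name, t.SomeVal, t.OtherVal
      FROM calendar c
      JOIN (SELECT DISTINCT id FROM t) i          -- no ON clause: every date is paired with every id
      LEFT JOIN t ON t.date = c.thedate AND t.id = i.id) g
CROSS JOIN (SELECT @id := 0, @name := '', @someval := 0, @otherval := 0) v  -- initialise the variables
ORDER BY id, thedate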
Output for the sample data:
thedate Name SomeVal OtherVal id
2018-10-01 Somebody 33001.93 33001.93 10
2018-10-02 Somebody 33481.93 33481.93 10
2018-10-03 Somebody 33001.93 33001.93 10
2018-10-04 Somebody 33582.76 33582.76 10
2018-10-05 Somebody 33582.73 33582.79 10
2018-10-06 Somebody 33582.73 33582.79 10
2018-10-07 Somebody 33582.73 33582.79 10
2018-10-08 Somebody 33582.76 33582.76 10
2018-10-09 Somebody 33582.76 33582.76 10
2018-10-10 Somebody 33582.76 33582.76 10
2018-10-01 someone else 33462.76 33462.76 15
2018-10-02 someone else 33462.76 33462.76 15
2018-10-03 someone else 33462.76 33462.76 15
2018-10-04 someone else 33582.76 33582.76 15
2018-10-05 someone else 33582.73 33582.79 15
2018-10-06 someone else 33582.73 33582.79 15
2018-10-07 someone else 33582.73 33582.79 15
2018-10-08 someone else 33582.76 33582.76 15
2018-10-09 someone else 33582.76 33582.76 15
2018-10-10 someone else 33462.76 33462.76 15
I have created a demo on dbfiddle that shows how the pieces fit together (including my calendar table, which only covers the dates in the table).
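For readers who would rather not depend on user variables, the same calendar-times-distinct-ID grid can be filled with a correlated subquery that is restricted to the same id. The sketch below is a different technique from the query in this answer, and it is run here against SQLite through Python's standard `sqlite3` module purely so that it is self-contained; it has not been run against MySQL 5.7 in this thread. The table and column names follow the answer, only SomeVal is filled to keep it short, and the calendar contents (2018-10-01 to 2018-10-10) are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, Name TEXT, SomeVal REAL, OtherVal REAL, date TEXT);
CREATE TABLE calendar (thedate TEXT);
""")

# Sample rows from the question (dates stored as ISO strings, which sort correctly).
data = [
    (10, 'Somebody', 33001.93, 33001.93, '2018-10-01'),
    (10, 'Somebody', 33481.93, 33481.93, '2018-10-02'),
    (10, 'Somebody', 33001.93, 33001.93, '2018-10-03'),
    (10, 'Somebody', 33582.76, 33582.76, '2018-10-04'),
    (10, 'Somebody', 33582.73, 33582.79, '2018-10-05'),
    (10, 'Somebody', 33582.76, 33582.76, '2018-10-08'),
    (15, 'someone else', 33462.76, 33462.76, '2018-10-01'),
    (15, 'someone else', 33582.76, 33582.76, '2018-10-04'),
    (15, 'someone else', 33582.73, 33582.79, '2018-10-05'),
    (15, 'someone else', 33582.76, 33582.76, '2018-10-08'),
    (15, 'someone else', 33462.76, 33462.76, '2018-10-10'),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", data)
conn.executemany("INSERT INTO calendar VALUES (?)",
                 [(f"2018-10-{d:02d}",) for d in range(1, 11)])

# For every (id, date) pair, take the most recent row of the SAME id
# dated on or before that calendar date.
query = """
SELECT i.id,
       c.thedate,
       (SELECT p.SomeVal
          FROM t p
         WHERE p.id = i.id AND p.date <= c.thedate
         ORDER BY p.date DESC
         LIMIT 1) AS SomeVal
FROM calendar c
CROSS JOIN (SELECT DISTINCT id FROM t) i
ORDER BY i.id, c.thedate
"""
filled = conn.execute(query).fetchall()
```

Because the subquery filters on `p.id = i.id`, values never leak from one id into the next; an id with no row on or before a given date simply gets NULL for that date. The trade-off is one subquery per filled column per row, which may be slow on large tables without an index on (id, date).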
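Answer 1 (score: 0)
I think I have made some progress using the same logic as above. I had to build the calendar lookup table so that it includes the ID data, and I match at both the date and the ID level. The resulting table ends up with a lot of duplicate/empty records, but de-duplicating the data gets me almost exactly what I need.
This is certainly not the most elegant solution, because the temporary dataset I use is very large. There must be a cleaner way, but it works for me for now.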
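If the large temporary table becomes a problem, the per-ID matching and fill can also be done in application code without materialising the full ID-by-date grid. The following Python sketch is only an illustration under assumptions: the function name `average_daily_balance`, the tuple layout `(id, date, balance)`, and the choice to raise an error when an ID has no balance at the start of the range are all made up for the example.

```python
from collections import defaultdict
from datetime import date, timedelta

# (id, date, SomeVal) taken from the sample data in the question.
records = [
    (10, date(2018, 10, 1), 33001.93),
    (10, date(2018, 10, 2), 33481.93),
    (10, date(2018, 10, 3), 33001.93),
    (10, date(2018, 10, 4), 33582.76),
    (10, date(2018, 10, 5), 33582.73),
    (10, date(2018, 10, 8), 33582.76),
    (15, date(2018, 10, 1), 33462.76),
    (15, date(2018, 10, 4), 33582.76),
    (15, date(2018, 10, 5), 33582.73),
    (15, date(2018, 10, 8), 33582.76),
    (15, date(2018, 10, 10), 33462.76),
]

def average_daily_balance(records, start, end):
    """Return {id: average balance over [start, end]}, filling gaps per ID."""
    by_id = defaultdict(dict)
    for id_, day, balance in records:
        # Keying on (id, date) de-duplicates: a repeated pair overwrites the earlier one.
        by_id[id_][day] = balance

    n_days = (end - start).days + 1
    averages = {}
    for id_, balances in by_id.items():
        total = 0.0
        last = None  # reset for every ID so values never carry across IDs
        for offset in range(n_days):
            day = start + timedelta(days=offset)
            last = balances.get(day, last)
            if last is None:
                raise ValueError(f"ID {id_} has no balance on or before {day}")
            total += last
        averages[id_] = total / n_days
    return averages

averages = average_daily_balance(records, date(2018, 10, 1), date(2018, 10, 10))
```

Only one running balance per ID is held while iterating, so memory grows with the number of source rows rather than with IDs multiplied by dates.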