I have a table with the fields (id, letter, date) that contains some data like this:
1 A 2012-01-01
2 B NULL
3 C NULL
4 D 2012-01-15
I want to fill the NULL values with the average date of the nearest non-NULL values, like this:
1 A 2012-01-01
2 B 2012-01-08
3 C 2012-01-08
4 D 2012-01-15
Or maybe even like this:
1 A 2012-01-01
2 B 2012-01-08
3 C 2012-01-11
4 D 2012-01-15
Either variant would be great. Is there a simple way to do this in MySQL?
Thanks in advance.
UPD: The table is very large, about 700,000 records, with about 50,000 gaps like the one described.
UPD2: A little clarification: the table might look like this:
1 A 2012-01-01
2 B NULL
3 C NULL
4 D 2012-01-15
5 E NULL
6 F 2012-01-17
7 G NULL
8 H NULL
9 I 2012-01-20
The expected result is:
1 A 2012-01-01
2 B **2012-01-08**
3 C **2012-01-08**
4 D 2012-01-15
5 E **2012-01-16**
6 F 2012-01-17
7 G **2012-01-18**
8 H **2012-01-18**
9 I 2012-01-20
(The asterisks mark the changed values.) Thanks.
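The first variant and the UPD2 expected output are consistent with taking, for each NULL row, the midpoint between its nearest non-NULL neighbours and dropping any fractional day. Writing $d_p$ and $d_n$ for the previous and next known dates (notation introduced here only for illustration), the rule can be checked against the starred values:

$$
d_i = d_p + \left\lfloor \frac{d_n - d_p}{2} \right\rfloor \ \text{days}
$$

$$
\text{B, C: } 2012\text{-}01\text{-}01 + \lfloor 14/2 \rfloor = 2012\text{-}01\text{-}08, \quad
\text{E: } 2012\text{-}01\text{-}15 + \lfloor 2/2 \rfloor = 2012\text{-}01\text{-}16, \quad
\text{G, H: } 2012\text{-}01\text{-}17 + \lfloor 3/2 \rfloor = 2012\text{-}01\text{-}18
$$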
UPD3: Thanks, everyone. But I will do it a different way, calculating the date with a simple formula: needed_date = ((max(date) - min(date)) / (max(id) - min(id))) * (my_ID - min(id)) + min(date)
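As a worked example of that formula (linear interpolation by id): in the four-row table the only known dates are at id 1 and id 4, so taking min/max over the whole table or over the surrounding gap gives the same result. For the UPD2 table the two readings differ, and the question does not say which one is intended.

$$
\text{needed\_date} = \min(\text{date}) + \frac{\max(\text{date}) - \min(\text{date})}{\max(\text{id}) - \min(\text{id})}\,\bigl(\text{my\_ID} - \min(\text{id})\bigr)
$$

$$
\text{id} = 2:\ 2012\text{-}01\text{-}01 + \tfrac{14}{3} \cdot 1 \approx 2012\text{-}01\text{-}01 + 4.67\ \text{days} = 2012\text{-}01\text{-}05\ 16{:}00
$$

$$
\text{id} = 3:\ 2012\text{-}01\text{-}01 + \tfrac{14}{3} \cdot 2 \approx 2012\text{-}01\text{-}01 + 9.33\ \text{days} = 2012\text{-}01\text{-}10\ 08{:}00
$$

Unlike the midpoint fill, this spaces the filled dates evenly across the gap instead of giving every row in the gap the same date.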
Answer 0 (score: 1)
Assuming you have a table named T like this:
CREATE TABLE T(
id INT,
time DATETIME
);
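The following query gives you the bounds for each NULL record. It assumes that time increases with id, so the latest earlier time and the earliest later time are the nearest non-NULL neighbours: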
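SELECT T.Id
     , MAX(T1.time) AS MinDate   -- latest known time before this row
     , MIN(T2.time) AS MaxDate   -- earliest known time after this row
FROM T
INNER JOIN T T1 ON T1.id < T.id
               AND T.time IS NULL
               AND T1.time IS NOT NULL
INNER JOIN T T2 ON T2.id > T.id
               AND T.time IS NULL
               AND T2.time IS NOT NULL
GROUP BY T.Id;                   -- qualified: a bare Id is ambiguous across T, T1 and T2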
The output is:
Id MinDate MaxDate
2 2012-01-01 2012-01-15
3 2012-01-01 2012-01-15
So the next step is to update each NULL with the average of the bounds from this result set:
UPDATE T
INNER JOIN
(
    -- same bounds query as above, wrapped as a derived table
    SELECT T.Id, MAX(T1.time) AS MinTime, MIN(T2.time) AS MaxTime
    FROM T
    INNER JOIN T T1 ON T1.id < T.id
                   AND T.time IS NULL
                   AND T1.time IS NOT NULL
    INNER JOIN T T2 ON T2.id > T.id
                   AND T.time IS NULL
                   AND T2.time IS NOT NULL
    GROUP BY T.Id
) T3 ON T3.Id = T.id
-- midpoint of the two bounds, computed on UNIX timestamps (seconds)
SET T.time = FROM_UNIXTIME((UNIX_TIMESTAMP(T3.MinTime) + UNIX_TIMESTAMP(T3.MaxTime)) / 2)
WHERE T.time IS NULL;
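The self-join above compares each NULL row with every earlier and every later row, which may get slow with the roughly 700,000 rows and 50,000 gaps mentioned in the question. On MySQL 8.0 or later (window functions are not available in 5.x), one possible alternative computes the same two bounds with window functions instead of joins. This is a sketch, not tested against the asker's data; it uses the same table T and, like the query above, assumes time never decreases as id grows.

```sql
-- Sketch for MySQL 8.0+ (requires window functions); table T(id, time) as above.
-- Assumes time never decreases as id grows, so MAX over earlier rows is the
-- nearest earlier non-NULL time and MIN over later rows is the nearest later one.
UPDATE T
INNER JOIN (
    SELECT id,
           -- latest known time among rows with a smaller id (NULLs are ignored by MAX)
           MAX(time) OVER (ORDER BY id
                           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS MinTime,
           -- earliest known time among rows with a larger id (NULLs are ignored by MIN)
           MIN(time) OVER (ORDER BY id
                           ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS MaxTime
    FROM T
) B ON B.id = T.id
SET T.time = FROM_UNIXTIME((UNIX_TIMESTAMP(B.MinTime) + UNIX_TIMESTAMP(B.MaxTime)) / 2)
WHERE T.time IS NULL
  AND B.MinTime IS NOT NULL   -- leave leading NULLs (no earlier bound) untouched
  AND B.MaxTime IS NOT NULL;  -- leave trailing NULLs (no later bound) untouched
```

The derived table contains window functions, so MySQL has to materialize it before the update runs; that is what allows it to read from T while T is being updated.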
Working SQLFiddle here
Answer 1 (score: 1)
SELECT id,letter,IFNULL(date,dt) date FROM mytable,
(SELECT DATE(mindate + INTERVAL (secdiff/2) SECOND) dt
FROM (SELECT mindate,UNIX_TIMESTAMP(maxdate)
- UNIX_TIMESTAMP(mindate) secdiff
FROM (SELECT MIN(date) mindate FROM mytable) N,
(SELECT MAX(date) maxdate FROM mytable) X) AA) A;
mysql> DROP TABLE IF EXISTS mytable;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE mytable
-> (
-> id int not null auto_increment,
-> letter char(1),
-> `date` date,
-> primary key (id)
-> );
Query OK, 0 rows affected (0.07 sec)
mysql> INSERT INTO mytable (letter,date) VALUES
-> ('A','2012-01-01'),('B',NULL),('C',NULL),('D','2012-01-15');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM mytable;
+----+--------+------------+
| id | letter | date |
+----+--------+------------+
| 1 | A | 2012-01-01 |
| 2 | B | NULL |
| 3 | C | NULL |
| 4 | D | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.00 sec)
mysql>
mysql> SELECT id,letter,IFNULL(date,dt) date FROM mytable,
-> (SELECT DATE(mindate + INTERVAL (secdiff/2) SECOND) dt
-> FROM (SELECT mindate,UNIX_TIMESTAMP(maxdate)
-> - UNIX_TIMESTAMP(mindate) secdiff
-> FROM (SELECT MIN(date) mindate FROM mytable) N,
-> (SELECT MAX(date) maxdate FROM mytable) X) AA) A;
+----+--------+------------+
| id | letter | date |
+----+--------+------------+
| 1 | A | 2012-01-01 |
| 2 | B | 2012-01-08 |
| 3 | C | 2012-01-08 |
| 4 | D | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.00 sec)
mysql>
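This query fills every NULL with a single value: the midpoint between the earliest and the latest date in the whole table. For the four-row table, that is the same as the per-gap midpoint ($2012\text{-}01\text{-}01 + 14/2 = 2012\text{-}01\text{-}08$, as in the output above). For the UPD2 table, every NULL row would get the same date:

$$
2012\text{-}01\text{-}01 + \frac{19\ \text{days}}{2} = 2012\text{-}01\text{-}10\ 12{:}00 \;\xrightarrow{\ \texttt{DATE()}\ }\; 2012\text{-}01\text{-}10
$$

instead of the per-gap values 2012-01-08, 2012-01-16 and 2012-01-18 the question expects.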
This query uses the average of the UNIX timestamps. If all dates are NULL, it uses today's date instead:
SELECT id,letter,IFNULL(date,dt) date FROM mytable,
(
SELECT IF(K=0,DATE(NOW()),avgdt) dt FROM
(SELECT DATE(FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(date))))
avgdt FROM mytable) AA,
(SELECT COUNT(date) K FROM mytable) BB
) A;
mysql> SELECT id,letter,IFNULL(date,dt) date FROM mytable,
-> (
-> SELECT IF(K=0,DATE(NOW()),avgdt) dt FROM
-> (SELECT DATE(FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(date))))
-> avgdt FROM mytable) AA,
-> (SELECT COUNT(date) K FROM mytable) BB
-> ) A;
+----+--------+------------+
| id | letter | date |
+----+--------+------------+
| 1 | A | 2012-01-01 |
| 2 | B | 2012-01-08 |
| 3 | C | 2012-01-08 |
| 4 | D | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.05 sec)
mysql>
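The average is also taken over the whole table, so every NULL row gets the same value. With only two known dates (the four-row table), the average equals the midpoint, which gives 2012-01-08 as shown. With the UPD2 table, measuring in days from 2012-01-01 (and assuming the session time zone has no DST change in that range, because UNIX_TIMESTAMP uses the session time zone), the result would be:

$$
2012\text{-}01\text{-}01 + \frac{0 + 14 + 16 + 19}{4}\ \text{days} = 2012\text{-}01\text{-}01 + 12.25\ \text{days} = 2012\text{-}01\text{-}13\ 06{:}00 \;\xrightarrow{\ \texttt{DATE()}\ }\; 2012\text{-}01\text{-}13
$$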