我想从我的数据集中填充缺少的空值。我有这样的数据集
+---------------------+------+-------------+
| ORDER_DATE | SHOP | SALESPERSON |
+---------------------+------+-------------+
| 14/04/2017 04:44:27 | A | MIKE |
+---------------------+------+-------------+
| 14/04/2017 04:44:55 | A | |
+---------------------+------+-------------+
| 14/04/2017 04:45:07 | A | TIM |
+---------------------+------+-------------+
| 14/04/2017 04:45:30 | A | |
+---------------------+------+-------------+
| 14/04/2017 04:45:43 | B | |
+---------------------+------+-------------+
| 14/04/2017 04:46:13 | B | JOHN |
+---------------------+------+-------------+
| 14/04/2017 04:46:28 | B | |
+---------------------+------+-------------+
| 14/04/2017 04:58:32 | C | |
+---------------------+------+-------------+
| 14/04/2017 04:58:41 | C | MELINDA |
+---------------------+------+-------------+
我希望使用商店内空值之前的第一个找到的值来填充商店分区的销售人员信息。我试过这个,但这不会产生正确的结果(如下)。怎么解决这个问题?
CREATE TABLE SALES (
ORDER_DATE DATE,
SHOP VARCHAR2(30 CHAR),
SALESPERSON VARCHAR2(30 CHAR)
)
;
REM INSERTING INTO SALES
SET DEFINE OFF;
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:44:27','DD/MM/YYYY HH24:MI:SS'),'A','MIKE');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:44:55','DD/MM/YYYY HH24:MI:SS'),'A',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:07','DD/MM/YYYY HH24:MI:SS'),'A','TIM');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:30','DD/MM/YYYY HH24:MI:SS'),'A',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:43','DD/MM/YYYY HH24:MI:SS'),'B',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:46:13','DD/MM/YYYY HH24:MI:SS'),'B','JOHN');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:46:28','DD/MM/YYYY HH24:MI:SS'),'B',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:58:32','DD/MM/YYYY HH24:MI:SS'),'C',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:58:41','DD/MM/YYYY HH24:MI:SS'),'C','MELINDA');
COMMIT;
SELECT * FROM SALES ORDER BY SHOP, ORDER_DATE;
SELECT ORDER_DATE,
SHOP,
SALESPERSON,
/*tried two approaches*/
/*does not produce a correct result set*/
LAST_VALUE(SALESPERSON) IGNORE NULLS OVER (PARTITION BY SHOP
ORDER BY ORDER_DATE RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LAST_VALUE_1,
/*this also does not solve this*/
LAST_VALUE(SALESPERSON) IGNORE NULLS OVER(PARTITION BY SHOP
ORDER BY ORDER_DATE ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS LAST_VALUE_2
FROM SALES ;
正确的结果集是:
+---------------------+------+-------------+--------------------+
| ORDER_DATE | SHOP | SALESPERSON | SALESPERSON_FILLED |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:44:27 | A | MIKE | MIKE |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:44:55 | A | | MIKE |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:07 | A | TIM | TIM |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:30 | A | | TIM |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:43 | B | | |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:46:13 | B | JOHN | JOHN |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:46:28 | B | | JOHN |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:58:32 | C | | |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:58:41 | C | MELINDA | MELINDA |
+---------------------+------+-------------+--------------------+
答案 0 :(得分:3)
你很亲密。
试试这个:
SELECT ORDER_DATE,
SHOP,
SALESPERSON,
LAST_VALUE(SALESPERSON) IGNORE NULLS OVER
(PARTITION BY SHOP ORDER BY ORDER_DATE ) AS LAST_VALUE_1
FROM SALES
order by shop, order_date;
ORDER_DA SHOP SALESPERSON LAST_VALUE_1
-------- ------------------------------ ------------------------------ ------------------------------
17/04/14 A MIKE MIKE
17/04/14 A MIKE
17/04/14 A TIM TIM
17/04/14 A TIM
17/04/14 B
17/04/14 B JOHN JOHN
17/04/14 B JOHN
17/04/14 C
17/04/14 C MELINDA MELINDA
9 rows selected.
答案 1 :(得分:0)
以下查询适用于SQL Server。我没有看到为什么它不适用于Oracle的原因:
SELECT ORDER_DATE, SHOP,
MAX(SALESPERSON) OVER (PARTITION BY SHOP, grp) AS SALESPERSON
FROM (
SELECT ORDER_DATE, SHOP, SALESPERSON,
SUM(CASE WHEN SALESPERSON IS NOT NULL THEN 1 END)
OVER
(PARTITION BY SHOP ORDER BY ORDER_DATE) AS grp
FROM mytable) AS t
ORDER BY ORDER_DATE
这是内部查询产生的内容:
ORDER_DATE SHOP SALESPERSON grp
---------------------------------------------
2017-04-14 04:44:27.000 A MIKE 1
2017-04-14 04:44:55.000 A NULL 1
2017-04-14 04:45:07.000 A TIM 2
2017-04-14 04:45:30.000 A NULL 2
2017-04-14 04:45:43.000 B NULL NULL
2017-04-14 04:46:13.000 B JOHN 1
2017-04-14 04:46:28.000 B NULL 1
2017-04-14 04:58:32.000 C NULL NULL
2017-04-14 04:58:41.000 C MELINDA 1
因此,使用字段grp
和字段SHOP
,我们可以识别应该共享相同SALESPERSON
值的记录“孤岛”。
答案 2 :(得分:0)
在SQL SERVER 2008中, 使用按案例时,按订单:
代码:
SELECT CONVERT(DATETIME,ORDER_DATE) AS ORDER_DATE,
ISNULL(SHOP,'') AS SHOP,
ISNULL(SALESPERSON,'') AS SALESPERSON,
CASE WHEN SALESPERSON IS NULL OR SALESPERSON = '' THEN
ISNULL((SELECT TOP 1 ISNULL(SALESPERSON,'') FROM SALES_stack WHERE ORDER_DATE < S.ORDER_DATE
AND ISNULL(SHOP,'') = ISNULL(S.SHOP,'')
ORDER BY ORDER_DATE DESC),'')
ELSE ISNULL(SALESPERSON,'') END AS SALESPERSON_FILLED FROM SALES_stack S
输出:
ORDER_DATE SHOP SALESPERSON SALESPERSON_FILLED
2017-04-14 04:44:27.000 A MIKE MIKE
2017-04-14 04:44:55.000 A MIKE
2017-04-14 04:45:07.000 A TIM TIM
2017-04-14 04:45:30.000 A TIM
2017-04-14 04:45:43.000 B
2017-04-14 04:46:13.000 B JOHN JOHN
2017-04-14 04:46:28.000 B JOHN
2017-04-14 04:58:32.000 C
2017-04-14 04:58:41.000 C MELINDA MELINDA