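How to fill in missing values using analytic functions?

Time: 2017-04-14 05:30:24

Tags: sql oracle oracle11g null oracle12c

I want to fill in the missing (null) values in my data set. The data set looks like this: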

+---------------------+------+-------------+
| ORDER_DATE          | SHOP | SALESPERSON |
+---------------------+------+-------------+
| 14/04/2017 04:44:27 | A    | MIKE        |
+---------------------+------+-------------+
| 14/04/2017 04:44:55 | A    |             |
+---------------------+------+-------------+
| 14/04/2017 04:45:07 | A    | TIM         |
+---------------------+------+-------------+
| 14/04/2017 04:45:30 | A    |             |
+---------------------+------+-------------+
| 14/04/2017 04:45:43 | B    |             |
+---------------------+------+-------------+
| 14/04/2017 04:46:13 | B    | JOHN        |
+---------------------+------+-------------+
| 14/04/2017 04:46:28 | B    |             |
+---------------------+------+-------------+
| 14/04/2017 04:58:32 | C    |             |
+---------------------+------+-------------+
| 14/04/2017 04:58:41 | C    | MELINDA     |
+---------------------+------+-------------+

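Within each shop partition, I want to fill each null SALESPERSON with the closest preceding non-null value in that shop. I tried the query below, but it does not produce the correct result. How can I solve this?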

CREATE TABLE SALES (
ORDER_DATE DATE, 
SHOP VARCHAR2(30 CHAR), 
SALESPERSON VARCHAR2(30 CHAR)
)
;

REM INSERTING INTO SALES
SET DEFINE OFF;
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:44:27','DD/MM/YYYY HH24:MI:SS'),'A','MIKE');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:44:55','DD/MM/YYYY HH24:MI:SS'),'A',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:07','DD/MM/YYYY HH24:MI:SS'),'A','TIM');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:30','DD/MM/YYYY HH24:MI:SS'),'A',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:43','DD/MM/YYYY HH24:MI:SS'),'B',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:46:13','DD/MM/YYYY HH24:MI:SS'),'B','JOHN');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:46:28','DD/MM/YYYY HH24:MI:SS'),'B',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:58:32','DD/MM/YYYY HH24:MI:SS'),'C',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:58:41','DD/MM/YYYY HH24:MI:SS'),'C','MELINDA');
COMMIT;

SELECT * FROM SALES ORDER BY SHOP, ORDER_DATE;

SELECT ORDER_DATE,
       SHOP,
       SALESPERSON,
       /*tried two approaches*/
       /*does not produce a correct result set*/
       LAST_VALUE(SALESPERSON) IGNORE NULLS OVER (PARTITION BY SHOP
                   ORDER BY ORDER_DATE RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LAST_VALUE_1,
       /*this also does not solve this*/            
       LAST_VALUE(SALESPERSON) IGNORE NULLS OVER(PARTITION BY SHOP
                  ORDER BY ORDER_DATE ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS LAST_VALUE_2
FROM SALES ;

The correct result set is:

+---------------------+------+-------------+--------------------+
| ORDER_DATE          | SHOP | SALESPERSON | SALESPERSON_FILLED |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:44:27 | A    | MIKE        |  MIKE              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:44:55 | A    |             |  MIKE              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:07 | A    | TIM         |  TIM               |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:30 | A    |             |  TIM               |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:43 | B    |             |                    |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:46:13 | B    | JOHN        |  JOHN              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:46:28 | B    |             |  JOHN              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:58:32 | C    |             |                    |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:58:41 | C    | MELINDA     |  MELINDA           |
+---------------------+------+-------------+--------------------+

3 answers:

Answer 0: (score: 3)

You're close.
Try this:

SELECT ORDER_DATE,
       SHOP,
       SALESPERSON,

       LAST_VALUE(SALESPERSON) IGNORE NULLS OVER 
            (PARTITION BY SHOP ORDER BY ORDER_DATE ) AS LAST_VALUE_1

FROM SALES
order by shop, order_date;

ORDER_DA SHOP                           SALESPERSON                    LAST_VALUE_1                  
-------- ------------------------------ ------------------------------ ------------------------------
17/04/14 A                              MIKE                           MIKE                          
17/04/14 A                                                             MIKE                          
17/04/14 A                              TIM                            TIM                           
17/04/14 A                                                             TIM                           
17/04/14 B                                                                                           
17/04/14 B                              JOHN                           JOHN                          
17/04/14 B                                                             JOHN                          
17/04/14 C                                                                                           
17/04/14 C                              MELINDA                        MELINDA                       

9 rows selected. 
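
Why this works: when ORDER BY is given without an explicit windowing clause, the window defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE(...) IGNORE NULLS returns the most recent non-null value up to and including the current row. The same carry-forward logic can be sketched outside the database. The Python below is only an illustration under stated assumptions: the function name fill_forward is made up for this example, and the dates are written in ISO format so that they sort correctly as plain strings.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (order_date, shop, salesperson).
# Dates are ISO strings (an assumption of this sketch) so they sort as text.
ROWS = [
    ("2017-04-14 04:44:27", "A", "MIKE"),
    ("2017-04-14 04:44:55", "A", None),
    ("2017-04-14 04:45:07", "A", "TIM"),
    ("2017-04-14 04:45:30", "A", None),
    ("2017-04-14 04:45:43", "B", None),
    ("2017-04-14 04:46:13", "B", "JOHN"),
    ("2017-04-14 04:46:28", "B", None),
    ("2017-04-14 04:58:32", "C", None),
    ("2017-04-14 04:58:41", "C", "MELINDA"),
]


def fill_forward(rows):
    """Carry the last non-null salesperson forward within each shop."""
    # Equivalent of PARTITION BY shop ORDER BY order_date.
    ordered = sorted(rows, key=itemgetter(1, 0))
    filled = []
    for shop, group in groupby(ordered, key=itemgetter(1)):
        last_seen = None  # resets at every partition boundary
        for order_date, _, salesperson in group:
            if salesperson is not None:
                last_seen = salesperson  # the current row counts too
            filled.append((order_date, shop, salesperson, last_seen))
    return filled


if __name__ == "__main__":
    for row in fill_forward(ROWS):
        print(row)
```

Rows that come before the first non-null value in a shop stay null, which matches the expected result for shops B and C.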

Answer 1: (score: 0)

The following query works in SQL Server. I don't see any reason why it wouldn't work in Oracle as well:

SELECT ORDER_DATE, SHOP, 
       MAX(SALESPERSON) OVER (PARTITION BY SHOP, grp) AS SALESPERSON
FROM (
   SELECT ORDER_DATE, SHOP, SALESPERSON,
          SUM(CASE WHEN SALESPERSON IS NOT NULL THEN 1 END) 
          OVER
          (PARTITION BY SHOP ORDER BY ORDER_DATE) AS grp
   FROM mytable) t  -- no AS before the alias: Oracle rejects it, SQL Server accepts both forms
ORDER BY ORDER_DATE

This is what the inner query produces:

ORDER_DATE              SHOP SALESPERSON  grp
---------------------------------------------
2017-04-14 04:44:27.000 A    MIKE         1
2017-04-14 04:44:55.000 A    NULL         1
2017-04-14 04:45:07.000 A    TIM          2
2017-04-14 04:45:30.000 A    NULL         2
2017-04-14 04:45:43.000 B    NULL         NULL
2017-04-14 04:46:13.000 B    JOHN         1
2017-04-14 04:46:28.000 B    NULL         1
2017-04-14 04:58:32.000 C    NULL         NULL
2017-04-14 04:58:41.000 C    MELINDA      1

So, using the grp field together with the SHOP field, we can identify the "islands" of records that should share the same SALESPERSON value.
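
To try the running-count idea without an Oracle or SQL Server instance, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. It assumes a SQLite build with window-function support (3.25 or later). The table name sales, the column alias salesperson_filled, and the helper run_islands_query are names chosen for this example, not part of the original answer.

```python
import sqlite3

# Sample rows from the question, with ISO-formatted dates so that
# text ordering matches chronological ordering in SQLite.
ROWS = [
    ("2017-04-14 04:44:27", "A", "MIKE"),
    ("2017-04-14 04:44:55", "A", None),
    ("2017-04-14 04:45:07", "A", "TIM"),
    ("2017-04-14 04:45:30", "A", None),
    ("2017-04-14 04:45:43", "B", None),
    ("2017-04-14 04:46:13", "B", "JOHN"),
    ("2017-04-14 04:46:28", "B", None),
    ("2017-04-14 04:58:32", "C", None),
    ("2017-04-14 04:58:41", "C", "MELINDA"),
]

# Inner query: running count of non-null salespersons per shop (grp).
# Outer query: every row in the same (shop, grp) island shares the one
# non-null salesperson of that island, picked up with MAX().
QUERY = """
SELECT order_date, shop, salesperson,
       MAX(salesperson) OVER (PARTITION BY shop, grp) AS salesperson_filled
FROM (
    SELECT order_date, shop, salesperson,
           SUM(CASE WHEN salesperson IS NOT NULL THEN 1 END)
               OVER (PARTITION BY shop ORDER BY order_date) AS grp
    FROM sales
) t
ORDER BY shop, order_date
"""


def run_islands_query():
    """Load the sample rows into SQLite and return the filled result set."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (order_date TEXT, shop TEXT, salesperson TEXT)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_islands_query():
        print(row)
```

Rows that precede the first salesperson in a shop get a NULL grp, form their own island, and therefore stay unfilled.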

Answer 2: (score: 0)

In SQL Server 2008, using CASE WHEN and ORDER BY:

Code:

 SELECT CONVERT(DATETIME, ORDER_DATE) AS ORDER_DATE,
        ISNULL(SHOP, '') AS SHOP,
        ISNULL(SALESPERSON, '') AS SALESPERSON,
        CASE WHEN SALESPERSON IS NULL OR SALESPERSON = '' THEN
             -- latest earlier row in the same shop that has a salesperson
             ISNULL((SELECT TOP 1 ISNULL(SALESPERSON, '')
                     FROM SALES_stack
                     WHERE ORDER_DATE < S.ORDER_DATE
                       AND ISNULL(SHOP, '') = ISNULL(S.SHOP, '')
                       -- skip empty rows so consecutive blanks are filled too
                       AND SALESPERSON IS NOT NULL AND SALESPERSON <> ''
                     ORDER BY ORDER_DATE DESC), '')
             ELSE ISNULL(SALESPERSON, '') END AS SALESPERSON_FILLED
 FROM SALES_stack S

Output:

    ORDER_DATE              SHOP   SALESPERSON  SALESPERSON_FILLED
    2017-04-14 04:44:27.000 A      MIKE         MIKE
    2017-04-14 04:44:55.000 A                   MIKE
    2017-04-14 04:45:07.000 A      TIM          TIM
    2017-04-14 04:45:30.000 A                   TIM
    2017-04-14 04:45:43.000 B       
    2017-04-14 04:46:13.000 B      JOHN         JOHN
    2017-04-14 04:46:28.000 B                   JOHN
    2017-04-14 04:58:32.000 C       
    2017-04-14 04:58:41.000 C      MELINDA      MELINDA
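
A correlated lookup of this kind depends on the subquery skipping earlier rows that are themselves blank. If it just takes the immediately preceding row, the second of two consecutive blanks stays unfilled. The sample data has no consecutive blanks, so the output above cannot show the difference. The sketch below reproduces it with SQLite from Python. It is not the SQL Server code itself: LIMIT 1 stands in for TOP 1, COALESCE stands in for the CASE/ISNULL combination, and the row at 04:45:35 is a made-up addition to the sample data.

```python
import sqlite3

# Shop A from the question plus one hypothetical extra blank row,
# so that two blanks follow each other.
ROWS = [
    ("2017-04-14 04:44:27", "A", "MIKE"),
    ("2017-04-14 04:44:55", "A", None),
    ("2017-04-14 04:45:07", "A", "TIM"),
    ("2017-04-14 04:45:30", "A", None),
    ("2017-04-14 04:45:35", "A", None),  # hypothetical row, not in the question
]

# {extra_filter} is either empty (take the previous row as-is) or a
# condition that restricts the lookup to rows that have a salesperson.
TEMPLATE = """
SELECT s.order_date,
       COALESCE(s.salesperson,
                (SELECT p.salesperson
                 FROM sales p
                 WHERE p.shop = s.shop
                   AND p.order_date < s.order_date
                   {extra_filter}
                 ORDER BY p.order_date DESC
                 LIMIT 1)) AS salesperson_filled
FROM sales s
ORDER BY s.shop, s.order_date
"""


def filled(extra_filter):
    """Return the filled salesperson column for the given subquery filter."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (order_date TEXT, shop TEXT, salesperson TEXT)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)
        query = TEMPLATE.format(extra_filter=extra_filter)
        return [row[1] for row in conn.execute(query)]
    finally:
        conn.close()


if __name__ == "__main__":
    print("previous row only:", filled(""))
    print("skipping blanks:  ", filled("AND p.salesperson IS NOT NULL"))
```

On large tables a correlated subquery like this may be slower than the window-function approaches, since the lookup runs once per blank row.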