正确使用PERCENTILE_CONT-Oracle SQL

时间:2019-07-01 12:41:03

标签: sql oracle analytics

我正在尝试计算以下内容:

  • 数据集的平均值
  • 数据集的中位数
  • 数据集的前20%
  • 数据集底部20%

我的数据集如下:

| Part | Step | Step_Start | Part_Finish |   TheTime   |
|:----:|:----:|:----------:|:-----------:|:-----------:|
|   1  |  200 |  15-Aug-18 |  19-Jun-19  | 307.4926273 |
|   2  |  200 |  7-Jun-19  |  19-Jun-19  |  11.4434375 |
|   3  |  200 |  17-Sep-18 |   4-Feb-19  | 139.4360417 |
|   4  |  200 |  30-Jan-19 |   4-Feb-19  | 4.356666667 |
|   5  |  200 |  1-Oct-18  |  18-Feb-19  | 139.4528009 |
|   6  |  200 |  13-Feb-19 |  18-Feb-19  |   4.50375   |
|   7  |  200 |  17-Oct-18 |  28-Mar-19  | 161.7007176 |
|   8  |  200 |  12-Nov-18 |  28-Mar-19  |  135.630625 |
|   9  |  200 |  25-Oct-18 |  26-Feb-19  | 123.6026968 |
|  10  |  200 |  22-Feb-19 |  26-Feb-19  | 3.628090278 |
|  11  |  200 |  30-Oct-18 |   3-Jan-19  | 64.51466435 |
|  12  |  200 |  12-Dec-18 |   3-Jan-19  | 21.48703704 |
|  13  |  200 |  15-Nov-18 |  14-Jan-19  | 59.41373843 |
|  14  |  200 |  7-Jan-19  |  14-Jan-19  | 6.621828704 |
|  15  |  200 |  15-Nov-18 |  12-Jan-19  | 57.62283565 |
|  16  |  200 |  8-Jan-19  |  12-Jan-19  | 3.264398148 |
|  17  |  200 |  15-Nov-18 |   7-Mar-19  | 111.5082523 |
|  18  |  200 |  4-Mar-19  |   7-Mar-19  | 2.153587963 |
|  19  |  200 |  16-Nov-18 |  23-May-19  | 187.6931481 |
|  20  |  200 |  16-Nov-18 |   3-Jan-19  | 47.47916667 |
|  21  |  200 |  17-Dec-18 |   3-Jan-19  | 16.62722222 |
|  22  |  200 |  20-Nov-18 |  14-Feb-19  |  85.6115625 |
|  23  |  200 |  9-Feb-19  |  14-Feb-19  | 4.520787037 |
|  24  |  200 |  19-Nov-18 |  14-Jan-19  | 55.53342593 |
|  25  |  200 |  9-Jan-19  |  14-Jan-19  | 4.721400463 |
|  26  |  200 |  26-Nov-18 |   9-Jan-19  | 43.50748843 |
|  27  |  200 |  4-Jan-19  |   9-Jan-19  | 4.417164352 |
|  28  |  200 |  26-Nov-18 |  21-Jan-19  | 55.59988426 |
|  29  |  200 |  13-Jan-19 |  21-Jan-19  |    7.535    |
|  30  |  200 |  16-Jan-19 |  21-Jan-19  | 4.618796296 |
|  31  |  200 |  26-Nov-18 |  11-Jan-19  | 45.42148148 |
|  32  |  200 |  4-Jan-19  |  11-Jan-19  | 6.316921296 |
|  33  |  200 |  4-Dec-18  |  24-Jan-19  |  50.3669213 |
|  34  |  200 |  18-Jan-19 |  24-Jan-19  | 5.589467593 |
|  35  |  200 |  4-Dec-18  |  31-Jan-19  | 57.26877315 |
|  36  |  200 |  22-Jan-19 |  31-Jan-19  | 8.240034722 |
|  37  |  200 |  5-Dec-18  |  28-Jun-19  | 204.5283912 |
|  38  |  200 |  26-Jun-19 |  28-Jun-19  | 1.508252315 |
|  39  |  200 |  9-Feb-19  |  19-Feb-19  | 9.532893519 |
|  40  |  200 |  7-Dec-18  |  14-Feb-19  | 68.51900463 |
|  41  |  200 |  5-Feb-19  |  14-Feb-19  | 8.641076389 |
|  42  |  200 |  11-Dec-18 |  25-Jan-19  | 44.50501157 |
|  43  |  200 |  22-Jan-19 |  25-Jan-19  | 2.511435185 |
|  44  |  200 |  13-Dec-18 |  17-Jan-19  | 34.43806713 |
|  45  |  200 |  14-Jan-19 |  17-Jan-19  | 2.210972222 |
|  46  |  200 |  13-Dec-18 |  24-Jan-19  | 41.38921296 |
|  47  |  200 |  17-Jan-19 |  24-Jan-19  | 6.444664352 |
|  48  |  200 |  10-Jan-19 |   7-Feb-19  | 27.43130787 |
|  49  |  200 |  1-Feb-19  |   7-Feb-19  | 5.349189815 |
|  50  |  200 |  18-Dec-18 |   4-Feb-19  | 47.50416667 |
|  51  |  200 |  29-Jan-19 |   4-Feb-19  | 5.481979167 |
|  52  |  200 |  3-Jan-19  |  30-Jan-19  | 26.46112269 |
|  53  |  200 |  23-Jan-19 |  30-Jan-19  | 6.712175926 |
|  54  |  200 |  4-Jan-19  |   5-Feb-19  | 31.49590278 |
|  55  |  200 |  30-Jan-19 |   5-Feb-19  | 5.385798611 |
|  56  |  200 |  23-Jan-19 |  20-Mar-19  |  55.296875  |
|  57  |  200 |  21-Feb-19 |  20-Mar-19  | 26.06854167 |
|  58  |  200 |  22-Jan-19 |  14-Mar-19  | 50.57989583 |
|  59  |  200 |  8-Mar-19  |  14-Mar-19  | 5.147303241 |
|  60  |  200 |  22-Jan-19 |  21-Feb-19  | 29.46405093 |
|  61  |  200 |  14-Feb-19 |  21-Feb-19  | 6.701724537 |
|  62  |  200 |  24-Jan-19 |  23-Apr-19  | 88.50689815 |
|  63  |  200 |  17-Apr-19 |  23-Apr-19  | 5.725405093 |
|  64  |  200 |  28-Jan-19 |  21-Feb-19  | 23.50082176 |
|  65  |  200 |  13-Feb-19 |  21-Feb-19  | 7.115717593 |
|  66  |  200 |  31-Jan-19 |  28-Feb-19  | 27.55881944 |
|  67  |  200 |  25-Feb-19 |  28-Feb-19  | 2.633738426 |
|  68  |  200 |  31-Jan-19 |  27-Feb-19  | 26.46105324 |
|  69  |  200 |  23-Feb-19 |  27-Feb-19  | 3.531423611 |
|  70  |  200 |  1-Feb-19  |  28-Feb-19  | 26.45835648 |
|  71  |  200 |  27-Feb-19 |  28-Feb-19  | 0.471296296 |
|  72  |  200 |  6-Feb-19  |  27-Feb-19  | 20.54436343 |
|  73  |  200 |  23-Feb-19 |  27-Feb-19  | 3.598854167 |
|  74  |  200 |  6-Feb-19  |   5-Mar-19  | 26.54347222 |
|  75  |  200 |  28-Feb-19 |   5-Mar-19  | 4.303773148 |
|  76  |  200 |  12-Feb-19 |   6-Mar-19  | 21.56993056 |
|  77  |  200 |  1-Mar-19  |   6-Mar-19  | 4.597615741 |
|  78  |  200 |  12-Feb-19 |  14-Mar-19  | 29.50417824 |
|  79  |  200 |  7-Mar-19  |  14-Mar-19  | 6.083541667 |
|  80  |  200 |  28-Feb-19 |  28-Mar-19  |  27.5291088 |
|  81  |  200 |  25-Mar-19 |  28-Mar-19  | 2.637824074 |
|  82  |  200 |  29-Jan-19 |  28-Feb-19  | 29.34280093 |
|  83  |  200 |  21-Feb-19 |  28-Feb-19  | 6.233831019 |
|  84  |  200 |  19-Feb-19 |  30-Apr-19  | 69.51832176 |
|  85  |  200 |  7-Feb-19  |   5-Mar-19  | 25.74865741 |
|  86  |  200 |  27-Feb-19 |   5-Mar-19  | 5.380034722 |
|  87  |  200 |  21-Feb-19 |  21-Mar-19  | 27.56310185 |
|  88  |  200 |  19-Mar-19 |  21-Mar-19  | 1.161828704 |
|  89  |  200 |  26-Feb-19 |  28-Mar-19  | 29.41315972 |
|  90  |  200 |  22-Mar-19 |  28-Mar-19  | 5.673703704 |
|  91  |  200 |  26-Feb-19 |  28-Mar-19  |  29.5131713 |
|  92  |  200 |  20-Mar-19 |  28-Mar-19  | 7.073414352 |
|  93  |  200 |  28-Feb-19 |  15-Apr-19  | 45.63513889 |
|  94  |  200 |  5-Apr-19  |  15-Apr-19  | 9.479456019 |
|  95  |  200 |  1-Mar-19  |  29-Mar-19  | 27.54568287 |
|  96  |  200 |  25-Mar-19 |  29-Mar-19  | 3.044340278 |
|  97  |  200 |  4-Mar-19  |  27-Mar-19  | 22.52392361 |
|  98  |  200 |  21-Mar-19 |  27-Mar-19  | 5.074421296 |
|  99  |  200 |  14-Feb-19 |  19-Mar-19  | 32.54349537 |
|  100 |  200 |  13-Mar-19 |  19-Mar-19  | 5.265266204 |

我当前的SQL查询如下:

SELECT
    Step,
    ROUND(MEDIAN(Part_Finish - Step_Start), 2) AS "The_Median",
    ROUND(AVG(Part_Finish - Step_Start), 2) AS "The_Average",
    PERCENTILE_CONT(0.20) WITHIN GROUP (ORDER BY (Part_Finish - Step_Start) ASC) AS "Best_Time",
    PERCENTILE_CONT(0.80) WITHIN GROUP (ORDER BY (Part_Finish - Step_Start) ASC) AS "Worst_Time"

FROM
    myTbl

GROUP BY
    Step

但是,我不确定我的结果是否正确,因为我认为我没有正确使用PERCENTILE_CONT()。如何根据最佳的20%的数据和最差的20%的数据,使用PERCENTILE_CONT()(或另一种方法)找到平均值或中位数(以较容易为准)的“完成时间”? / p>

我希望一些结果看起来像这样:

| Step | The_Average | The_Median | Best_Time | Worst_Time |
|:----:|:-----------:|:----------:|:---------:|:----------:|
|  200 |  < value >  |  < value > | < value > |  < value > |

其中< value >字段是正确计算的平均值,中位数以及数据集的最佳和最差值。最佳和最差是通过找到数据前20%(即最小的时间)或数据中最差的20%(即最大的时间)的平均值或中位数来计算的。

1 个答案:

答案 0 :(得分:1)

PERCENTILE_CONT是一个窗口函数,因此,如果您只想要一个由具有标量值的单个记录组成的结果集,则可以尝试选择不同的内容:

SELECT DISTINCT
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Part_Finish - Step_Start) AS "The_median",
    ROUND(AVG(Part_Finish - Step_Start) OVER (ORDER BY Part_Finish - Step_Start), 2) AS "The_Average",
    PERCENTILE_CONT(0.20) WITHIN GROUP (ORDER BY Part_Finish - Step_Start) AS "Best_Time",
    PERCENTILE_CONT(0.80) WITHIN GROUP (ORDER BY Part_Finish - Step_Start) AS "Worst_Time"
FROM myTbl;

采用上述方法的原因是,在整个表上选择窗口功能PERCENTILE_CONT只会返回整个表作为结果集。但是,当您使用它时,每个记录的值将始终相同。因此,我们可以采用不同的值来获得单个结果。

如果您期望每个Step的值都有不同的报告,那么您应该在对PARTITION BY的调用中使用PERCENTILE_CONT,例如

PERCENTILE_CONT(0.5) WITHIN GROUP (PARTITION BY Step
    ORDER BY Part_Finish - Step_Start) AS "The_median"