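Chained GROUP BY TRUNC(date)

Time: 2018-01-23 15:26:47

Tags: sql oracle group-by

I have a simple table with hourly dates and values. For each month, I want to compute the average of the daily maximum values. The query looks simple: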

WITH daily_max AS
(
  SELECT TRUNC(the_date, 'DD') as my_day, MAX(value) AS value 
    FROM my_data 
   GROUP by TRUNC(the_date, 'DD')
)
SELECT trunc(my_day, 'MM'), AVG(value) 
FROM daily_max
GROUP BY trunc(my_day, 'MM')
order by 1
;

However, I get many "duplicates" in the first column (one row per day):

01/01/2017 00:00:00 95
01/01/2017 00:00:00 90
01/01/2017 00:00:00 99
01/01/2017 00:00:00 96
01/01/2017 00:00:00 94
01/01/2017 00:00:00 97
01/01/2017 00:00:00 96
01/01/2017 00:00:00 86
01/01/2017 00:00:00 98
01/01/2017 00:00:00 98

01/02/2017 00:00:00 97
01/02/2017 00:00:00 93
01/02/2017 00:00:00 100
01/02/2017 00:00:00 98
01/02/2017 00:00:00 94
01/02/2017 00:00:00 99
01/02/2017 00:00:00 94
01/02/2017 00:00:00 95
01/02/2017 00:00:00 99

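The first subquery returns the daily maxima as expected.

I suspected some odd behaviour of the DATE data type, but I get the same behaviour even when I apply TO_CHAR to the date. How can the expression in a GROUP BY clause produce multiple rows with the same value?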

with daily_max AS
(
  SELECT TRUNC(the_date, 'DD') as my_day, MAX(value) AS value 
    FROM my_data 
   GROUP by TRUNC(the_date, 'DD')
)
SELECT TO_CHAR(trunc(my_day, 'MM')), AVG(value) 
FROM daily_max
GROUP BY TO_CHAR(trunc(my_day, 'MM'))
order by 1
;

To add to my confusion, when I cast the date to a timestamp in the first subquery, the result is what I expect:

with daily_max AS
(
  SELECT CAST(TRUNC(the_date , 'DD') AS timestamp) as my_day, MAX(value) AS value 
    FROM my_data 
   GROUP by TRUNC(the_date , 'DD')
)
SELECT trunc(my_day, 'MM') AS the_month, AVG(value) 
FROM daily_max
GROUP BY trunc(my_day, 'MM')
order by 1
;

01/01/2017 00:00:00 94.9
01/02/2017 00:00:00 95.78571428571428571428571428571428571429
01/03/2017 00:00:00 95.38709677419354838709677419354838709677
01/04/2017 00:00:00 94.9
01/05/2017 00:00:00 95.32258064516129032258064516129032258065
01/06/2017 00:00:00 96.46666666666666666666666666666666666667
01/07/2017 00:00:00 96.16129032258064516129032258064516129032
01/08/2017 00:00:00 96.16129032258064516129032258064516129032
01/09/2017 00:00:00 96.13333333333333333333333333333333333333
01/10/2017 00:00:00 95.87096774193548387096774193548387096774
01/11/2017 00:00:00 97.3
01/12/2017 00:00:00 96.90322580645161290322580645161290322581
01/01/2018 00:00:00 96.43478260869565217391304347826086956522

I am probably missing something silly, but can anyone explain this behaviour to me?

Query to generate the test table:

CREATE TABLE my_data 
AS
SELECT TRUNC (SYSDATE - ROWNUM/24, 'HH') as the_date, ROUND(DBMS_RANDOM.value(0,100),0) AS value
  FROM DUAL 
  CONNECT BY ROWNUM < 366*24
  ;

3 Answers:

Answer 0: (score: 1)

This seems to be bug 20537092; it is reproducible in 12.1.0.2 (with either a CTE or an inline view) but not in 11.2.0.4 or 12.2.0.1.

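The workaround in that document seems to resolve the problem: change the session setting shown below, then run your example again.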

In a 12.1 session that previously returned the duplicated rows, this gave sensible results:

alter session set "_optimizer_aggr_groupby_elim"=false;

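It may be more practical to rewrite the query to avoid the nested group-by, depending on how complex your real situation is and on whether you can modify the relevant session or database initialization settings, or apply a patch.

For your (possibly simplified) example, replacing the inner aggregation/group-by with a DISTINCT and an analytic version seems to work in a fresh session that has not had the workaround applied; it is a bit ugly, though, and may not be practical for your real case: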

TRUNC(MY_DAY,'MM')  AVG(VALUE)
------------------- ----------
2017-01-01 00:00:00       95.5
2017-02-01 00:00:00 95.6428571
2017-03-01 00:00:00 95.3225806
2017-04-01 00:00:00 95.6666667
2017-05-01 00:00:00 97.0322581
2017-06-01 00:00:00       95.7
2017-07-01 00:00:00 95.0967742
2017-08-01 00:00:00 96.1935484
2017-09-01 00:00:00 94.9333333
2017-10-01 00:00:00         96
2017-11-01 00:00:00 96.9333333
2017-12-01 00:00:00 95.3870968
2018-01-01 00:00:00 95.0434783
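
A minimal sketch of such a rewrite, assuming the my_data table and column names from the question. The inner GROUP BY is swapped for DISTINCT plus an analytic MAX. This illustrates the idea and is not necessarily the exact query that produced the output above:

```sql
-- Sketch only: DISTINCT plus an analytic MAX replaces the inner GROUP BY,
-- so the outer GROUP BY is the only aggregation step in the statement.
WITH daily_max AS
(
  SELECT DISTINCT
         TRUNC(the_date, 'DD') AS my_day,
         -- highest value within each calendar day, repeated on every hourly row
         MAX(value) OVER (PARTITION BY TRUNC(the_date, 'DD')) AS value
    FROM my_data
)
SELECT TRUNC(my_day, 'MM'), AVG(value)
  FROM daily_max
 GROUP BY TRUNC(my_day, 'MM')
 ORDER BY 1;
```

DISTINCT collapses the hourly rows to one row per day because every row of a given day carries the same (my_day, value) pair.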

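As always, just because it looks like this bug does not mean it necessarily is; you may want to raise a service request to get confirmation, particularly before patching.

Answer 1: (score: -1)

I can't explain the behaviour you are seeing. You could try writing the logic differently, without the CTE: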
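
Since the problem was reproduced above only on 12.1.0.2, it can help to confirm the exact release before going further. A minimal check, assuming the session has access to the v$version view:

```sql
-- Lists the database release banner; look for a 12.1.0.2 entry.
SELECT banner
  FROM v$version;
```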

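-- Monthly sum of the hourly values divided by the number of distinct days.
-- my_data has no my_day column, so the month is derived from the_date.
SELECT TRUNC(the_date, 'MM'),
       SUM(value) / COUNT(DISTINCT TRUNC(the_date, 'DD'))
FROM my_data
GROUP BY TRUNC(the_date, 'MM')
ORDER BY 1;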

Answer 2: (score: -1)

Perhaps trunc() does not return a date...

WITH daily_max AS
(
  SELECT  to_date(TRUNC(the_date, 'DD')) as my_day, MAX(value)  AS value 
    FROM jfl_test 
    group by  TRUNC(the_date, 'DD')
)
SELECT trunc(my_day, 'MM'), AVG(value) 
FROM daily_max
GROUP BY trunc(my_day, 'MM')
order by 1
;