Error with a Hive time-window function

Time: 2016-12-19 06:43:41

Tags: sql hive

I have a table named gmv_active_mem_monthly. Here are all of its rows:

month   gmv_monthly     active_member_monthly
201612  231657626042    2602064
201611  373576915733    3498039
201610  367824193757    3648708
201609  356167649082    3686007
201608  383362147243    3998595
201607  383828659139    3917252
201606  332929299345    3627298
201605  323084120955    3579938
201604  280834688208    3293682
201603  282180201106    3316420
201602  246386923468    3097107
201601  261355415707    3186347
201512  273860930491    3071105
201511  246606316046    2981534
201510  237766306308    2873558
201509  160390583711    2267418
201508  124370765573    2002018
201507  110236706032    1855539
201506  84844225170     1467889
201505  60651906632     1180800
201504  46808796126     917681
201503  12498656329     427529
201502  4918371362      190932
201501  2824293727      129203

I run a simple query in Hive:

select  month,
        sum(gmv_monthly) over
        (
            order by  "month"
            rows      between 12 preceding and 1 preceding
        ) as total_gmv,
        sum(active_member_monthly) over
        (
            order by  "month"
            rows      between 12 preceding and 1 preceding
        ) as total_active_mem

from    novaya.gmv_active_mem_monthly 
;

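But the result looks completely wrong, even though the same code gives correct results on another dataset. The result for the dataset above is: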

month   total_gmv       total_active_mem
201501  NULL            NULL
201502  2824293727      129203
201503  7742665089      320135
201504  20241321418     747664
201505  67050117544     1665345
201506  127702024176    2846145
201507  212546249346    4314034
201508  322782955378    6169573
201509  447153720951    8171591
201510  607544304662    10439009
201511  845310610970    13312567
201512  1091916927016   16294101
201601  1365777857507   19365206
201602  1624308979487   22422350
201603  1865777531593   25328525
201604  2135459076370   28217416
201605  2369484968452   30593417
201606  2631917182775   32992555
201607  2880002256950   35151964
201608  3153594210057   37213677
201609  3412585591727   39210254
201610  3608362657098   40628843
201611  3738420544547   41403993
201612  3865391144234   41920498

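As a check, I would expect the total_gmv for 201602 (1624308979487) minus the total_gmv for 201601 (1365777857507) to equal the 201601 value in gmv_active_mem_monthly, but it does not. So what is wrong with the code? It runs perfectly on another dataset, with no error like this.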

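1 answer:

Answer 0 (score: 1)

There is no problem; the result is correct. The two totals differ by two months, not one: one month at each edge of the range. The frame for 201601 covers 201501 through 201512, while the frame for 201602 covers 201502 through 201601, so moving from one to the other drops 201501 and adds 201601.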

201501  2,824,293,727          <-- This value goes only with 201601
201502  4,918,371,362
201503  12,498,656,329         
201504  46,808,796,126 
201505  60,651,906,632 
201506  84,844,225,170 
201507  110,236,706,032 
201508  124,370,765,573 
201509  160,390,583,711 
201510  237,766,306,308 
201511  246,606,316,046 
201512  273,860,930,491 
201601  261,355,415,707        <-- This value goes only with 201602
201602
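
The arithmetic behind this answer can be checked outside Hive. The following is a minimal Python sketch, not Hive code: it assumes the rows are processed in ascending month order, uses only the first 14 gmv_monthly values from the question, and the names GMV and trailing_sum are hypothetical helpers introduced here for illustration.

```python
# GMV per month, copied from the question's table (first 14 months only).
GMV = [
    (201501, 2824293727),
    (201502, 4918371362),
    (201503, 12498656329),
    (201504, 46808796126),
    (201505, 60651906632),
    (201506, 84844225170),
    (201507, 110236706032),
    (201508, 124370765573),
    (201509, 160390583711),
    (201510, 237766306308),
    (201511, 246606316046),
    (201512, 273860930491),
    (201601, 261355415707),
    (201602, 246386923468),
]


def trailing_sum(month, first=12, last=1):
    """Sum the rows lying `first` to `last` positions before `month`.

    Mirrors ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING over rows ordered
    by month; returns None (SQL NULL) when the frame is empty.
    """
    months = [m for m, _ in GMV]
    i = months.index(month)
    frame = GMV[max(0, i - first):max(0, i - last + 1)]
    return sum(v for _, v in frame) if frame else None


values = dict(GMV)
jan = trailing_sum(201601)  # frame: 201501 .. 201512
feb = trailing_sum(201602)  # frame: 201502 .. 201601

print(jan, feb, feb - jan)
# The difference is the month that entered minus the month that left.
print(feb - jan == values[201601] - values[201501])
```

Under these assumptions the two totals match the Hive output (1365777857507 and 1624308979487), and their difference, 258531121980, equals the 201601 value minus the 201501 value rather than the 201601 value alone.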