JS UDF with input data partitioned by id in BigQuery Standard SQL

Time: 2018-04-04 22:16:25

Tags: javascript google-bigquery user-defined-functions

I am trying to pass two sets of cash flows (partitioned by the field "id") to the JS UDF IRRCalc and compute an IRR figure for each set of cash flows.

  CREATE TEMPORARY FUNCTION IRRCalc(cash_flow ARRAY<FLOAT64>, date_delta ARRAY<INT64>)
    RETURNS FLOAT64
    LANGUAGE js AS """
      min = 0.0;
      max = 100.0;
      iter_cnt = 0;
      do {
        guess = (min + max) / 2;
        NPV = 0.0;
        for (var j=0; j<cash_flow.length; j++){
          NPV += cash_flow[j]/Math.pow((1+guess),date_delta[j]/365);
        }
        if (cash_flow[0] > 0){
          if (NPV > 0){
            max = guess;
          }
          else {
            min = guess;
          }
        }
        if (cash_flow[0] < 0){
          if (NPV > 0){
            min = guess;
          }
          else {
            max = guess;
          }
        }
        iter_cnt = iter_cnt+1;
      } while (Math.abs(NPV) > 0.00000001 && iter_cnt<8192);
      return guess;

    """;
WITH Input AS
 (
  select
    id,
    scenario_date,
    cash_flow_date,
    date_diff(cash_flow_date, min(cash_flow_date) over (partition by id),day) as date_delta,
    sum(cash_flow) as cash_flow
  from cash_flow_table
  where id in ('1','2')
  group by 1,2,3
  order by 1,2,3
 )

 select 
    id, 
    IRRCalc(array(select cash_flow from input), array(select date_delta from input)) as IRR
 from input
 group by 1

Input data:

Row  id  scenario_date  cash_flow_date  date_delta  cash_flow
1    1   2018-04-02     2016-07-01      0           5979008.899131917
2    1   2018-04-02     2016-08-03      33          -2609437.0145417987
3    1   2018-04-02     2016-08-29      59          -21682.04267909576
4    1   2018-04-02     2016-09-16      77          -4968554.060201097
5    1   2018-04-02     2018-04-02      640         0.0
6    2   2018-04-02     2017-09-08      0           -320912.83786916407
7    2   2018-04-02     2017-09-27      19          3015.2821677139805
8    2   2018-04-02     2018-03-28      201         3204.6920948425554
9    2   2018-04-02     2018-04-02      206         440424.3826431843

Ideally, I would expect the output table to look like this:

Row id  IRR  
1   1   3.2
2   2   0.8 

However, the output table I actually get is:

Row id  IRR  
1   1   3.8
2   2   3.8 

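I think the problem is that when I call IRRCalc, everything is put into a single array instead of being partitioned by id. If you run the query below, you will see what I mean: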

 select 
    array(select cash_flow from input), 
    array(select date_delta from input)
 from input

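That is, select the two arrays directly in place of IRRCalc(array(select cash_flow from input), array(select date_delta from input)). Could someone take a look and let me know how to apply the partition by id logic to the two arrays cash_flow and date_delta before passing them to the function IRRCalc?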

1 answer:

Answer 0 (score: 2)

Below is the outermost SELECT statement you are looking for:

SELECT 
  id, 
  IRRCalc(ARRAY_AGG(cash_flow), ARRAY_AGG(date_delta)) AS IRR
FROM input
GROUP BY id 

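It groups by id and builds the corresponding arrays that are passed to the UDF, so each result is specific to one id. Assuming the logic in WITH input AS is correct, you should get the expected result.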
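
To see the difference outside BigQuery, here is a minimal sketch in plain JavaScript (runnable in Node.js). It reuses the bisection logic from the question's UDF body and the nine input rows from the question. The helper names irrCalc, ungrouped and grouped are hypothetical and only imitate what array(select ... from input) and ARRAY_AGG(...) with GROUP BY id hand to the UDF. It is not how BigQuery evaluates the query internally.

```javascript
// Same bisection logic as the IRRCalc UDF body in the question, written as a
// plain function so it can run outside BigQuery.
function irrCalc(cashFlow, dateDelta) {
  let min = 0.0;
  let max = 100.0;
  let guess = 0.0;
  let npv = 0.0;
  let iterCnt = 0;
  do {
    guess = (min + max) / 2;
    npv = 0.0;
    for (let j = 0; j < cashFlow.length; j++) {
      npv += cashFlow[j] / Math.pow(1 + guess, dateDelta[j] / 365);
    }
    if (cashFlow[0] > 0) {
      if (npv > 0) { max = guess; } else { min = guess; }
    }
    if (cashFlow[0] < 0) {
      if (npv > 0) { min = guess; } else { max = guess; }
    }
    iterCnt += 1;
  } while (Math.abs(npv) > 0.00000001 && iterCnt < 8192);
  return guess;
}

// Rows copied from the question's input data: [id, date_delta, cash_flow].
const rows = [
  ['1', 0, 5979008.899131917],
  ['1', 33, -2609437.0145417987],
  ['1', 59, -21682.04267909576],
  ['1', 77, -4968554.060201097],
  ['1', 640, 0.0],
  ['2', 0, -320912.83786916407],
  ['2', 19, 3015.2821677139805],
  ['2', 201, 3204.6920948425554],
  ['2', 206, 440424.3826431843],
];

// Distinct ids in first-seen order.
const ids = [...new Set(rows.map((r) => r[0]))];

// Imitates array(select ... from input): the subquery ignores the outer
// GROUP BY, so every id is evaluated against all nine rows.
function ungrouped() {
  const cf = rows.map((r) => r[2]);
  const dd = rows.map((r) => r[1]);
  return ids.map((id) => ({ id, n: cf.length, irr: irrCalc(cf, dd) }));
}

// Imitates ARRAY_AGG(...) with GROUP BY id: each id only sees its own rows.
function grouped() {
  return ids.map((id) => {
    const own = rows.filter((r) => r[0] === id);
    const cf = own.map((r) => r[2]);
    const dd = own.map((r) => r[1]);
    return { id, n: own.length, irr: irrCalc(cf, dd) };
  });
}

const allRows = ungrouped();
const perId = grouped();
console.log(allRows); // both ids share one nine-element input, hence one IRR
console.log(perId);   // five rows for id 1, four rows for id 2
```

In the ungrouped case both ids receive the same nine-element arrays, so they necessarily get the same IRR, which matches the symptom in the question. One thing to keep in mind: the sketch keeps rows in input order, and the UDF treats cash_flow[0] as the first cash flow. In BigQuery, ARRAY_AGG accepts an ORDER BY clause, so something like ARRAY_AGG(cash_flow ORDER BY cash_flow_date) and ARRAY_AGG(date_delta ORDER BY cash_flow_date) may be worth considering if the element order matters.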