How do global variables of UDFs written in Java behave in Cloudera Impala?

Date: 2019-04-08 12:47:18

Tags: java hadoop user-defined-functions impala

I have a UDF written in Java that propagates the last non-null value through rows ordered by row_number, but only when the current value is 9. The last value is tracked separately for each component.

For example:

Row number | Component | Value 
---------------------------------
    1           1          3
    2           1          4
    3           1          NULL
    4           1          NULL
    5           2          3
    6           2          9
    7           1          9
    8           1          5
    9           2          6
    10          1          9

Should result in:

Row number | Component | Value 
---------------------------------
    1           1          3
    2           1          4
    3           1          NULL
    4           1          NULL
    5           2          3
    6           2          3
    7           1          4
    8           1          5
    9           2          6
    10          1          5
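
To make the intended behaviour concrete, here is a minimal plain-Java sketch of the rule applied to rows that are processed strictly in row_number order. It is not the UDF itself and does not involve Impala; the names PropagationSketch, Row and exampleRows() are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PropagationSketch {

    /** Hypothetical helper type: one input row (value may be null). */
    static final class Row {
        final String component;
        final String value;

        Row(String component, String value) {
            this.component = component;
            this.value = value;
        }
    }

    /** Processes rows strictly in list order, which stands in for row_number order. */
    static List<String> propagate(List<Row> rows) {
        Map<String, String> lastSeen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Row row : rows) {
            if (row.value == null) {
                out.add(null);                          // NULL stays NULL and is not remembered
            } else if (row.value.equals("9")) {
                out.add(lastSeen.get(row.component));   // 9 becomes the last remembered value
            } else {
                lastSeen.put(row.component, row.value); // remember and pass through
                out.add(row.value);
            }
        }
        return out;
    }

    /** The ten example rows from the table above, in row_number order. */
    static List<Row> exampleRows() {
        return Arrays.asList(
            new Row("1", "3"), new Row("1", "4"), new Row("1", null),
            new Row("1", null), new Row("2", "3"), new Row("2", "9"),
            new Row("1", "9"), new Row("1", "5"), new Row("2", "6"),
            new Row("1", "9"));
    }

    public static void main(String[] args) {
        System.out.println(propagate(exampleRows()));
    }
}
```

The result depends entirely on the rows being visited in list order, because the map holds only the most recent value per component.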

To remember the last non-null value, I keep a global (member) variable in the UDF: a map from each component to the last value seen for it:

HashMap<String, String> hmapS = new HashMap<String, String>();

First I order the rows, then I apply the UDF:

select my_udf(component, value) as propagated_value
from (
  select row_number, component, value
  order by row_number
  limit 99999999 -- Need this so that Impala orders rows
) a

The problem is that the values stored in 'hmapS' do not reflect that order.

For the example above, I sometimes get:

Row number | Component | Value 
---------------------------------
    1           1          3
    2           1          4
    3           1          NULL
    4           1          NULL
    5           2          3
    6           2          6
    7           1          3
    8           1          5
    9           2          6
    10          1          3

It looks like a race condition, as if the Java UDF is not actually evaluated in the order given by 'order by row_number'.

How can I make the UDF respect that order?

Here is the UDF code, in case it helps:

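import java.util.HashMap;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

@UDFType(deterministic = true, stateful = false)
public class PropVarUT extends UDF
{
  // Last non-null value other than 9 seen for each component
  HashMap<String, String> hmapS = new HashMap<String, String>();

  // Only propagate when value is 9
  public String evaluate(String component, String value)
  {
    String output = null;

    if (value != null)
    {
      if (value.equals("9"))
      {
        // Replace 9 with the last value stored for this component
        output = hmapS.get(component);
      }
      else
      {
        // Remember this value and pass it through unchanged
        hmapS.put(component, value);
        output = value;
      }
    }
    return output;
  }
}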
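
To show how much the result depends on call order, here is a small sketch that runs the same evaluate logic outside Impala, once in row_number order and once in a shuffled order. The shuffled order is purely hypothetical, chosen because it happens to reproduce the unexpected table above; whether and why Impala would call the UDF in such an order is an assumption, not something established here. OrderDependenceSketch, StatefulFn and run() are illustrative names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class OrderDependenceSketch {

    /** Same logic as evaluate() in the UDF, without the Hive base class. */
    static final class StatefulFn {
        private final Map<String, String> lastSeen = new HashMap<>();

        String evaluate(String component, String value) {
            if (value == null) {
                return null;
            }
            if (value.equals("9")) {
                return lastSeen.get(component);
            }
            lastSeen.put(component, value);
            return value;
        }
    }

    // ROWS[i] = {component, value} for row_number i + 1 (example data from the question)
    static final String[][] ROWS = {
        {"1", "3"}, {"1", "4"}, {"1", null}, {"1", null}, {"2", "3"},
        {"2", "9"}, {"1", "9"}, {"1", "5"}, {"2", "6"}, {"1", "9"}
    };

    /** Calls evaluate() in the given order of row numbers; results are keyed by row number. */
    static Map<Integer, String> run(List<Integer> callOrder) {
        StatefulFn fn = new StatefulFn();
        Map<Integer, String> result = new TreeMap<>();
        for (int rowNumber : callOrder) {
            String[] row = ROWS[rowNumber - 1];
            result.put(rowNumber, fn.evaluate(row[0], row[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        // Calls made in row_number order
        System.out.println(run(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)));
        // Hypothetical out-of-order calls on the same rows
        System.out.println(run(Arrays.asList(2, 8, 1, 3, 4, 5, 9, 6, 7, 10)));
    }
}
```

With the second call order, rows 6, 7 and 10 come out as 6, 3 and 3 instead of 3, 4 and 5, so any state kept in the map is only meaningful if the calls arrive in exactly the intended order.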

0 answers:

No answers