I have a UDF written in Java that propagates the last non-null value through rows ordered by row_number, but only when the current value is 9. The last value is tracked separately for each component.
For example:
Row number | Component | Value
---------------------------------
1          | 1         | 3
2          | 1         | 4
3          | 1         | NULL
4          | 1         | NULL
5          | 2         | 3
6          | 2         | 9
7          | 1         | 9
8          | 1         | 5
9          | 2         | 6
10         | 1         | 9
Should result in:
Row number | Component | Value
---------------------------------
1          | 1         | 3
2          | 1         | 4
3          | 1         | NULL
4          | 1         | NULL
5          | 2         | 3
6          | 2         | 3
7          | 1         | 4
8          | 1         | 5
9          | 2         | 6
10         | 1         | 5
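To make the expected behaviour concrete, here is a minimal standalone sketch of the rule in plain Java, run in a single thread over rows that are already sorted by row number. It is not the UDF itself: the class name PropagationSketch and the method propagate are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: applies the propagation rule to rows that
// are already in row-number order, with no Impala or Hive involved.
final class PropagationSketch {

    // Each row is {component, value}; value may be null.
    static List<String> propagate(List<String[]> rowsInOrder) {
        Map<String, String> lastSeen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String[] row : rowsInOrder) {
            String component = row[0];
            String value = row[1];
            if (value == null) {
                // NULL rows pass through and do not update the remembered value
                out.add(null);
            } else if (value.equals("9")) {
                // 9 is replaced by the last value remembered for this component
                out.add(lastSeen.get(component));
            } else {
                lastSeen.put(component, value);
                out.add(value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "3"}, new String[] {"1", "4"},
            new String[] {"1", null}, new String[] {"1", null},
            new String[] {"2", "3"}, new String[] {"2", "9"},
            new String[] {"1", "9"}, new String[] {"1", "5"},
            new String[] {"2", "6"}, new String[] {"1", "9"});
        // prints [3, 4, null, null, 3, 3, 4, 5, 6, 5]
        System.out.println(propagate(rows));
    }
}
```

When the rows arrive strictly in order, this logic produces exactly the table above, so the rule itself seems fine as long as the ordering holds.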
To remember the last non-null value, I keep a map as a member variable of the UDF, keyed by component, which supplies the last recorded value:
HashMap<String, String> hmapS = new HashMap<String, String>();
First I order the rows, then I apply the UDF:
select my_udf(component, value) as propagated_value
from (
    select row_number, component, value
    order by row_number
    limit 99999999 -- Needed so that Impala orders the rows
) a
The problem is that 'hmapS' does not see the rows in that order.
For the example above, I sometimes get:
Row number | Component | Value
---------------------------------
1          | 1         | 3
2          | 1         | 4
3          | 1         | NULL
4          | 1         | NULL
5          | 2         | 3
6          | 2         | 6
7          | 1         | 3
8          | 1         | 5
9          | 2         | 6
10         | 1         | 3
It looks like a race condition, as if the Java UDF does not really respect the 'order by row_number'.
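To illustrate why I suspect the ordering, here is a small plain-Java sketch that replays the same rule over the example rows in an arbitrary arrival order. The class ArrivalOrderSketch, the method replay, and the particular arrival order are all hypothetical and chosen by hand; I do not know the order in which Impala actually hands rows to the UDF.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: replays the UDF's rule over the example
// rows in a given arrival order and reports the results by row number.
final class ArrivalOrderSketch {

    // Rows are {component, value}, stored at index rowNumber - 1.
    static final List<String[]> ROWS = Arrays.asList(
        new String[] {"1", "3"}, new String[] {"1", "4"},
        new String[] {"1", null}, new String[] {"1", null},
        new String[] {"2", "3"}, new String[] {"2", "9"},
        new String[] {"1", "9"}, new String[] {"1", "5"},
        new String[] {"2", "6"}, new String[] {"1", "9"});

    // arrivalOrder lists 1-based row numbers in the order they are processed.
    static String[] replay(int[] arrivalOrder) {
        Map<String, String> lastSeen = new HashMap<>();
        String[] out = new String[ROWS.size()];
        for (int rowNumber : arrivalOrder) {
            String component = ROWS.get(rowNumber - 1)[0];
            String value = ROWS.get(rowNumber - 1)[1];
            if (value == null) {
                out[rowNumber - 1] = null;
            } else if (value.equals("9")) {
                out[rowNumber - 1] = lastSeen.get(component);
            } else {
                lastSeen.put(component, value);
                out[rowNumber - 1] = value;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // In row-number order: [3, 4, null, null, 3, 3, 4, 5, 6, 5]
        System.out.println(Arrays.toString(replay(new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})));
        // Hand-picked out-of-order arrival: [3, 4, null, null, 3, 6, 3, 5, 6, 3]
        System.out.println(Arrays.toString(replay(new int[] {2, 8, 1, 7, 10, 5, 9, 6, 3, 4})));
    }
}
```

The second arrival order reproduces the wrong table exactly, which is at least consistent with the rows reaching the UDF in something other than row_number order.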
How can I make it respect the order?
Here is the UDF code, in case it helps:
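import java.util.HashMap;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

@UDFType(deterministic = true, stateful = false)
public class PropVarUT extends UDF
{
    // Last non-null, non-9 value seen so far, keyed by component
    HashMap<String, String> hmapS = new HashMap<String, String>();

    // Only propagate when value is 9
    public String evaluate(String component, String value)
    {
        String output = null;
        if (value != null)
        {
            if (value.equals("9"))
            {
                // Replace 9 with the last value recorded for this component
                output = hmapS.get(component);
            }
            else
            {
                // Remember this value and return it unchanged
                hmapS.put(component, value);
                output = value;
            }
        }
        return output;
    }
}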