Apache Flink: LEFT JOIN with a TableFunction does not return the expected result

Time: 2017-09-27 06:39:13

Tags: left-join apache-flink flink-sql

Flink version: 1.3.1

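I created two tables, one from an in-memory collection and one from a UDTF. When I tested an inner join and a left join between them, both returned the same result. I expected the left join to return more rows than the inner join.

My test code is: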

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.functions.TableFunction;

public class ExerciseUDF {
    public static void main(String[] args) throws Exception {
        test_3();
    }

    public static void test_3() throws Exception {
        // 1. set up execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        DataSet<WC> input = env.fromElements(
                new WC("Hello", 1),
                new WC("Ciao", 1),
                new WC("Hello", 1));

        // 2. register the DataSet as table "WordCount"
        tEnv.registerDataSet("WordCount", input, "word, frequency");

        // register the table function once; both queries below use it
        tEnv.registerFunction("myTableUpperFunc", new MyTableFunc_2());

        Table table;
        DataSet<WCUpper> resultUpper;

        // 3. table left join user defined table
        System.out.println("table left join user defined table");
        table = tEnv.sql(
                "SELECT S.word as word, S.frequency as frequency, T.myupper as myupper "
                + "FROM WordCount as S "
                + "left join LATERAL TABLE(myTableUpperFunc(S.word)) as T(word,myupper) "
                + "on S.word = T.word");
        resultUpper = tEnv.toDataSet(table, WCUpper.class);
        // observed output: only "WCUpper Ciao 1 CIAO"; the rows for Hello are missing
        resultUpper.print();

        // 4. table join user defined table
        System.out.println("table join user defined table");
        table = tEnv.sql(
                "SELECT S.word as word, S.frequency as frequency, T.myupper as myupper "
                + "FROM WordCount as S "
                + "join LATERAL TABLE(myTableUpperFunc(S.word)) as T(word,myupper) "
                + "on S.word = T.word");
        resultUpper = tEnv.toDataSet(table, WCUpper.class);
        resultUpper.print();
    }

    public static class WC {
        public String word;
        public long frequency;

        // public no-argument constructor to make it a Flink POJO
        public WC() {
        }

        public WC(String word, long frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        @Override
        public String toString() {
            return "WC " + word + " " + frequency;
        }
    }

    // Result type of both queries. This class was not shown in the original post;
    // it is reconstructed here from the printed output "WCUpper Ciao 1 CIAO".
    public static class WCUpper {
        public String word;
        public long frequency;
        public String myupper;

        public WCUpper() {
        }

        @Override
        public String toString() {
            return "WCUpper " + word + " " + frequency + " " + myupper;
        }
    }

    // user defined table function
    public static class MyTableFunc_2 extends TableFunction<Tuple2<String, String>> {
        // emits (str, STR), e.g. Ciao -> (Ciao, CIAO); emits no row at all for "Hello"
        public void eval(String str) {
            System.out.println("upper func executed for " + str);
            if (str.equals("Hello")) {
                return;
            }
            collect(new Tuple2<String, String>(str, str.toUpperCase()));
        }
    }
}

The left join query and the inner join query produce the same output. In both cases only one row is returned:

WCUpper Ciao 1 CIAO

However, I think the left join query should also keep the 'Hello' rows.
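To make the expectation concrete, here is a minimal plain-Java sketch of what an inner and a left lateral join are expected to return for this input. It does not use Flink at all; the class `LateralJoinSketch` and its methods are hypothetical names introduced only for illustration, `upper` stands in for `MyTableFunc_2`, and the `frequency` column is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java emulation of lateral join semantics (assumption: illustration only, no Flink involved).
final class LateralJoinSketch {

    // Stand-in for MyTableFunc_2: emits no row for "Hello", one (word, UPPER) row otherwise.
    static List<String[]> upper(String word) {
        if (word.equals("Hello")) {
            return Collections.emptyList();
        }
        return Collections.singletonList(new String[] {word, word.toUpperCase()});
    }

    // Joins every word with the rows the function emits for it, keeping pairs where S.word = T.word.
    // With keepUnmatched = true (left join) a word without any match is kept and padded with null.
    static List<String> lateralJoin(List<String> words, boolean keepUnmatched) {
        List<String> out = new ArrayList<>();
        for (String word : words) {
            boolean matched = false;
            for (String[] t : upper(word)) {
                if (word.equals(t[0])) {
                    out.add(word + " " + t[1]);
                    matched = true;
                }
            }
            if (!matched && keepUnmatched) {
                out.add(word + " null");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Hello", "Ciao", "Hello");
        System.out.println("inner: " + lateralJoin(words, false));
        System.out.println("left:  " + lateralJoin(words, true));
    }
}
```

Under these assumptions the inner join yields only the Ciao row, while the left join additionally yields the two Hello rows with a null `myupper`. That is the difference the question expects to see and does not.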

1 Answer:

Answer 0: (score: 0)

Yes, you are right.

This is a bug in the translation of TableFunction outer joins with predicates, and it needs to be fixed.
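The answer does not say where exactly the translation goes wrong. As a hedged illustration of why predicate handling matters for outer joins, the plain-Java sketch below contrasts two evaluation orders. It is an assumption for illustration, not Flink's actual code path, and `PredicatePlacementSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Shows how the placement of the ON predicate changes a left lateral join (illustration only).
final class PredicatePlacementSketch {

    // Stand-in for MyTableFunc_2: emits no row for "Hello", one (word, UPPER) row otherwise.
    static List<String[]> upper(String word) {
        if (word.equals("Hello")) {
            return Collections.emptyList();
        }
        return Collections.singletonList(new String[] {word, word.toUpperCase()});
    }

    // Predicate evaluated as part of the join: null-padding happens only after matching.
    static List<String> predicateInJoin(List<String> words) {
        List<String> out = new ArrayList<>();
        for (String word : words) {
            boolean matched = false;
            for (String[] t : upper(word)) {
                if (word.equals(t[0])) {
                    out.add(word + " " + t[1]);
                    matched = true;
                }
            }
            if (!matched) {
                out.add(word + " null");
            }
        }
        return out;
    }

    // Predicate applied as a filter after null-padding: S.word = NULL is never TRUE in SQL,
    // so the padded rows are dropped and the result looks like an inner join.
    static List<String> predicateAsPostFilter(List<String> words) {
        List<String> out = new ArrayList<>();
        for (String word : words) {
            List<String[]> rows = new ArrayList<>(upper(word));
            if (rows.isEmpty()) {
                rows.add(new String[] {null, null}); // left-join padding
            }
            for (String[] t : rows) {
                if (t[0] != null && word.equals(t[0])) {
                    out.add(word + " " + t[1]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Hello", "Ciao", "Hello");
        System.out.println("predicate in join:   " + predicateInJoin(words));
        System.out.println("predicate as filter: " + predicateAsPostFilter(words));
    }
}
```

With the predicate applied as a post-filter, only the Ciao row survives, which matches the output reported in the question. Flink's SQL documentation writes lateral left joins with `ON TRUE`; whether rewriting the query that way sidesteps the problem on 1.3.1 is not confirmed in this thread.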

Thanks, Fabian