How can Flink create a table using the schema inferred from Avro input data?

Time: 2019-01-28 08:50:34

Tags: apache-flink flink-sql

I have loaded an Avro file into a Flink DataSet:

AvroInputFormat<GenericRecord> test = new AvroInputFormat<GenericRecord>(
        new Path("PathToAvroFile")
        , GenericRecord.class);
DataSet<GenericRecord> DS = env.createInput(test);

DS.print();

This is the result of printing DS:

{"N_NATIONKEY": 14, "N_NAME": "KENYA", "N_REGIONKEY": 0, "N_COMMENT": " pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t"}
{"N_NATIONKEY": 15, "N_NAME": "MOROCCO", "N_REGIONKEY": 0, "N_COMMENT": "rns. blithely bold courts among the closely regular packages use furiously bold platelets?"}
{"N_NATIONKEY": 16, "N_NAME": "MOZAMBIQUE", "N_REGIONKEY": 0, "N_COMMENT": "s. ironic, unusual asymptotes wake blithely r"}
{"N_NATIONKEY": 17, "N_NAME": "PERU", "N_REGIONKEY": 1, "N_COMMENT": "platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun"}
{"N_NATIONKEY": 18, "N_NAME": "CHINA", "N_REGIONKEY": 2, "N_COMMENT": "c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos"}
{"N_NATIONKEY": 19, "N_NAME": "ROMANIA", "N_REGIONKEY": 3, "N_COMMENT": "ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account"}
{"N_NATIONKEY": 20, "N_NAME": "SAUDI ARABIA", "N_REGIONKEY": 4, "N_COMMENT": "ts. silent requests haggle. closely express packages sleep across the blithely"}

Now I want to create a table from the DS dataset with exactly the same schema as the Avro file; that is, the columns should be N_NATIONKEY, N_NAME, N_REGIONKEY, and N_COMMENT.

I know that with this line:

tableEnv.registerDataSet("tbTest", DS, "field1, field2, ...");

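I can create a table and set the columns myself, but I would like the columns to be inferred automatically from the data. Is that possible? I also tried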

tableEnv.registerDataSet("tbTest", DS);

but that creates a table with the following schema:

root
 |-- f0: GenericType<org.apache.avro.generic.GenericRecord>

1 answer:

Answer 0 (score: 1)

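GenericRecord is a black box to the Table & SQL API runtime, because the number of fields and their data types are not defined. I recommend using Avro-generated classes that extend SpecificRecord. These specific types are also recognized by Flink's type system, so you can address the individual fields with their proper data types.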

Alternatively, you can implement custom UDFs that extract fields with the proper data type, such as getAvroInt(f0, "myField"), getAvroString(f0, "myField"), and so on.

Some pseudocode:

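import org.apache.avro.generic.GenericRecord;
import org.apache.flink.table.functions.ScalarFunction;

// Scalar UDF that returns one field of a GenericRecord as a String.
public class AvroStringFieldExtract extends ScalarFunction {
    public String eval(GenericRecord r, String fieldName) {
        Object value = r.get(fieldName);
        // Avro strings are typically Utf8 instances, so convert explicitly; pass nulls through.
        return value == null ? null : value.toString();
    }
}

tableEnv.registerFunction("getAvroFieldString", new AvroStringFieldExtract());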
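
The typed extraction idea behind getAvroInt and getAvroString can be tried outside Flink. What follows is a minimal plain-Java sketch, not code from the answer: the class name AvroFieldExtractors and the methods getString and getInt are hypothetical, and a Map<String, Object> stands in for GenericRecord so that the example runs without Flink or Avro on the classpath. In a real UDF, the same bodies would sit inside eval methods that take a GenericRecord.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of typed field extraction (hypothetical helper class).
// Assumption: Map<String, Object> stands in for Avro's GenericRecord.
public class AvroFieldExtractors {

    // Returns the field as a String, or null if the field is absent or null.
    // toString() covers any CharSequence, which is how Avro usually hands back strings.
    public static String getString(Map<String, Object> record, String fieldName) {
        Object value = record.get(fieldName);
        return value == null ? null : value.toString();
    }

    // Returns the field as an Integer, or null if the field is absent or null.
    // Throws IllegalArgumentException if the stored value is not numeric.
    public static Integer getInt(Map<String, Object> record, String fieldName) {
        Object value = record.get(fieldName);
        if (value == null) {
            return null;
        }
        if (value instanceof Number) {
            return ((Number) value).intValue();
        }
        throw new IllegalArgumentException(
                "Field " + fieldName + " is not numeric: " + value.getClass().getName());
    }

    public static void main(String[] args) {
        // One row modeled on the printed output in the question.
        Map<String, Object> row = new HashMap<>();
        row.put("N_NATIONKEY", 14);
        row.put("N_NAME", new StringBuilder("KENYA")); // any CharSequence works
        row.put("N_REGIONKEY", 0);

        System.out.println(getInt(row, "N_NATIONKEY"));
        System.out.println(getString(row, "N_NAME"));
        System.out.println(getString(row, "N_COMMENT")); // field not set in this row
    }
}
```

Inside Flink, a function registered as getAvroFieldString would presumably be called on the f0 column, along the lines of SELECT getAvroFieldString(f0, 'N_NAME') FROM tbTest. That query is an assumption built from the table and function names in this thread and has not been run here.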