How do I read from Hive using Apache Beam?

Time: 2017-05-22 23:28:36

Tags: hive apache-beam apache-beam-io

How do I read from Hive using Apache Beam, that is, how do I use Hive as a source in Apache Beam?

2 answers:

Answer 0: (score: 0)

HadoopInputFormatIO can be used to read from Hive, as follows:

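import org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;

// Tell HadoopInputFormatIO which InputFormat and key/value classes to use.
Configuration conf = new Configuration();
conf.setClass("mapreduce.job.inputformat.class", HCatInputFormat.class, InputFormat.class);
conf.setClass("key.class", LongWritable.class, WritableComparable.class);
conf.setClass("value.class", DefaultHCatRecord.class, Writable.class);
conf.set("hive.metastore.uris", "...");
// Database, table, and partition filter to read; must use the same conf object.
HCatInputFormat.setInput(conf, "myDatabase", "myTable", "myFilter");

// p is the Pipeline; the type arguments must match the PCollection's KV types.
PCollection<KV<LongWritable, DefaultHCatRecord>> data =
    p.apply(HadoopInputFormatIO.<LongWritable, DefaultHCatRecord>read().withConfiguration(conf));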

Answer 1: (score: 0)

A pull request merged in July 2017 adds HCatalog support to Beam 2.1.0, which lets it read from Hive: https://issues.apache.org/jira/browse/BEAM-2357
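
For reference, here is a rough sketch of what a read through that HCatalog connector might look like. It assumes Beam 2.1.0 or later with the beam-sdks-java-io-hcatalog module on the classpath. The class name, the metastore address, and the database and table names are placeholders of mine, not values from the question, and I have not run this against a real metastore.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.hcatalog.HCatalogIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;
import org.apache.hive.hcatalog.data.HCatRecord;

// Hypothetical class name, used only for this illustration.
public class ReadHiveWithHCatalogIO {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Placeholder metastore address (assumption); replace with your own.
    Map<String, String> configProperties = new HashMap<>();
    configProperties.put("hive.metastore.uris", "thrift://metastore-host:9083");

    // Each element of the resulting PCollection is one row of the Hive table.
    PCollection<HCatRecord> records =
        p.apply(
            HCatalogIO.read()
                .withConfigProperties(configProperties)
                .withDatabase("myDatabase") // placeholder; optional
                .withTable("myTable"));     // placeholder table name

    p.run().waitUntilFinish();
  }
}
```

A partition filter can be added with withFilter(...) on the same builder if only part of the table is needed. Compared with the HadoopInputFormatIO approach in the other answer, this avoids setting the InputFormat and key/value classes by hand.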