Question

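I want to read data from a managed Hive table in my MapReduce job. The managed table was created from another table, which was itself created from an external Hive table. I want to run a MapReduce job on this final managed table. I have read that managed tables use ASCII character 1 ("char 1") as the default field delimiter, so I did this: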

public static final String SEPARATOR_FIELD = new String(new char[] {1});

Later, inside a loop, I did this:

end = rowTextObject.find(SEPARATOR_FIELD, start);

But when I run the MapReduce jar, I get an IllegalArgumentException at the line above and at the line below:

public void map(LongWritable key, Text rowTextObject, Context context) throws IOException, InterruptedException

PS: I found a project on GitHub for reading managed Hive tables from MapReduce jobs, but I could not understand it: https://github.com/facebook/hive-io-experimental

Answer 1

Suppose my input file (say xyz.txt) looks like this:
111\001222
121\001222
131\001222
141\001222
151\001222
161\001222
171\001222
Here \001 is the default Hive field delimiter (character code 1, also written Ctrl-A).
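Since the question builds the delimiter with new String(new char[] {1}) while this answer writes it as \001, a small sketch may help show that these are the same one-character string. The class name HiveDelimiterForms and its members are hypothetical, chosen only for illustration; the code uses plain Java and needs no Hadoop classes.

```java
import java.util.regex.Pattern;

final class HiveDelimiterForms {
    // Three spellings of the same one-character string (character code 1, Ctrl-A).
    static final String FROM_CHAR_ARRAY = new String(new char[] {1});
    static final String UNICODE_ESCAPE = "\u0001";
    static final String OCTAL_ESCAPE = "\001";

    // The regex passed to split("\\001"): the regex engine reads \001 as an octal escape.
    static final Pattern OCTAL_REGEX = Pattern.compile("\\001");

    // True when all spellings denote the same delimiter and the regex matches it.
    static boolean allEquivalent() {
        return FROM_CHAR_ARRAY.equals(UNICODE_ESCAPE)
                && UNICODE_ESCAPE.equals(OCTAL_ESCAPE)
                && OCTAL_REGEX.matcher(UNICODE_ESCAPE).matches();
    }

    public static void main(String[] args) {
        System.out.println(allEquivalent()); // prints "true"
    }
}
```

So the SEPARATOR_FIELD constant from the question and the "\\001" regex used in this answer should both target the same delimiter character.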
To parse this file with MapReduce, once it has been loaded into a Hive table, I would do something like this in my map method:

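import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws java.io.IOException, InterruptedException
    {
        // Split the row on Hive's default field delimiter (Ctrl-A, character code 1).
        String[] vals = value.toString().split("\\001");
        // Emit the first column as the key with a constant "1" as the value.
        context.write(new Text(vals[0]), new Text("1"));
    }
}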
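If you would rather keep the index-walking style from the question (find(SEPARATOR_FIELD, start) in a loop), the sketch below shows the same idea on a plain String so that it runs without Hadoop on the classpath. It assumes the loop should stop as soon as no separator is found and should never advance start past the end of the row; a start index beyond the row length is one possible cause of the exception, though the question does not show enough of the loop to confirm it. The class name CtrlASplitter and the method splitFields are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CtrlASplitter {
    // Hive's default field delimiter: character code 1 (Ctrl-A).
    static final String SEPARATOR_FIELD = "\u0001";

    // Walks the line with indexOf; start never exceeds the line length.
    static List<String> splitFields(String line) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        while (true) {
            int end = line.indexOf(SEPARATOR_FIELD, start);
            if (end < 0) {
                // No more separators: the rest of the line is the last field.
                fields.add(line.substring(start));
                return fields;
            }
            fields.add(line.substring(start, end));
            start = end + SEPARATOR_FIELD.length();
        }
    }

    public static void main(String[] args) {
        System.out.println(splitFields("111\u0001222"));             // [111, 222]
        System.out.println(splitFields("111\u0001\u0001").size());   // 3
        System.out.println("111\u0001\u0001".split("\\001").length); // 1
    }
}
```

The last two lines of main show one difference worth knowing: String.split with the default limit drops trailing empty fields, while the index loop keeps them. Calling split("\\001", -1) keeps them as well.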

Your driver will be the usual one, as follows:

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(MyMapper.class);
FileInputFormat.addInputPath(job, new Path("xyz.txt"));

With the map method above, the final output would be:
111 1
121 1
131 1
141 1
151 1
161 1
171 1
Is this the kind of parsing you are doing in your map method?

Reading a Hive managed table from Java MapReduce code