MapReduce: read a Hive table and write to an HDFS location using the context

Time: 2018-03-11 12:00:55

Tags: java hadoop mapreduce hcatalog

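I am looking for a MapReduce program that reads from a Hive table and writes the first column value of each record to an HDFS location. It should consist of a map phase only, with no reduce phase.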

Here is the mapper:

public class Map extends Mapper<WritableComparable, HCatRecord, NullWritable, IntWritable> {

    protected void map(WritableComparable key,
                       HCatRecord value,
                       org.apache.hadoop.mapreduce.Mapper<WritableComparable, HCatRecord,
                       NullWritable, IntWritable>.Context context)
            throws IOException, InterruptedException {
        // The group table from /etc/group has the columns: name, 'x', id
        // String groupname = (String) value.get(0);
        // Field indices are 0-based, so index 1 is the second column
        int id = (Integer) value.get(1);
        // Emit only the ID, with no key
        context.write(null, new IntWritable(id));
    }
}

Main class:

public class mapper1 {

    public static void main(String[] args) throws Exception {
        mapper1 m = new mapper1();
        m.run(args);
    }

    public void run(String[] args) throws IOException, Exception, InterruptedException {
        Configuration conf = new Configuration();

        // Get the input table name from the arguments (args[1] is the output path)
        String inputTableName = args[0];

        // Database name is hard-coded
        String dbName = "xademo";

        Job job = new Job(conf, "UseHCat");
        job.setJarByClass(mapper1.class);
        HCatInputFormat.setInput(job, dbName, inputTableName);

        job.setMapperClass(Map.class);

        // An HCatalog record as input
        job.setInputFormatClass(HCatInputFormat.class);

        // Mapper emits a null key and an integer as value
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(IntWritable.class);

        FileOutputFormat.setOutputPath((JobConf) conf, new Path(args[1]));

        job.waitForCompletion(true);
    }
}

Is there anything wrong with this code?
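Independently of any exception, a few things in the driver above look suspect. No call sets the number of reducers to zero, so the job would still run with the default single reducer. `new Job(conf, ...)` copies the configuration, so settings applied to `conf` afterwards are not seen by the job. The `(JobConf) conf` cast is applied to a plain `Configuration` and would fail with a `ClassCastException` if execution got that far. What follows is a minimal sketch of a map-only variant, not a verified fix. It assumes the `org.apache.hive.hcatalog` packages and plain text output, keeps the field index from the original mapper, and uses the hypothetical class names `HiveColumnExport` and `IdMapper`.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;

// Hypothetical class name; not from the original post.
public class HiveColumnExport {

    public static class IdMapper
            extends Mapper<WritableComparable, HCatRecord, NullWritable, IntWritable> {
        @Override
        protected void map(WritableComparable key, HCatRecord value, Context context)
                throws IOException, InterruptedException {
            // Same field index and cast as the original mapper;
            // assumes an int column with no null values.
            int id = (Integer) value.get(1);
            // NullWritable.get() makes the "no key" intent explicit.
            context.write(NullWritable.get(), new IntWritable(id));
        }
    }

    public static void main(String[] args) throws Exception {
        String inputTableName = args[0];
        Path outputDir = new Path(args[1]);

        Job job = Job.getInstance(new Configuration(), "UseHCat");
        job.setJarByClass(HiveColumnExport.class);

        // Database name taken from the original post.
        HCatInputFormat.setInput(job, "xademo", inputTableName);
        job.setInputFormatClass(HCatInputFormat.class);

        job.setMapperClass(IdMapper.class);
        // Zero reducers: mapper output is written directly to the output path.
        job.setNumReduceTasks(0);

        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(IntWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // New-API FileOutputFormat takes the Job, so no JobConf cast is needed.
        FileOutputFormat.setOutputPath(job, outputDir);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This sketch only addresses the driver setup. It does not by itself explain an exception raised inside `HCatInputFormat.setInput()`, which runs before the output path is configured.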

It fails with a NumberFormatException for the string "5s". I am not sure where that value is coming from. The error is reported at this line: HCatInputFormat.setInput()
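A message of this kind is what `Integer.parseInt` (or `Long.parseLong`) produces when it is handed a number with a unit suffix. The standalone sketch below uses only the JDK to reproduce that type of failure; the class and method names are hypothetical. Where the "5s" actually originates is not shown in the post. One hypothesis worth checking is a configuration value written as a duration (for example in hive-site.xml) being read by a client library version that expects a plain number, so comparing the Hive/HCatalog jar versions on the job classpath with the cluster's version may be a reasonable first step.

```java
public class DurationParseDemo {

    // Hypothetical helper: reads a raw setting as a plain integer,
    // the way code that does not understand unit suffixes would.
    static String tryParse(String raw) {
        try {
            return "ok:" + Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            // The exception message includes the offending input string.
            return "error:" + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("5"));   // plain number parses
        System.out.println(tryParse("5s"));  // unit suffix is rejected
    }
}
```

The full stack trace should show which class attempted the parse, which in turn would identify the setting that holds the "5s" value.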

0 answers:

No answers