Why does HBase package large files into the jar when running a MapReduce job?

Date: 2018-07-09 13:51:35

Tags: java hadoop hbase

I want to run two jobs one after the other:

public static Job configureJob1(Configuration conf, String [] args) throws IOException {
   String tableName = args[0];
   conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(new Scan()));
   conf.set(TableInputFormat.INPUT_TABLE, tableName);

   Job job = new Job(conf, "job1");
   job.setJarByClass(EPE.class);
   job.setInputFormatClass(TableInputFormat.class);
   job.setMapperClass(Map1.class);
   job.setMapOutputKeyClass(DCSWritable.class);
   job.setMapOutputValueClass(DoubleWritable.class);

   TableMapReduceUtil.initTableReducerJob(
       "tmp",                   // output table
       DoubleSumReducer.class,  // reducer class
       job);
   job.setNumReduceTasks(1);
   return job;
 }

 public static Job configureJob2(Configuration conf, String [] args) throws IOException {
   Scan scan = new Scan();
   scan.setCaching(500);
   Configuration config = HBaseConfiguration.create();
   Job job = new Job(config,"Job2");
   //job.setJarByClass(EPE.class);
   TableMapReduceUtil.initTableMapperJob(
       "tmp",                  // input table
       scan,                   // Scan instance to control CF and attribute selection
       Map2.class,             // mapper class
       DCWritable.class,       // mapper output key
       DoubleWritable.class,   // mapper output value
       job);

   TableMapReduceUtil.initTableReducerJob(
       "res",                  // output table
       DoubleSumReducer.class, // reducer class
       job);
   job.setNumReduceTasks(1);
   return job;
 }

 public int run(String[] args) throws Exception {
   Configuration conf = HBaseConfiguration.create(getConf());
   if(args.length < 1) {
     System.err.println("****Only " + args.length + " argument supplied, required: 1");
     System.err.println("Usage: IndexBuilder <TABLE_NAME>");
     System.exit(-1);
   }
   Job job1 = configureJob1(conf, args);
   if (job1.waitForCompletion(true)){
     Job job2 = configureJob2(conf, args);
     return (job2.waitForCompletion(true) ? 0 : 1);
   }
   return 1;
 }

It compiles without any problem, but at run time the first job works while the second one never launches, and the target directory keeps growing and never stops. It is at 15 GB now! I have very large CSV files in the parent folder, and it looks like HBase is trying to package them!
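For context: TableMapReduceUtil.initTableMapperJob and initTableReducerJob call addDependencyJars by default, and my understanding of that helper is that when a required class is loaded from a plain directory on the classpath rather than from a jar, it packages that whole directory into a temporary jar, which would sweep in any large files sitting next to the classes. Both helpers have overloads that take an addDependencyJars flag. Below is a minimal sketch of configureJob2 with that packaging disabled, assuming the same EPE driver class, mapper/reducer classes, and table names as above (the null arguments in the reducer overload are the optional partitioner/quorum/server settings):

 public static Job configureJob2(Configuration conf, String [] args) throws IOException {
   Scan scan = new Scan();
   scan.setCaching(500);

   // Reuse the configuration passed in (instead of creating a fresh one)
   // and set the job jar explicitly so only that jar is shipped.
   Job job = new Job(conf, "Job2");
   job.setJarByClass(EPE.class);

   TableMapReduceUtil.initTableMapperJob(
       "tmp",                  // input table
       scan,                   // Scan instance
       Map2.class,             // mapper class
       DCWritable.class,       // mapper output key
       DoubleWritable.class,   // mapper output value
       job,
       false);                 // addDependencyJars = false: skip classpath packaging

   TableMapReduceUtil.initTableReducerJob(
       "res",                  // output table
       DoubleSumReducer.class, // reducer class
       job,
       null, null, null, null, // partitioner, quorumAddress, serverClass, serverImpl
       false);                 // addDependencyJars = false

   job.setNumReduceTasks(1);
   return job;
 }

Note that with addDependencyJars disabled you become responsible for making the HBase jars available on the task classpath yourself, for example via HADOOP_CLASSPATH or by deliberately building a fat jar.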

0 Answers:

No answers yet.