I want to run two jobs one after the other:
public static Job configureJob1(Configuration conf, String[] args) throws IOException {
    String tableName = args[0];
    conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(new Scan()));
    conf.set(TableInputFormat.INPUT_TABLE, tableName);
    Job job = new Job(conf, "job1");
    job.setJarByClass(EPE.class);
    job.setInputFormatClass(TableInputFormat.class);
    job.setMapperClass(Map1.class);
    job.setMapOutputKeyClass(DCSWritable.class);
    job.setMapOutputValueClass(DoubleWritable.class);
    TableMapReduceUtil.initTableReducerJob(
        "tmp",                   // output table
        DoubleSumReducer.class,  // reducer class
        job);
    job.setNumReduceTasks(1);
    return job;
}
public static Job configureJob2(Configuration conf, String[] args) throws IOException {
    Scan scan = new Scan();
    scan.setCaching(500);
    Configuration config = HBaseConfiguration.create();
    Job job = new Job(config, "Job2");
    //job.setJarByClass(EPE.class);
    TableMapReduceUtil.initTableMapperJob(
        "tmp",                 // input table
        scan,                  // Scan instance to control CF and attribute selection
        Map2.class,            // mapper class
        DCWritable.class,      // mapper output key
        DoubleWritable.class,  // mapper output value
        job);
    TableMapReduceUtil.initTableReducerJob(
        "res",                   // output table
        DoubleSumReducer.class,  // reducer class
        job);
    job.setNumReduceTasks(1);
    return job;
}
public int run(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(getConf());
    if (args.length < 1) {
        System.err.println("****Only " + args.length + " argument supplied, required: 1");
        System.err.println("Usage: IndexBuilder <TABLE_NAME>");
        System.exit(-1);
    }
    Job job1 = configureJob1(conf, args);
    if (job1.waitForCompletion(true)) {
        Job job2 = configureJob2(conf, args);
        return (job2.waitForCompletion(true) ? 0 : 1);
    }
    return 1;
}
This compiles without any problem, but at runtime the first job runs fine while the second one never starts, and the target directory keeps growing without stopping. It is now at 15 GB! I have a very large CSV in my parent filter, and it looks like HBase is trying to pack it!