Question

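I am new to Linux. For my project we are using Hadoop. We have written 3 MapReduce programs such that the output of the first program is the input of the second, and the output of the second is the input of the third. At the moment we run the 3 configurations separately: first the configuration for the first program, then the 2nd, then the 3rd. Now we want to run all 3 programs one after another. Is that possible in Linux using cron? If yes, please give the steps. We want to use a cron job because we need to run the 3 programs repeatedly, every few hours.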

Answer 1

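1. Create a shell script that uses && to run your Hadoop programs in sequence. Put the first command, then &&, then the second command, and so on. With &&, each command runs only if the previous one succeeded.

For example: first command && second command && third command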

2. In a terminal, enter:

crontab -e

This opens the cron job editor in the terminal.

Add this line to run the shell script every 15 minutes:

*/15 * * * * /path/to/your/shell/script

For more help with crontab, see https://help.ubuntu.com/community/CronHowto
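Putting steps 1 and 2 together, the script could look roughly like the sketch below. The jar names, driver class names and the HADOOP_BIN variable are placeholders invented for this example, not part of the original setup; substitute the commands you already use to launch each job.

```shell
#!/bin/sh
# Sketch of a sequential pipeline script. All jar and class names are hypothetical.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"   # use an absolute path if cron cannot find hadoop

# Run the three jobs in order; each one starts only if the previous one succeeded.
run_pipeline() {
    input="$1"
    output="$2"
    "$HADOOP_BIN" jar first.jar FirstDriver "$input" "${output}_1" &&
    "$HADOOP_BIN" jar second.jar SecondDriver "${output}_1" "${output}_2" &&
    "$HADOOP_BIN" jar third.jar ThirdDriver "${output}_2" "${output}_3"
}

# When called as: script <hdfs input> <hdfs output prefix>
if [ "$#" -eq 2 ]; then
    run_pipeline "$1" "$2"
fi
```

Since the question asks for a run every few hours rather than every 15 minutes, a crontab entry such as 0 */3 * * * /path/to/your/shell/script /input /output >> /tmp/pipeline.log 2>&1 would start the script every three hours and keep its output in a log file (the paths are examples only).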

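DELETE / COPY OUTPUT DIRECTORY:

To avoid the "directory already exists" error, delete or copy the output directory before running the sequential Hadoop jobs. Add this to the shell script before the hadoop job commands: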

# Delete the output directory in HDFS
hadoop fs -rmr /your/hdfs/output/directory/to/be/deleted
# Copy the output directory from HDFS to HDFS
hadoop fs -mkdir /new/hdfs/location
hadoop fs -cp /your/hdfs/output/directory/to/be/copied/*.* /new/hdfs/location
# Copy from HDFS to local filesystem
sudo mkdir /path/to/local/filesystem
hadoop fs -copyToLocal /your/hdfs/output/directory/to/be/copied/*.* /path/to/local/filesystem

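Note: if you are using a recent Hadoop version, replace hadoop fs with hdfs dfs, and -rmr with -rm -r. Do not forget to add "*.*" when copying a directory, since this copies the contents of the directory rather than the directory itself. Be aware that "*.*" only matches names that contain a dot, so MapReduce output files such as part-r-00000 need "*" instead. Change the HDFS file paths to match your configuration.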
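On the very first run the output directory does not exist yet, and a failing delete would stop a script that chains its commands with &&. One way around this is to delete only when the directory is present. A minimal sketch, assuming a recent Hadoop where hdfs dfs is available; HDFS_BIN and clean_output_dir are names made up for this example:

```shell
#!/bin/sh
HDFS_BIN="${HDFS_BIN:-hdfs}"   # use an absolute path if cron cannot find hdfs

# Remove an HDFS directory only if it exists, so a missing
# directory is not treated as an error.
clean_output_dir() {
    dir="$1"
    if "$HDFS_BIN" dfs -test -d "$dir"; then
        "$HDFS_BIN" dfs -rm -r "$dir"
    fi
}
```

It would be called once per output directory before the jobs start, for example clean_output_dir /your/hdfs/output/directory/to/be/deleted.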

Answer 2

The best way to handle this case is to chain the MapReduce jobs in a single driver.

http://gandhigeet.blogspot.in/2012/12/as-discussed-in-previous-post-hadoop.html

I am posting the driver code that calls the three MapReduce jobs.

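import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// ExerciseMapper1..3, ExerciseReducer1..3, ExerciseCombiner and ParagrapghInputFormat
// are the project's own classes and are not shown here.
 public class ExerciseDriver {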


static Configuration conf;

public static void main(String[] args) throws Exception {

    ExerciseDriver ED = new ExerciseDriver();
    conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

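    if (args.length < 2) {
        System.err.println("Too few arguments. Arguments should be:  <hdfs input folder> <hdfs output folder> ");
        System.exit(1);
    }

    String pathin1 = args[0];
    String pathout1 = args[1];

    // Remove output left by a previous run; a job fails if its output path already exists
    fs.delete(new Path(pathout1 + "_1"), true);
    fs.delete(new Path(pathout1 + "_2"), true);
    fs.delete(new Path(pathout1 + "_3"), true);

    // Run the three jobs in sequence and stop at the first failure
    if (ED.runFirstJob(pathin1, pathout1 + "_1") != 0) System.exit(1);

    if (ED.runSecondJob(pathout1 + "_1", pathout1 + "_2") != 0) System.exit(1);

    if (ED.runThirdJob(pathout1 + "_2", pathout1 + "_3") != 0) System.exit(1);

}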

  public int runFirstJob(String pathin, String pathout) throws Exception {

    Job job = new Job(conf);
    job.setJarByClass(ExerciseDriver.class);
    job.setMapperClass(ExerciseMapper1.class);
    job.setCombinerClass(ExerciseCombiner.class);
    job.setReducerClass(ExerciseReducer1.class);
    job.setInputFormatClass(ParagrapghInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class); 
    FileInputFormat.addInputPath(job, new Path(pathin));
    FileOutputFormat.setOutputPath(job, new Path(pathout));

    // waitForCompletion submits the job itself and blocks until it finishes
    boolean success = job.waitForCompletion(true);
    return success ? 0 : -1;

}

  public int runSecondJob(String pathin, String pathout) throws Exception { 
    Job job = new Job(conf);
    job.setJarByClass(ExerciseDriver.class);
    job.setMapperClass(ExerciseMapper2.class);
    job.setReducerClass(ExerciseReducer2.class);
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);    
    FileInputFormat.addInputPath(job,new Path(pathin));
    FileOutputFormat.setOutputPath(job, new Path(pathout));
    boolean success = job.waitForCompletion(true);
    return success ? 0 : -1;
}

   public int runThirdJob(String pathin, String pathout) throws Exception { 
    Job job = new Job(conf);
    job.setJarByClass(ExerciseDriver.class);
    job.setMapperClass(ExerciseMapper3.class);
    job.setReducerClass(ExerciseReducer3.class);
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);    
    FileInputFormat.addInputPath(job,new Path(pathin));
    FileOutputFormat.setOutputPath(job, new Path(pathout));
    boolean success = job.waitForCompletion(true);
    return success ? 0 : -1;
}

  }

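After that, schedule the jar file in crontab. Alternatively, you can use Oozie. Because the driver class calls the 3 MapReduce jobs in order, they execute one after another, with the output of each job used as the input of the next.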
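When the jar is scheduled from crontab, a run that takes longer than the cron interval could overlap with the next one and both would write to the same output directories. The sketch below guards against that with a lock directory. run_locked and the lock path are names invented for this example, and a lock left behind by a killed run has to be removed by hand.

```shell
#!/bin/sh
# Run a command only if no earlier run still holds the lock directory.
# mkdir is atomic, so only one caller can create the lock.
run_locked() {
    lock_dir="$1"
    shift
    if mkdir "$lock_dir" 2>/dev/null; then
        status=0
        "$@" || status=$?
        rmdir "$lock_dir"
        return "$status"
    fi
    echo "previous run still active, skipping" >&2
    return 0
}
```

A cron-invoked wrapper script could then call, for instance, run_locked /tmp/exercise.lock hadoop jar /path/to/your/driver.jar ExerciseDriver /input /output, where the jar path and HDFS paths are placeholders.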

Cron job for running Hadoop programs in Linux
