
Time: 2013-09-23 19:41:21

Tags: hadoop mapreduce


1 answer:

Answer 0 (score: 0)


Please check this link; Java has this built in.


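If you want the UUID to be generated only once for all the records in a split, you can override the setup method of the Mapper class, which is called only once at the beginning of the map task. You can then store the generated UUID in a variable and use it for every record in the map() function.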
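
To make the lifecycle concrete, here is a minimal plain-Java sketch of the same idea with no Hadoop dependency. The names SplitTagger, runTask and SplitTaggerDemo are hypothetical and only stand in for the framework: setup() plays the role of the once-per-task hook, map() the per-record call. It assumes a random UUID from java.util.UUID is acceptable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-in for one map task; this is not a Hadoop class.
class SplitTagger {
    private String uuid;                                  // shared by every record of this task
    private final List<String> output = new ArrayList<>();

    // Plays the role of setup()/configure(): runs once, before any record.
    void setup() {
        uuid = UUID.randomUUID().toString();
    }

    // Plays the role of map(): runs once per record and reuses the stored UUID.
    void map(String record) {
        output.add(uuid + "\t" + record);
    }

    // Drives the calls in the order the framework would for a single split.
    static List<String> runTask(List<String> split) {
        SplitTagger task = new SplitTagger();
        task.setup();
        for (String record : split) {
            task.map(record);
        }
        return task.output;
    }
}

class SplitTaggerDemo {
    public static void main(String[] args) {
        // Every printed line starts with the same UUID.
        for (String line : SplitTagger.runTask(Arrays.asList("a", "b", "c"))) {
            System.out.println(line);
        }
    }
}
```

Each call to runTask models a separate map task, so two tasks get two different UUIDs while all records inside one task share the same value.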

If you are using the mapreduce API, do the following:

public static class SampleMapper extends
            Mapper<LongWritable, Text, Text, Text> {

    // Generated once per map task and shared by every record in the split
    private String uuid;

    /**
     * This method will be called once at the beginning
     * of each map task
     */
    @Override
    protected void setup(Context context) throws IOException,
            InterruptedException {
        //generate your uuid here
        uuid = generateUUID();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        //use uuid here
    }
}


If you are using the mapred API, do the following:

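public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

    // Generated once per map task and shared by every record in the split
    private String uuid;

    // Called once when the task is configured, before any call to map()
    @Override
    public void configure(JobConf job) {
        uuid = generateUUID();
    }

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        //use uuid here
    }
}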
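
The snippets above call a UUID helper that the answer never defines. One possible implementation, assuming a random (version 4) UUID from the standard java.util.UUID class is what you want, could look like this. The class name UuidHelper is hypothetical; in practice the method could simply be a private static method of the mapper.

```java
import java.util.UUID;

// Hypothetical helper class; the answer only names the method without defining it.
class UuidHelper {
    // Returns a random (version 4) UUID in its canonical 36-character string form.
    static String generateUUID() {
        return UUID.randomUUID().toString();
    }
}
```

Because the value is random, each map task that calls this in setup() or configure() ends up with its own identifier, with no coordination between tasks.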


Here is the link.