Question

I have a question about the code below, which searches transactions by a command-line argument. What does this line mean?

context.getConfiguration().get("Uid2Search");

package SearchTxnByArg;

// This is the Mapper Program for SearchTxnByArg
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

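public class MyMap extends Mapper<LongWritable, Text, NullWritable, Text> {

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each input line is one comma-separated transaction; the user ID is the third field.
        String Txn = value.toString();
        String TxnParts[] = Txn.split(",");
        if (TxnParts.length < 3) {
            return; // skip malformed lines instead of failing the task
        }
        String Uid = TxnParts[2];
        // Read the search value that the driver stored in the job configuration.
        String Uid2Search = context.getConfiguration().get("Uid2Search");
        if (Uid.equals(Uid2Search)) {
            // NullWritable.get() is the idiomatic "no key"; TextOutputFormat then writes only the value.
            context.write(NullWritable.get(), value);
        }
    }
}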
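To make the mapper's use of that value concrete, here is a minimal sketch of its per-record logic without any Hadoop types. The class name TxnFilter is hypothetical, and it assumes the same record layout as the question (user ID in the third comma-separated field). The constructor argument plays the role of context.getConfiguration().get("Uid2Search"); since Configuration.get returns null for a key that was never set, a missing parameter simply means nothing matches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free stand-in for MyMap's per-record logic.
// Assumes the user ID is the third comma-separated field (index 2).
class TxnFilter {
    private final String uid2Search;

    // In MyMap this value comes from context.getConfiguration().get("Uid2Search").
    TxnFilter(String uid2Search) {
        this.uid2Search = uid2Search;
    }

    // True when the record's third field equals the searched UID.
    // Short records are rejected instead of throwing ArrayIndexOutOfBoundsException.
    boolean matches(String txn) {
        String[] parts = txn.split(",");
        return parts.length > 2 && parts[2].equals(uid2Search);
    }

    // Keeps the matching lines, in input order.
    List<String> filter(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (matches(line)) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TxnFilter f = new TxnFilter("u42");
        List<String> lines = List.of("t1,2020-01-01,u42,10.00", "t2,2020-01-02,u7,3.50", "bad-line");
        System.out.println(f.filter(lines)); // [t1,2020-01-01,u42,10.00]
    }
}
```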

Driver

package SearchTxnByArg;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
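        Configuration conf = new Configuration();
        // Store the first command-line argument under the key "Uid2Search" so every map task can read it.
        // Set it before creating the Job: the Job takes its own copy of conf.
        conf.set("Uid2Search", args[0]);
        // new Job(conf, name) is deprecated; Job.getInstance is the current factory method.
        Job job = Job.getInstance(conf, "Map Reduce Search Txn by Arg");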
        job.setJarByClass(MyDriver.class);
        job.setMapperClass(MyMap.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setNumReduceTasks(0);
        FileInputFormat.addInputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}

Answer 1

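I'm not sure how you wrote your driver. In my experience, though, values passed with the -D option on the command line, as well as values you set yourself through the Configuration API (as your driver does with conf.set), end up in the job configuration that the context exposes to each task. Note that -D is only picked up when the driver parses its arguments with GenericOptionsParser, typically by implementing Tool and running through ToolRunner.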
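For the -D route, here is a simplified, hypothetical imitation of how "-D key=value" arguments become configuration entries while the remaining arguments are left for the program. DashDParser is not a Hadoop class and does not reproduce every rule of the real parser; in an actual job, GenericOptionsParser does this work and puts the pairs into the Configuration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified imitation of -D handling (not Hadoop code).
// "-D key=value" and "-Dkey=value" become configuration entries;
// every other argument is kept, in order, for the application itself.
class DashDParser {
    final Map<String, String> conf = new LinkedHashMap<>();
    final List<String> remaining = new ArrayList<>();

    DashDParser(String[] args) {
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String pair = null;
            if (arg.equals("-D") && i + 1 < args.length) {
                pair = args[++i];            // separate form: -D key=value
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                pair = arg.substring(2);     // joined form: -Dkey=value
            }
            if (pair == null) {
                remaining.add(arg);          // not a -D option
            } else {
                int eq = pair.indexOf('=');
                if (eq > 0) {                // split on the first '=' only
                    conf.put(pair.substring(0, eq), pair.substring(eq + 1));
                }                            // malformed pairs are ignored in this sketch
            }
        }
    }

    public static void main(String[] args) {
        DashDParser p = new DashDParser(new String[] {"-D", "Uid2Search=u42", "/in", "/out"});
        System.out.println(p.conf);      // {Uid2Search=u42}
        System.out.println(p.remaining); // [/in, /out]
    }
}
```

With real Hadoop, a driver that does not go through ToolRunner receives "-D" as an ordinary argument, which is why the question's driver reads args[0] and calls conf.set itself.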

According to the Configuration documentation:

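Configurations are specified by resources. A resource contains a set of name/value pairs as XML data. Each resource is named by either a String or by a Path. If named by a String, then the classpath is examined for a file with that name. If named by a Path, then the local filesystem is examined directly, without referring to the classpath.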

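Unless explicitly turned off, Hadoop by default specifies two resources, loaded in order from the classpath:

core-default.xml: read-only defaults for Hadoop.
core-site.xml: site-specific configuration for a given Hadoop installation.

Applications may add additional resources, which are loaded after these resources in the order they are added.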
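The loading order matters because a resource loaded later overrides a value from an earlier one, unless the earlier resource declared that property final. A minimal sketch of that layering, with a hypothetical LayeredConf class and plain maps standing in for the XML files (the property values are illustrative only):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of resource layering (not Hadoop code): resources are applied
// in the order they are added, a later one overrides an earlier one, and a property
// that an earlier resource marked final can no longer be overridden.
class LayeredConf {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> finals = new HashSet<>();

    // Applies one resource on top of everything loaded so far.
    void addResource(Map<String, String> resource, Set<String> finalKeys) {
        for (Map.Entry<String, String> e : resource.entrySet()) {
            if (!finals.contains(e.getKey())) {
                values.put(e.getKey(), e.getValue());
            }
        }
        finals.addAll(finalKeys);
    }

    // Returns the effective value, or null if no resource defined the key.
    String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        LayeredConf conf = new LayeredConf();
        // Illustrative values standing in for core-default.xml and core-site.xml.
        conf.addResource(Map.of("fs.defaultFS", "file:///"), Set.of());
        conf.addResource(Map.of("fs.defaultFS", "hdfs://namenode:8020"), Set.of());
        System.out.println(conf.get("fs.defaultFS")); // hdfs://namenode:8020
    }
}
```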

Please see this answer as well

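Context object: allows the Mapper/Reducer to interact with the rest of the Hadoop system. It includes configuration data for the job as well as interfaces that allow it to emit output.

Applications can use the Context:

to report progress
to set application-level status messages
to update counters
to indicate they are alive
to get the values that are stored in the job configuration across the map/reduce phases.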
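The last point, reading job-configuration values in every map or reduce task, works because the driver's settings are written out with the job and each task rebuilds its own configuration from them. The sketch below is only an analogy: it uses java.util.Properties and an in-memory XML round trip in place of Hadoop's job.xml and Configuration class, and the class ConfShipping is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

// Analogy only (not Hadoop code): the "driver" serializes its settings once,
// and every "task" deserializes its own private copy before doing any work.
class ConfShipping {
    // Driver side: store the search value and serialize the settings to XML bytes.
    static byte[] writeJobConf(String uid2Search) throws IOException {
        Properties conf = new Properties();
        conf.setProperty("Uid2Search", uid2Search);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        conf.storeToXML(out, "job settings");
        return out.toByteArray();
    }

    // Task side: load a fresh copy and look up a key (null if absent),
    // roughly what context.getConfiguration().get(key) gives a mapper.
    static String readInTask(byte[] jobConf, String key) throws IOException {
        Properties conf = new Properties();
        conf.loadFromXML(new ByteArrayInputStream(jobConf));
        return conf.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        byte[] shipped = writeJobConf("u42");
        // Two independent "tasks" each read the same value from their own copy.
        System.out.println(readInTask(shipped, "Uid2Search")); // u42
        System.out.println(readInTask(shipped, "Uid2Search")); // u42
    }
}
```

Because each task works on its own copy, calling set() on the configuration inside a mapper does not make the value visible to other tasks or to the driver; counters or the job output are the usual channels for sending information back.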
