Question

I am trying to build a chaining job.

At some point I want to access the args (from public static void main(String[] args)) inside the mapper, for example args[0].

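Is there a way to access these values in the mapper other than passing them along to a function and reading them there?

Alternative solution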

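// Inside BinningDriver.main(String[] args).
// Set the value before creating the Job: Job.getInstance(conf) copies conf,
// so conf.set(...) calls made after that are not seen by the job.
Configuration conf = new Configuration();
conf.set("args", args[1]);
Job job1 = Job.getInstance(conf); // assumption: job1 is created from conf after the set() call
job1.setJarByClass(BinningDriver.class);
FileSystem fs1 = FileSystem.get(conf);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(Text.class);
job1.setMapperClass(BinningInput.class);
job1.setInputFormatClass(TextInputFormat.class);
job1.setOutputFormatClass(TextOutputFormat.class);
Path out = new Path(args[1] + "/Indexing"); // output goes to <user output location>/Indexing
if (fs1.exists(out)) {
    fs1.delete(out, true); // remove old output, otherwise the job fails because the directory already exists
}

FileInputFormat.addInputPath(job1, new Path(args[0]));
FileOutputFormat.setOutputPath(job1, out);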

Mapper

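@Override
protected void setup(Context context) {
    Configuration conf = context.getConfiguration();
    String param = conf.get("args"); // the value the driver stored under "args" (args[1])
    System.out.println("args:" + param); // printed to the task's stdout log, not the client console
}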

This works.

Answer 1

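args[] holds the input arguments of the main function of the Driver class. The only place you can access it is the Driver (its scope is the main function only). So if you want to pass these values to the mapper, you have to hand them over explicitly (for example, add the information to the distributed cache, or set it in the job configuration and read it back from the mapper's configuration).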

If you only want to pass a few parameters, check this article and replace "123" with args[2], or whichever arg you are interested in.
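Building on that, every argument (not just args[1] or args[2]) can be copied into the job configuration under its own key. Below is a minimal sketch, assuming Java 8 or later; the class ArgsConfig and the keys args.count, args.0, args.1 and so on are hypothetical names, not part of Hadoop. The helper takes the setter and getter as functions, so a driver could call ArgsConfig.store(args, conf::set) before creating the Job, and a mapper's setup() could call ArgsConfig.load(context.getConfiguration()::get).

```java
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical helper: stores every command-line argument under an indexed key
// ("args.0", "args.1", ...) plus a count, through any String -> String setter.
// In a Hadoop driver the setter could be conf::set; in a mapper, conf::get is the getter.
final class ArgsConfig {
    private static final String PREFIX = "args.";         // hypothetical key prefix
    private static final String COUNT_KEY = "args.count"; // hypothetical key holding the number of args

    private ArgsConfig() {}

    /** Writes args[i] under "args.i" and the number of arguments under "args.count". */
    static void store(String[] args, BiConsumer<String, String> setter) {
        setter.accept(COUNT_KEY, Integer.toString(args.length));
        for (int i = 0; i < args.length; i++) {
            setter.accept(PREFIX + i, args[i]);
        }
    }

    /** Reads the arguments back in their original order; returns an empty array if none were stored. */
    static String[] load(Function<String, String> getter) {
        String count = getter.apply(COUNT_KEY);
        if (count == null) {
            return new String[0];
        }
        String[] result = new String[Integer.parseInt(count)];
        for (int i = 0; i < result.length; i++) {
            result[i] = getter.apply(PREFIX + i);
        }
        return result;
    }
}
```

Taking the setter and getter as functions keeps the helper free of Hadoop dependencies. If the values never contain commas, Hadoop's own Configuration.setStrings(name, values...) and getStrings(name) are a simpler alternative that stores them as one comma-separated entry.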

If you want to pass a whole file for processing, do the following:

Example:

The main method in the Driver class:

public static void main(String[] args) {
    ...
    // conf is the job's JobConf (old org.apache.hadoop.mapred API)
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    ...
    try {
        // register the file named by args[2] so that every task gets a local copy
        DistributedCache.addCacheFile(new URI(args[2]), conf);
    } catch (URISyntaxException e) {
        System.err.println(e.toString());
    }
    ...
}

In the Mapper, before the map() method, define the configure method (I am using Hadoop 1.2.0):

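private Set<String> lines;

@Override
public void configure(JobConf job) {
    lines = new HashSet<>();
    try {
        // local paths of the files added with DistributedCache.addCacheFile in the driver
        Path[] localFiles = DistributedCache.getLocalCacheFiles(job);
        if (localFiles == null || localFiles.length == 0) {
            System.err.println("No files in the distributed cache");
            return;
        }
        try (BufferedReader reader = new BufferedReader(new FileReader(localFiles[0].toString()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // store every line of the cached file, not only the first
            }
        }
    } catch (IOException e) { // also covers FileNotFoundException
        System.err.println(e.toString());
    }
}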

For more information on how to use the distributed cache, see the API documentation: http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/filecache/DistributedCache.html

Accessing the args[0] value in MapReduce
