What are the absolute minimal changes I must make to my Java program so that it is suitable for map-reduce?
Here is my Java program:
import java.io.*;

class evmTest {
    public static void main(String[] args) {
        try {
            Runtime rt = Runtime.getRuntime();
            String command = "evm --debug --code 7f00000000000000000000000000000000000000000000000000000000000000027f00000000000000000000000000000000000000000000000000000000000000027f00000000000000000000000000000000000000000000000000000000000000020101 run";
            Process proc = rt.exec(command);

            BufferedReader stdInput = new BufferedReader(new
                    InputStreamReader(proc.getInputStream()));
            BufferedReader stdError = new BufferedReader(new
                    InputStreamReader(proc.getErrorStream()));

            // read the output from the command
            System.out.println("Here is the standard output of the command:\n");
            String s = null;
            while ((s = stdInput.readLine()) != null) {
                System.out.println(s);
            }

            // read any errors from the attempted command
            System.out.println("Here is the standard error of the command (if any):\n");
            while ((s = stdError.readLine()) != null) {
                System.out.println(s);
            }
        } catch (IOException e) {
            System.out.println(e);
        }
    }
}
It prints the terminal output, like this:
Here is the standard output of the command:
0x
Here is the standard error of the command (if any):
#### TRACE ####
PUSH32 pc=00000000 gas=10000000000 cost=3
PUSH32 pc=00000033 gas=9999999997 cost=3
Stack:
00000000 0000000000000000000000000000000000000000000000000000000000000002
PUSH32 pc=00000066 gas=9999999994 cost=3
Stack:
00000000 0000000000000000000000000000000000000000000000000000000000000002
00000001 0000000000000000000000000000000000000000000000000000000000000002
ADD pc=00000099 gas=9999999991 cost=3
Stack:
00000000 0000000000000000000000000000000000000000000000000000000000000002
00000001 0000000000000000000000000000000000000000000000000000000000000002
00000002 0000000000000000000000000000000000000000000000000000000000000002
ADD pc=00000100 gas=9999999988 cost=3
Stack:
00000000 0000000000000000000000000000000000000000000000000000000000000004
00000001 0000000000000000000000000000000000000000000000000000000000000002
STOP pc=00000101 gas=9999999985 cost=0
Stack:
00000000 0000000000000000000000000000000000000000000000000000000000000006
#### LOGS ####
For reference, here is one of the simplest map-reduce jobs from the Apache examples:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
My question is - what is the simplest map-reduce approach for the program at the top of this post?
UPDATE
Using this command:
$HADOOP_HOME/bin/hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.8.1.jar -D mapreduce.job.reduces=0 -input /input_0 -output /steaming-output -mapper ./mapper.sh
results in this error:
17/09/26 03:26:56 INFO mapreduce.Job: Task Id : attempt_1506277206531_0004_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
Answer (score: 2):
So, this is not an attempt to hand you a solution, but rather a push in the direction you should be heading.
As mentioned above, get something working first.
Suppose you have a file like this at hdfs:///input/codes.txt:
7f0000000002812
7f000000000281a
7f000000000281b
7f000000000281c
The very "simple" WordCount code could actually process this data as-is! But obviously you don't need to count anything, and you don't even need a reducer. What you have is a map-only job, which would start out something like this:
private final Runtime rt = Runtime.getRuntime();

public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
    String command = "evm --debug --code " + value.toString() + " run";
    Process proc = rt.exec(command);
    // read the process output here, then emit whatever
    // key/value pair makes sense for your use case
    context.write( ... some_key, some_value ...);
}
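For illustration, here is a compilable sketch of that map-only mapper. The specific choices are assumptions, not part of the original answer: it keys the output by the input code, emits the collected trace as the value, and assumes the evm binary is on the PATH of every task node.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only mapper sketch: expects one hex code per input line.
public class EvmMapper extends Mapper<Object, Text, Text, Text> {

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String code = value.toString().trim();
        if (code.isEmpty()) {
            return; // skip blank lines
        }
        // Build the same evm command as the question, once per input record
        ProcessBuilder pb = new ProcessBuilder(
                "evm", "--debug", "--code", code, "run");
        pb.redirectErrorStream(true); // the trace goes to stderr; fold it in
        Process proc = pb.start();

        StringBuilder trace = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(proc.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                trace.append(line).append('\n');
            }
        }
        proc.waitFor();
        // Illustrative choice: key by the code, value is the full trace
        context.write(new Text(code), new Text(trace.toString()));
    }
}

In the driver you would also call job.setNumReduceTasks(0), so the mapper output is written directly with no reduce phase.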
However, you really don't need Java at all. You have a shell command, so you can use Hadoop Streaming to run it, "streaming" the codes from HDFS into your script's stdin.
That mapper would look something like this:
#!/bin/bash
### mapper.sh
while read code; do
    evm --debug --code "$code" run
done
You can even test the code locally, without Hadoop (and you should benchmark that, to see whether you really need Hadoop's overhead):
./mapper.sh < codes.txt
It's up to you to decide which option works best... for the minimalist, Hadoop Streaming looks simpler:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming*.jar \
    -D mapreduce.job.reduces=0 \
    -input /input \
    -output /tmp/streaming-output \
    -mapper ~/mapper.sh
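As a guess about the "Error in configuring object" failure in the update above (this fix is my assumption, not part of the original answer): that error frequently means the mapper script was not shipped to the task nodes, or is not executable. Making the script executable and shipping it with the generic -files option would look like:

chmod +x mapper.sh
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming*.jar \
    -D mapreduce.job.reduces=0 \
    -files mapper.sh \
    -input /input \
    -output /tmp/streaming-output \
    -mapper mapper.sh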
Also worth mentioning - any stdout/stderr from the command will be collected into the YARN application logs, rather than having to be written back to HDFS.
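For example, once the job finishes you can pull those logs with the yarn CLI (the application ID below is derived from the attempt ID shown in the update above):

yarn logs -applicationId application_1506277206531_0004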