I wrote a MapReduce program in Hadoop that hashes every record of a file, appends the hashed value to each record as an additional attribute, and then writes the output to the Hadoop file system. This is the code I wrote:
public class HashByMapReduce {

    public static class LineMapper extends Mapper<Text, Text, Text, Text> {
        private Text word = new Text();

        public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
            key.set("single");
            String line = value.toString();
            word.set(line);
            context.write(key, word);
        }
    }

    public static class LineReducer extends Reducer<Text, Text, Text, Text> {
        private Text result = new Text();

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String translations = "";
            for (Text val : values) {
                translations = val.toString() + "," + String.valueOf(hash64(val.toString())); // Point of Error
                result.set(translations);
                context.write(key, result);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Hashing");
        job.setJarByClass(HashByMapReduce.class);
        job.setMapperClass(LineMapper.class);
        job.setReducerClass(LineReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The logic I had in mind is that every line is read by the map method, which assigns all values to a single key and therefore passes them all to the same reduce call, where each value is passed to the hash64() function.
But I can see that an empty (null) value is being passed to the hash function, and I cannot figure out why. Thanks in advance.
Answer (score: 2)
The cause of the problem is most likely your use of KeyValueTextInputFormat. From the Yahoo tutorial:
InputFormat           Description                    Key                       Value
TextInputFormat       Default format; reads lines    The byte offset of        The line contents
                      of text files                  the line
KeyValueInputFormat   Parses lines into key,         Everything up to the      The remainder of
                      val pairs                      first tab character       the line
It splits each input line at the tab character. I guess your lines do not contain a tab, so the key that arrives in LineMapper is the whole line and nothing is passed as the value (not sure whether it is null or just empty).
From your code I think it is better to use the TextInputFormat class as the input format, which produces the line offset as the key and the complete line as the value. This should solve your problem.
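
To make the difference concrete, here is a minimal, self-contained sketch of the split that KeyValueTextInputFormat performs on each line; the sample record is made up, and only the tab-splitting logic is meant to mirror what the input format does:

public class KeyValueSplitDemo {
    public static void main(String[] args) {
        // A hypothetical record that contains no tab character.
        String line = "record1,fieldA,fieldB";
        int tab = line.indexOf('\t');
        // KeyValueTextInputFormat splits each line at the first tab.
        String key = (tab == -1) ? line : line.substring(0, tab);
        String value = (tab == -1) ? "" : line.substring(tab + 1);
        // With no tab, the key is the whole line and the value is empty --
        // that empty value is what ends up reaching hash64() in the reducer.
        System.out.println("KeyValueTextInputFormat -> key=[" + key + "] value=[" + value + "]");
        // TextInputFormat would instead give key = byte offset of the line
        // (a LongWritable) and value = the whole line.
    }
}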
Edit: I ran your code with the following changes and it seems to work fine:

1. Changed the input format to TextInputFormat and changed the Mapper's declaration accordingly.
2. Added setMapOutputKeyClass & setMapOutputValueClass to the job. These are not mandatory, but missing them often causes problems at run time.
3. Removed your key.set("single") and added a private key to the Mapper.
4. You did not provide the details of the hash64 method, so I used String.toUpperCase for testing.

If the issue still persists, then I am sure your hash method is not handling null well.
Full code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HashByMapReduce {

    public static class LineMapper extends Mapper<LongWritable, Text, Text, Text> {
        private Text word = new Text();
        private Text outKey = new Text("single");

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            word.set(line);
            context.write(outKey, word);
        }
    }

    public static class LineReducer extends Reducer<Text, Text, Text, Text> {
        private Text result = new Text();

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String translations = "";
            for (Text val : values) {
                translations = val.toString() + "," + val.toString().toUpperCase(); // Point of Error
                result.set(translations);
                context.write(key, result);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Hashing");
        job.setJarByClass(HashByMapReduce.class);
        job.setMapperClass(LineMapper.class);
        job.setReducerClass(LineReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
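
Since the question does not show what hash64 actually does, the following is only a sketch of how the toUpperCase placeholder in the reducer could be swapped back for a hash call. The fnv1a64 method below is a stand-in (a plain FNV-1a 64-bit hash, not the original hash64), and the null/empty guard is there because an empty value was the original symptom:

public final class Hash64Stub {

    // Stand-in hash: FNV-1a 64-bit over the UTF-8 bytes of the input string.
    // This is NOT the asker's hash64; it only shows where such a call plugs in.
    public static long fnv1a64(String s) {
        long hash = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(java.nio.charset.StandardCharsets.UTF_8)) {
            hash ^= (b & 0xffL);
            hash *= 0x100000001b3L;
        }
        return hash;
    }

    // Guarded helper: avoids hashing null or empty records, which was the
    // symptom reported when KeyValueTextInputFormat produced empty values.
    public static String appendHash(String record) {
        if (record == null || record.isEmpty()) {
            return record; // or skip/log, depending on what the job should do
        }
        return record + "," + fnv1a64(record);
    }

    public static void main(String[] args) {
        System.out.println(appendHash("record1,fieldA,fieldB"));
    }
}

In the reducer, the line marked // Point of Error would then become translations = Hash64Stub.appendHash(val.toString());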