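In my mapper class I want to do a small manipulation on the string read from the file (as one line) and then send it to the reducer to get a count per string. The manipulation is to replace an empty string with "0". (The current replace-and-join part makes my Hadoop job fail.)
Here is my code: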
import java.io.BufferedReader;
import java.io.IOException;
.....
public class PartNumberMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static Text partString = new Text("");
    private final static IntWritable count = new IntWritable(1);

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        // Read line by line to bufferreader and output the (line,count) pair
        BufferedReader bufReader = new BufferedReader(new StringReader(line));
        String l=null;
        while( (l=bufReader.readLine()) != null )
        {
            /**** This part is the problem ****/
            String a[]=l.split(",");
            if(a[1]==""){ // if a[1] i.e. second string is "" then set it to "0"
                a[1]="0";
                l = StringUtils.join(",", a); // join the string array to form a string
            }
            /**** problematic part ends ****/
            partString.set(l);
            output.collect(partString, count);
        }
    }
}
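After running this, the mapper fails and does not report any error. [The code runs on YARN.] I am not sure what I am doing wrong; the same code works without the string join part.
Can anyone explain what is wrong with the string replace / concat? Is there a better way to do this?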
Answer 0: (score: 1)
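Here is a modified version of your Mapper class with only a few changes: string equality is checked with .equals() rather than ==, and the array is declared as String[] rather than String a[].
This results in the following code: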
public class PartNumberMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private Text partString = new Text();
    private final static IntWritable count = new IntWritable(1);

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        // value holds one input line, as described in the question
        String line = value.toString();
        String[] a = line.split(",");
        // equals() compares string contents; == only compares object references
        if (a[1].equals("")) {
            a[1] = "0";
            line = StringUtils.join(",", a);
        }
        partString.set(line);
        output.collect(partString, count);
    }
}
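
One more thing that may be worth checking, offered as a sketch and not as a confirmed diagnosis of your job: String.split(",") drops trailing empty strings, so a line such as "P100," produces a one-element array and a[1] throws ArrayIndexOutOfBoundsException. The helper below uses only the JDK so it can be tried outside Hadoop. The class name FieldNormalizer, the method normalize and the sample part numbers are made up for illustration, and passing -1 as the split limit is my own addition, not something taken from the mapper above.

```java
// Hypothetical helper (not part of the original mapper): replaces an empty
// second field with "0" without assuming the field is present.
final class FieldNormalizer {

    private FieldNormalizer() {
        // static utility, no instances
    }

    static String normalize(String line) {
        // A limit of -1 keeps trailing empty fields, so "P100," splits into
        // ["P100", ""] instead of the one-element array ["P100"].
        String[] fields = line.split(",", -1);

        // Guard against lines that have no second field at all.
        if (fields.length > 1 && fields[1].isEmpty()) {
            fields[1] = "0";
            // String.join is available in the JDK since Java 8.
            return String.join(",", fields);
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(normalize("P100,,5"));  // empty field in the middle
        System.out.println(normalize("P100,"));    // empty field at the end
        System.out.println(normalize("P100"));     // no second field
    }
}
```

Inside map() this would be used as partString.set(FieldNormalizer.normalize(value.toString())). Lines with fewer than two fields are passed through unchanged here; whether they should instead be skipped or counted separately depends on your data.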