"Pivoting" a table using Hadoop

Time: 2014-04-26 22:05:15

Tags: java hadoop mapreduce

(Disclaimer: I am new to both Hadoop and Java.)

As input, I have a table with a simple tab-separated key-value structure:

key1  value1
key2  value2
key3  value3
key2  value4
key1  value5
key1  value6

As output, I want to collect, for each key, all the values that belong to that key, like this:

key1, value1 value5 value6
key2, value2 value4
key3, value3
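
For reference, the grouping described above can be sketched in plain Java without Hadoop. This is only an in-memory illustration of the intended result, not the MapReduce job itself; the class `PivotSketch` and its methods `pivot` and `format` are hypothetical names introduced for this example, and the input is assumed to be tab-separated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the original post): groups values by key in memory.
final class PivotSketch {

    // Each line is assumed to look like "key<TAB>value".
    static Map<String, List<String>> pivot(List<String> lines) {
        // LinkedHashMap keeps keys in first-seen order.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t", -1);
            if (fields.length < 2) {
                continue; // skip lines without a value column
            }
            grouped.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(fields[1]);
        }
        return grouped;
    }

    // Renders one "key, v1 v2 ..." line per key.
    static String format(Map<String, List<String>> grouped) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            out.append(e.getKey()).append(", ")
               .append(String.join(" ", e.getValue())).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "key1\tvalue1", "key2\tvalue2", "key3\tvalue3",
                "key2\tvalue4", "key1\tvalue5", "key1\tvalue6");
        System.out.print(format(pivot(lines)));
    }
}
```

In a MapReduce job the shuffle phase does this grouping, so the mapper only has to emit (key, value) pairs and the reducer only has to concatenate the values it receives for each key.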

Here is my mapper:

public class WordMapper extends Mapper<Object, Text, Text, Text> {

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {

        String[] fields = value.toString().split("\\t", -1);
        for (int i = 0; i < fields.length; ++i) {
            if ("".equals(fields[i])) fields[i] = null;
        }
        List<String> fields_list = Arrays.asList(fields);
        Text textKey = new Text(fields_list.get(0));
        Text textValue = new Text(fields_list.get(1));
        context.write(textKey, textValue);
    }
}

Here is the reducer:

public class SumReducer extends Reducer<Text, TextArrayWritable, Text, TextArrayWritable> {
    private TextArrayWritable valuesTotal = new TextArrayWritable();

    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        ArrayList<Text> values_list = new ArrayList<Text>();

        for (Text value : values) {
            values_list.add(value);
        }
        Text[] values_arr = new Text[values_list.size()];
        values_arr = values_list.toArray(values_arr);

        valuesTotal.setFields(values_arr);
        context.write(key, valuesTotal);
    }
}

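For some reason, I cannot get any output from my program. It simply terminates and leaves nothing in the output folder. What is my problem here?

(I am using Hadoop 2.2.0 and Eclipse with the Hadoop plugin. The WordCount example runs without problems.)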

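1 answer:

Answer 0 (score: 1)

Problem solved. After I enabled logging, it became clear that my data contained rows with a missing value in column 4, so I added a null check, if (fields[4] != null), and it worked. I also got rid of the array-to-list conversion and of the custom TextArrayWritable class.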

Mapper:

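import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// The answer omitted the class header; the name WordMapper is taken from the question.
public class WordMapper extends Mapper<Object, Text, Text, Text> {

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split on tabs; the -1 limit keeps trailing empty columns.
        String[] fields = value.toString().split("\\t", -1);
        for (int i = 0; i < fields.length; ++i) {
            if ("".equals(fields[i])) fields[i] = null;
        }
        // Skip rows where fields[4] is absent or empty. The length check is an
        // added guard so that short rows do not throw ArrayIndexOutOfBoundsException.
        if (fields.length > 4 && fields[4] != null) {
            System.out.println(fields[0]); // debug output
            System.out.println(fields[4]);
            context.write(new Text(fields[0]), new Text(fields[4]));
        }
    }
}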
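
The guard used in the mapper can be tried out in isolation. The following is a minimal plain-Java sketch, assuming tab-separated rows; `FieldGuardSketch` and `fieldOrNull` are hypothetical names introduced here, and the bounds check is an addition that the answer's code does not have.

```java
// Hypothetical helper (not from the original answer): safe access to one column.
final class FieldGuardSketch {

    // Returns the field at `index`, or null when the column is absent or empty.
    static String fieldOrNull(String line, int index) {
        // The -1 limit keeps trailing empty columns instead of dropping them.
        String[] fields = line.split("\t", -1);
        if (index < 0 || index >= fields.length) {
            return null; // row is too short
        }
        return fields[index].isEmpty() ? null : fields[index];
    }

    public static void main(String[] args) {
        System.out.println(fieldOrNull("a\tb\tc\td\te", 4)); // full row
        System.out.println(fieldOrNull("a\tb\tc\td\t", 4));  // empty last column
        System.out.println(fieldOrNull("a\tb", 4));          // short row
    }
}
```

A row is emitted only when the result is non-null, which covers both an empty column and a row that has fewer columns than expected.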

Reducer:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SongsReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Join all values for this key into one comma-separated string.
        boolean first = true;
        StringBuilder songs = new StringBuilder();
        for (Text val : values) {
            if (!first) {
                songs.append(",");
            }
            first = false;
            songs.append(val.toString());
        }

        context.write(key, new Text(songs.toString()));
    }
}
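
The first-flag loop in the reducer is a hand-written string join. As a sketch of an alternative, assuming Java 8 or later is available on the cluster, `java.util.StringJoiner` should produce the same comma-separated result; `JoinSketch` and `joinValues` are hypothetical names, and plain strings stand in for Hadoop's `Text` values.

```java
import java.util.Arrays;
import java.util.StringJoiner;

// Hypothetical helper (not from the original answer): joins values with commas.
final class JoinSketch {

    static String joinValues(Iterable<String> values) {
        StringJoiner joiner = new StringJoiner(",");
        for (String v : values) {
            joiner.add(v); // the separator is inserted between elements only
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinValues(Arrays.asList("value1", "value5", "value6")));
    }
}
```

Inside a reducer the same loop would call `val.toString()` on each `Text` before adding it to the joiner.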