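I am trying to model an SQL query such as SELECT DISTINCT(col1) FROM map WHERE col2 = value2 in MapReduce. My logic is that each mapper checks the WHERE clause and, if a row matches, emits the WHERE-clause value as the key and col1 as the value. With the default hash function, every output goes to the same reducer, because every output uses the WHERE-clause value as its key. In the reducer I can drop the duplicates and emit the distinct values.
Is this the right way to implement it?
Note: the data for this query is in a CSV file.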
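The partitioning claim in the question can be checked with a small stand-alone sketch. It applies the formula used by Hadoop's default HashPartitioner, (hashCode & Integer.MAX_VALUE) % numReduceTasks. Assumptions: String.hashCode() stands in for Text.hashCode(), so the actual partition numbers differ from a real job, and the col1 values "a" to "d" are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionDemo {
    // Same formula as Hadoop's default HashPartitioner. String.hashCode() stands in
    // for Text.hashCode(), so real partition numbers will differ; the pattern is the point.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Returns the set of reducers that receive at least one of the given keys
    static Set<Integer> reducersUsed(List<String> keys, int numReduceTasks) {
        Set<Integer> used = new HashSet<>();
        for (String k : keys) {
            used.add(partition(k, numReduceTasks));
        }
        return used;
    }

    public static void main(String[] args) {
        int reducers = 4;
        // Keying on the WHERE value: every matching row carries the same key
        List<String> whereKeys = List.of("value2", "value2", "value2", "value2");
        // Keying on col1 (hypothetical values): one key per distinct value
        List<String> col1Keys = List.of("a", "b", "c", "d");
        System.out.println("WHERE-value key -> reducers " + reducersUsed(whereKeys, reducers));
        System.out.println("col1 key        -> reducers " + reducersUsed(col1Keys, reducers));
    }
}
```

The question's scheme does produce correct results. However, all matching rows land on a single reducer no matter how many reducers the job has, so that one task does all the deduplication. Keying on col1 instead lets the shuffle spread distinct values across reducers.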
Answer (score: 0):
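import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// MAPPER: filters on col2 and emits col1 as the key (nested inside the job's driver class)
public static class DistinctMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    // Hypothetical column positions and filter value; adjust to the actual CSV
    private static final int COL1_INDEX = 0;
    private static final int COL2_INDEX = 1;
    private static final String WHERE_VALUE = "value2";

    private final Text col1 = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Naive CSV split; quoted fields containing commas are not handled
        String[] fields = value.toString().split(",", -1);
        if (fields.length <= Math.max(COL1_INDEX, COL2_INDEX)) {
            return; // skip malformed lines
        }
        // WHERE col2 = value2: compare strings with equals(), not != (reference comparison)
        if (!WHERE_VALUE.equals(fields[COL2_INDEX].trim())) {
            return;
        }
        // Key = distinct column value: the shuffle groups duplicates together and the
        // default HashPartitioner spreads the distinct values across reducers
        col1.set(fields[COL1_INDEX].trim());
        // The value carries no information, so emit NullWritable
        context.write(col1, NullWritable.get());
    }
}

// REDUCER: each distinct key arrives once, together with all of its (null) values
public static class DistinctReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    @Override
    public void reduce(Text key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        // Emit the key once; the list of values is irrelevant for DISTINCT
        context.write(key, NullWritable.get());
    }
}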
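To sanity-check the map, shuffle and reduce logic without a cluster, here is a minimal plain-Java simulation of the same flow. It assumes a hypothetical CSV layout with col1 in field 0 and col2 in field 1. A TreeMap stands in for Hadoop's sort-and-group step, so this is an illustration of the idea, not Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DistinctSimulation {
    // Hypothetical CSV layout: col1 is field 0, col2 is field 1
    static final int COL1 = 0;
    static final int COL2 = 1;

    // Map phase: keep rows where col2 matches and emit col1 as the key
    static List<String> map(List<String> lines, String whereValue) {
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(",", -1);
            if (f.length > Math.max(COL1, COL2) && whereValue.equals(f[COL2].trim())) {
                emitted.add(f[COL1].trim());
            }
        }
        return emitted;
    }

    // Shuffle phase: group identical keys (sorted, as Hadoop sorts keys);
    // the count records how many copies arrived, which the reducer ignores
    static Map<String, Integer> shuffle(List<String> keys) {
        Map<String, Integer> groups = new TreeMap<>();
        for (String k : keys) {
            groups.merge(k, 1, Integer::sum);
        }
        return groups;
    }

    // Reduce phase: each group yields its key exactly once
    static List<String> reduce(Map<String, Integer> groups) {
        return new ArrayList<>(groups.keySet());
    }

    static List<String> selectDistinct(List<String> lines, String whereValue) {
        return reduce(shuffle(map(lines, whereValue)));
    }

    public static void main(String[] args) {
        List<String> csv = List.of("x,value2", "y,other", "x,value2", "z,value2");
        System.out.println(selectDistinct(csv, "value2")); // prints [x, z]
    }
}
```

In the real job, DistinctReducer can also be registered as the combiner with job.setCombinerClass(DistinctReducer.class). Its input and output types match, and emitting a key once is safe to repeat. When duplicates are common, this cuts the amount of data shuffled to the reducers.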