Question

customer_id, server_id, code
12342344, 1232, 3
12345456, 1433, 2
16345436, 2343, 4
12245456, 1434, 3
11145456, 1436, 2

If I run this query on Hive:

select * from table where code=3;

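what is the MapReduce code that gets written for it, and where can I find it?

In other words, how do I write a MapReduce job that gives the same result as this query?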

Thanks

Answer 1

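From Hive you can run EXPLAIN select * from table where code=3; to see Hive's execution plan for the query, but Hive does not output MapReduce source code for any query. Also have a look at YSmart (Another SQL-to-MapReduce Translator), which shows the MapReduce jobs for a SQL statement. I have never used it, but you can give it a try.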

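Here is an example; you can modify it for other queries to get the idea. Basically, you filter the data in the map class and run any aggregation in the reduce class. In this example you don't need a reduce class, because the query is a simple filter over the input records. You should first work through wordcount (Hadoop's "hello world" example) to learn how to run a program like this, where to find its output, and so on.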

Output:

12245456, 1434, 3   
12342344, 1232, 3

Code:

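import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class customers{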

    public static class customersMapper extends Mapper<Object, Text, Text, Text> {

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

            // The third comma-separated field is the code; it could be parsed as an int instead - try it for yourself
            String code = value.toString().split(",")[2].trim();

            // this comparison has to change if the code variable is converted to an int
            if (code.equals("3")){ // only emit records where code=3
                context.write(value,new Text(""));
            }       
        }   
    }

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "get-customer-with-code=3");

        job.setJarByClass(customers.class);
        job.setMapperClass(customersMapper.class); 
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Exit with a non-zero status if the job fails
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    } 
}
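
The comments in the mapper leave the integer comparison as an exercise. Below is a minimal sketch of that check in plain Java, with no Hadoop dependency so it can be tried on its own. The class CodeFilter and its methods are hypothetical names introduced here, not part of Hadoop or of the answer's code. It assumes the same comma-separated layout as the sample data, and it assumes that lines without a numeric third field (the header row, blank lines) should be skipped rather than fail the task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Hadoop class): the per-record test a mapper could call.
class CodeFilter {

    // Returns true when the third comma-separated field parses to wantedCode.
    // Lines with fewer than three fields or a non-numeric code are rejected
    // instead of throwing.
    static boolean matchesCode(String line, int wantedCode) {
        String[] fields = line.split(",");
        if (fields.length < 3) {
            return false;
        }
        try {
            return Integer.parseInt(fields[2].trim()) == wantedCode;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Applies the check to a list of lines, standing in for the map phase.
    static List<String> filter(List<String> lines, int wantedCode) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (matchesCode(line, wantedCode)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "customer_id, server_id, code",
            "12342344, 1232, 3",
            "12345456, 1433, 2",
            "16345436, 2343, 4",
            "12245456, 1434, 3",
            "11145456, 1436, 2");
        // Print the rows that the query "where code=3" would keep.
        for (String row : filter(rows, 3)) {
            System.out.println(row);
        }
    }
}
```

Inside the mapper, the if (code.equals("3")) test could then be replaced by a call such as CodeFilter.matchesCode(value.toString(), 3). Unlike the job output shown above, this standalone sketch keeps the rows in input order, because no shuffle and sort step is involved.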
