Question

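I was asked this question in a Hadoop interview. I have table data like the following.

I rode a new bike. On day 1 I covered 20 km. On day 2 the meter read 50 (day 1 + day 2). On day 3 the meter read 60 (day 1 + day 2 + day 3).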

Day Distance
1    20
2    50
3    60

Now the question: I want the output to look like this

Day  Distance
1    20
2    30
3    10

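That is, I want the distance travelled on each individual day (day 1, day 2, and day 3), not the cumulative meter reading.

The answer can be in Hive, Pig, or MapReduce.

Thanks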

Answer 1

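This is essentially a running-total problem in reverse: you need the difference between consecutive readings. You can solve it with this Hive query, which assumes a table mytable with columns d (day) and dst (cumulative reading)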

with b as (
select 0 as d, 0 as dst
union all 
select d, dst from mytable
)
SELECT a.d, a.dst - b.dst AS new_dst FROM mytable a, b
WHERE a.d - b.d = 1
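
To make the join logic easier to follow, here is a minimal plain-Java sketch of the same idea: add a synthetic (day 0, reading 0) row, pair each day d with day d - 1, and subtract. The class name SelfJoinSketch and the method perDay are hypothetical names used only for illustration; this is not Hive code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class SelfJoinSketch {
    // Input: day -> cumulative meter reading. Output: day -> distance for that day.
    public static Map<Integer, Integer> perDay(Map<Integer, Integer> readings) {
        // Equivalent of the CTE "b": the original rows plus a (0, 0) row.
        Map<Integer, Integer> withDayZero = new TreeMap<>(readings);
        withDayZero.put(0, 0);

        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(readings).entrySet()) {
            // Join condition a.d - b.d = 1: look up the row for the previous day.
            Integer previous = withDayZero.get(e.getKey() - 1);
            if (previous != null) { // inner join: days with no previous row are dropped
                result.put(e.getKey(), e.getValue() - previous);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> readings = new TreeMap<>();
        readings.put(1, 20);
        readings.put(2, 50);
        readings.put(3, 60);
        System.out.println(perDay(readings)); // {1=20, 2=30, 3=10}
    }
}
```

Note that, as in the query, a day whose predecessor is missing from the data produces no output row.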

Answer 2

You can use Hive's built-in windowing and analytics functions to get the desired result.

Here is one way to do it.

-- LAG's third argument is the default for the first row, so day 1 yields distance - 0
SELECT day, distance - LAG(distance, 1, 0) OVER (ORDER BY day) AS distance
FROM mytable;

Answer 3

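I tried it in MapReduce.

package hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;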

public class distance {

public static class disMapper extends Mapper<LongWritable,Text,IntWritable,IntWritable>
{
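    // Input line format: day <TAB> cumulative reading, e.g. "1	20".
    // pValue holds the previous reading, so this only works when all records
    // are read in day order by a single mapper (one input split).
    int pValue=0;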
    IntWritable outkey=new IntWritable();
    IntWritable outvalue=new IntWritable();
    public void map(LongWritable key,Text values,Context context) throws IOException, InterruptedException
    {
        String cols[]=values.toString().split("\t");
        int dis=Integer.parseInt(cols[1])-pValue;
        outkey.set(Integer.parseInt(cols[0]));
        outvalue.set(dis);
        pValue=Integer.parseInt(cols[1]);
        context.write(outkey, outvalue);
    }
}

public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf=new Configuration();
    // Job.getInstance replaces the deprecated Job constructor
    Job job =Job.getInstance(conf,"dfdeff");
    job.setJarByClass(distance.class);
    job.setMapperClass(disMapper.class);

    job.setMapOutputKeyClass(IntWritable.class);
    job.setMapOutputValueClass(IntWritable.class);

    job.setNumReduceTasks(0);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Exit with 0 on success and 1 on failure
    System.exit(job.waitForCompletion(true)?0:1);

}

}
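
One caveat with the mapper above: it keeps the previous reading in a field, so the result is only correct if the records arrive in day order within a single map task. The plain-Java sketch below illustrates the safer idea of ordering by day first and then subtracting, which is roughly what routing all records through a single reducer keyed by day would give you. It does not use the Hadoop API, and the names OrderedDifference and perDay are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class OrderedDifference {
    // Each record is {day, cumulativeReading}; the input may be in any order.
    public static List<int[]> perDay(List<int[]> records) {
        List<int[]> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparingInt(r -> r[0])); // order by day

        List<int[]> result = new ArrayList<>();
        int previous = 0; // meter reading before day 1
        for (int[] r : sorted) {
            result.add(new int[] {r[0], r[1] - previous});
            previous = r[1];
        }
        return result;
    }

    public static void main(String[] args) {
        // Deliberately out of order to show that sorting makes the result stable.
        List<int[]> records = Arrays.asList(
                new int[] {3, 60}, new int[] {1, 20}, new int[] {2, 50});
        for (int[] r : perDay(records)) {
            System.out.println(r[0] + "\t" + r[1]);
        }
    }
}
```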
