Unable to get the results I want from a MapReduce job

Date: 2015-04-27 11:44:40

Tags: hadoop mapreduce

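Here is a sample of my data:

[image]

Treating the first column as index 0, I want to use MapReduce to get the total sales for each store from this file. The store name is at index 2 and the revenue is at index 4.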

Here is my Mapper code:

public void map(LongWritable key , Text value , Context context)
throws IOException , InterruptedException
{
    String line = value.toString();
    String[] columns = line.split("\t");

    if(columns.length == 6)
    {
        String storeNameString = columns[2];
        Text storeName = new Text(storeNameString);

        String storeRevenueString = columns[4];
        IntWritable storeRevenue = new IntWritable(Integer.parseInt(storeRevenueString));
        context.write(storeName, storeRevenue);
    }   
}

Here is my Reducer code:

public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException , InterruptedException {

    Text storeName = key;
    int storeSales = 0;

    while(values.iterator().hasNext())
    {
        storeSales += values.iterator().next().get();

    }
    context.write(storeName, new IntWritable(storeSales));
}

Here is the code that runs the job:

public class StoreSales extends Configured implements Tool {

public static void main(String[] args) throws Exception {
    // this main function will call run method defined above.
    int res = ToolRunner.run(new StoreSales(),args);
    System.exit(res);
}

@Override
public int run(String[] args) throws Exception {
    // TODO Auto-generated method stub
    JobConf conf = new JobConf();

    @SuppressWarnings("unused")
    Job job = new Job(conf , "Sales Per Store");

    job.setMapperClass(StoreSalesMapper.class);
    job.setReducerClass(StoreSalesReducer.class);
    job.setJarByClass(StoreSales.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    Path input = new Path(args[0]);
    Path output = new Path(args[1]);

    FileInputFormat.addInputPath(conf , input);
    FileOutputFormat.setOutputPath(conf, output);

    JobClient.runJob(conf);

    return 0;
    }
 }

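Here is a sample of what the result should look like: [image]

This is the result I am getting: [image]

What am I doing wrong?

2 Answers:

Answer 0: (score: 1)

There is nothing wrong with your logic. I reused it and only changed the driver so that it uses the new MapReduce API throughout. The driver in the question sets the mapper and reducer on `job` but submits `conf` through JobClient.runJob, so those settings appear never to be used: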

Mapper:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable,Text,Text,IntWritable>{


    public void map(LongWritable key , Text value , Context context)
            throws IOException , InterruptedException
            {
                String line = value.toString();
                String[] columns = line.split("\\t");

                if(columns.length == 6)
                {
                    String storeNameString = columns[2];
                    Text storeName = new Text(storeNameString);

                    String storeRevenueString = columns[4];
                    IntWritable storeRevenue = new IntWritable(Integer.parseInt(storeRevenueString));
                    context.write(storeName, storeRevenue);
                }   
            }
}

Reducer:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text,IntWritable,Text,IntWritable>{

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException , InterruptedException {

        Text storeName = key;
        int storeSales = 0;

        while(values.iterator().hasNext())
        {
            storeSales += values.iterator().next().get();

        }
        context.write(storeName, new IntWritable(storeSales));
    }

}
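A note on the loop above: it calls values.iterator() on every pass. A for-each loop asks for the iterator only once, so it does not depend on iterator() handing back the same iterator each time; with an ordinary java.util collection the while form would never advance past the first element. Below is a minimal sketch of the for-each form, using a plain List<Integer> as a stand-in for Hadoop's Iterable<IntWritable>. The class name SumValues is hypothetical and not part of the answer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the reducer's summing loop, runnable without Hadoop.
final class SumValues {

    // Sums all values with a for-each loop, which requests the iterator once.
    static int sum(Iterable<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        // Revenues for one store, taken from the sample input in this answer.
        List<Integer> revenues = Arrays.asList(214, 3200);
        System.out.println(sum(revenues));
    }
}
```

In the reducer itself the same shape would be `for (IntWritable value : values) { storeSales += value.get(); }`.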


Driver:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver {

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "Sales Per Store");

    // Mapper, reducer and jar are all set on the same Job that is submitted.
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setJarByClass(Driver.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Input and output paths come from the command line.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Submit the job, wait for it to finish, and exit with its status.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
 }

Sample input file (tab-separated):

2012-01-01 09.00 sanJose clothin 214 amex
2012-01-01 09.00 seattle music 320 master
2012-01-01 09.00 seattle elec 3120 master
2012-01-01 09.00 sanJose perfume 3200 amex

Output file:

cat test123/part-r-00000

sanJose 3414
seattle 3440
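The totals above can be cross-checked without a cluster. Below is a minimal sketch in plain Java that applies the same split, length check and per-store sum as the mapper and reducer. It assumes the rows are tab-separated and that the second store is spelled seattle; the class StoreSalesCheck and its method are hypothetical helpers, not part of the job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: mimics the map and reduce steps in plain Java
// so the expected totals can be checked locally.
final class StoreSalesCheck {

    // Sums column 4 (revenue) per column 2 (store name), skipping rows that
    // do not have exactly six tab-separated fields, as the mapper does.
    static Map<String, Integer> totals(List<String> lines) {
        Map<String, Integer> sums = new TreeMap<>(); // keys kept in sorted order
        for (String line : lines) {
            String[] columns = line.split("\t");
            if (columns.length == 6) {
                sums.merge(columns[2], Integer.parseInt(columns[4]), Integer::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // Assumed tab-separated form of the sample input.
        List<String> sample = Arrays.asList(
            "2012-01-01\t09.00\tsanJose\tclothin\t214\tamex",
            "2012-01-01\t09.00\tseattle\tmusic\t320\tmaster",
            "2012-01-01\t09.00\tseattle\telec\t3120\tmaster",
            "2012-01-01\t09.00\tsanJose\tperfume\t3200\tamex");
        totals(sample).forEach((store, total) -> System.out.println(store + "\t" + total));
    }
}
```

If a local run of this logic gives the right totals while the cluster job does not, the difference is more likely in how the job is configured and submitted than in the map or reduce logic.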

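Answer 1: (score: 0)

I believe I found the problem here. When calling line.split, you escaped the tab character incorrectly, because String.split interprets its argument as a regular expression. In a regular expression, the correct way to specify a tab character is \\t, whereas you were using \t, because the backslash itself must be escaped. Note the extra \ character you were missing.

Corrected split statement: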

String[] columns = line.split("\\t");
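Whether this change alters the result is easy to check locally. Below is a minimal sketch (the class and method names are hypothetical) that splits the same tab-separated line with both patterns. In Java source "\t" is already a tab character, which the regex engine matches literally, and "\\t" is the regex escape for a tab, so both calls are expected to return the same six fields.

```java
import java.util.Arrays;

// Hypothetical check: compares the two ways of writing the tab delimiter.
final class TabSplitCheck {

    // Splits on the Java string literal "\t", i.e. an actual tab character.
    static String[] splitLiteralTab(String line) {
        return line.split("\t");
    }

    // Splits on the regex escape \t, written "\\t" in Java source.
    static String[] splitRegexTab(String line) {
        return line.split("\\t");
    }

    public static void main(String[] args) {
        String line = "2012-01-01\t09.00\tsanJose\tclothin\t214\tamex";
        System.out.println(Arrays.toString(splitLiteralTab(line)));
        System.out.println(Arrays.toString(splitRegexTab(line)));
    }
}
```

If both arrays match on your data too, the split is not what causes the wrong output, and the driver setup is worth checking next.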