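I have an Adcampaign driver, mapper, and reducer class. The first two classes run fine. The reducer class also runs without errors, but the results are incorrect. This is a sample project I downloaded from the internet to practice MapReduce programs.
Brief description of the program: Problem statement:
The sequence of events is as follows: 1. We serve an ad to the user. 2. If the ad appears in the user's browser, the user has seen the ad. We track this event as VIEWED_EVENT. 3. If the user clicks the ad, we track this event as CLICKED_EVENT.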
Input Log files format and description:
Log Files: The log files are in the following format:
timestamp, user_id, view/click, domain, campaign_id.
E.g: 1262332801728, 899523, 1, npr.org, 19
◾timestamp : unix time stamp in milliseconds
◾user_id : each user has a unique id
◾action_id : 1=view, 2=click
◾domain : which domain the ad was served
◾campaign_id: identifies the campaign the ad was part of
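To make the record layout concrete, here is a minimal sketch in plain Java (no Hadoop classes) that parses one log line into its five fields. The class name `LogLineDemo` and its field names are hypothetical and not part of the original project; the five-field check mirrors the one in the mapper.

```java
// Hypothetical helper, not part of the original project.
final class LogLineDemo {
    final long timestamp;  // unix time stamp in milliseconds
    final long userId;
    final int actionId;    // 1 = view, 2 = click
    final String domain;
    final int campaignId;

    private LogLineDemo(long timestamp, long userId, int actionId, String domain, int campaignId) {
        this.timestamp = timestamp;
        this.userId = userId;
        this.actionId = actionId;
        this.domain = domain;
        this.campaignId = campaignId;
    }

    // Parses one comma-separated log line; rejects lines that do not have exactly five fields.
    static LogLineDemo parse(String line) {
        String[] tokens = line.split(",");
        if (tokens.length != 5) {
            throw new IllegalArgumentException("expected 5 fields but got " + tokens.length + ": " + line);
        }
        return new LogLineDemo(
                Long.parseLong(tokens[0].trim()),
                Long.parseLong(tokens[1].trim()),
                Integer.parseInt(tokens[2].trim()),
                tokens[3].trim(),
                Integer.parseInt(tokens[4].trim()));
    }

    public static void main(String[] args) {
        LogLineDemo r = parse("1262332801728, 899523, 1, npr.org, 19");
        System.out.println(r.campaignId + " " + r.actionId + " " + r.domain);
    }
}
```

The fields are trimmed before parsing because the sample line has a space after each comma.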
The expected output of the reducer is: campaign_id, total views, total clicks. For example:
12,3,2
13,100,23
14,23,12
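As a reference for what the job should compute, the following is a rough single-process sketch in plain Java, assuming the log format described above. It is not the MapReduce implementation; `CampaignStatsDemo` and `summarize` are hypothetical names, and malformed lines are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical reference implementation, not part of the original project.
final class CampaignStatsDemo {
    // Returns one "campaign_id,views,clicks" line per campaign, sorted by campaign_id.
    static List<String> summarize(List<String> logLines) {
        // campaign_id -> {views, clicks}
        Map<Integer, int[]> counts = new TreeMap<>();
        for (String line : logLines) {
            String[] tokens = line.split(",");
            if (tokens.length != 5) {
                continue; // skip malformed records
            }
            int action = Integer.parseInt(tokens[2].trim());
            int campaign = Integer.parseInt(tokens[4].trim());
            int[] c = counts.get(campaign);
            if (c == null) {
                c = new int[2];
                counts.put(campaign, c);
            }
            if (action == 1) {
                c[0]++; // view
            } else if (action == 2) {
                c[1]++; // click
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, int[]> e : counts.entrySet()) {
            out.add(e.getKey() + "," + e.getValue()[0] + "," + e.getValue()[1]);
        }
        return out;
    }
}
```

Running the real job on a small input and comparing its output against a helper like this is one way to confirm whether the reducer results are right.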
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class AdcampaignReducer extends Reducer<IntWritable, IntWritable, IntWritable, Text> {
    // Key/value: IntWritable / list of IntWritables. For every campaign, we get all actions for that
    // campaign as an iterable list. We iterate through the action_ids and count views and clicks.
    // Once we are done counting, we write out the results. This is possible because all actions
    // for a campaign are grouped and sent to one reducer.
    //Text k= new Text();
    @Override
    public void reduce(IntWritable key, Iterable<IntWritable> results, Context context) throws IOException, InterruptedException {
        int campaign = key.get();
        //k = key.get();
        int clicks = 0;
        int views = 0;
        for (IntWritable i : results) {
            int action = i.get();
            if (action == 1) {
                views = views + 1;
            } else if (action == 2) {
                clicks = clicks + 1;
            }
        }
        String statistics = "Total Clicks =" + clicks + "and Views =" + views;
        context.write(new IntWritable(campaign), new Text(statistics));
    }
}

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AdcampaignMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    private long numRecords = 0;

    @Override
    public void map(LongWritable key, Text record, Context context) throws IOException, InterruptedException {
        String[] tokens = record.toString().split(",");
        if (tokens.length != 5) {
            System.out.println("*** invalid record : " + record);
        }
        String actionStr = tokens[2];
        String campaignStr = tokens[4];
        try {
            //System.out.println("during parseint"); //used to debug
            System.out.println("actionStr =" + actionStr + "and campaign str = " + campaignStr);
            int actionid = Integer.parseInt(actionStr.trim());
            int campaignid = Integer.parseInt(campaignStr.trim());
            //System.out.println("during intwritable"); //used to debug
            IntWritable outputKeyFromMapper = new IntWritable(actionid);
            IntWritable outputValueFromMapper = new IntWritable(campaignid);
            context.write(outputKeyFromMapper, outputValueFromMapper);
        } catch (Exception e) {
            System.out.println("*** there is exception");
        }
        numRecords = numRecords + 1;
    }
}

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Adcampaign {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxClosePrice <input path> <output path>");
            System.exit(-1);
        }
        // Reads the default configuration of the cluster from the configuration XML files
        // https://www.quora.com/What-is-the-use-of-a-configuration-class-and-object-in-Hadoop-MapReduce-code
        Configuration conf = new Configuration();
        // Initializing the job with the default configuration of the cluster
        Job job = new Job(conf, "Adcampaign");
        // First argument is the job itself, second argument is the location of the input dataset
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // First argument is the job itself, second argument is the location of the output path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Defining the InputFormat class, which is responsible for parsing the dataset into key-value pairs.
        // InputFormat is responsible for 3 main tasks:
        //   a. Validate inputs, meaning the dataset exists in the location specified.
        //   b. Split up the input files into logical input splits. Each input split is assigned a mapper.
        //   c. RecordReader implementation to extract logical records.
        // Defining the OutputFormat class, which is responsible for writing the final key-value output
        // of the MR framework to a text file on disk.
        // OutputFormat does 2 main things:
        //   a. Validate output specifications, e.g. if the output directory already exists, it throws an error.
        //   b. RecordWriter implementation to write the output files of the job.
        // Hadoop comes with several output format implementations.
        // Assigning the driver class name
        job.setJarByClass(Adcampaign.class);
        // Defining the mapper class name
        job.setMapperClass(AdcampaignMapper.class);
        // Defining the reducer class name
        job.setReducerClass(AdcampaignReducer.class);
        // Setting the second argument as a path in a path variable
        Path outputPath = new Path(args[1]);
        // Deleting the output path automatically from HDFS so that we don't have to delete it explicitly
        outputPath.getFileSystem(conf).delete(outputPath, true);
        // Exiting with status 0 if the job succeeds and 1 if it fails
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Answer 0 (score: 0)
Answer 1 (score: 0)
Your mapper and reducer look fine. Add the following lines to your Driver class and try:
job.setOutputKeyClass( IntWritable.class );
job.setOutputValueClass( Text.class );
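One caveat with the two lines above: they declare the reducer's output types, and by default Hadoop assumes the mapper emits the same types. Here the mapper emits IntWritable values while the reducer emits Text, so the map output types probably need to be declared as well. A sketch of the full type configuration, where the helper class `AdcampaignTypes` is hypothetical (the same four calls can go directly into `Adcampaign.main`) and Hadoop must be on the classpath:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical helper; not part of the original project.
final class AdcampaignTypes {
    static void configure(Job job) {
        // Types emitted by AdcampaignMapper (Mapper<LongWritable, Text, IntWritable, IntWritable>)
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Types emitted by AdcampaignReducer (Reducer<IntWritable, IntWritable, IntWritable, Text>)
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
    }
}
```

Without the map output declarations, a job with these class signatures would be expected to fail with a type mismatch on the map output value rather than produce wrong counts, so this addresses a separate issue from the key ordering.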
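Answer 2 (score: 0)
You want the output grouped by campaign_id, so campaign_id should be the key emitted by the mapper. The reducer code can then check whether each action is a view or a click.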
String actionStr = tokens[2];
String campaignStr = tokens[4];
int actionid = Integer.parseInt(actionStr.trim());
int campaignid = Integer.parseInt(campaignStr.trim());
IntWritable outputKeyFromMapper = new IntWritable(actionid);
IntWritable outputValueFromMapper = new IntWritable(campaignid);
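Here outputKeyFromMapper should be built from campaignid and outputValueFromMapper from actionid, because the framework sorts and groups the mapper output by key, and the reducer expects to receive all actions for one campaign together.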
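To see why the key choice matters, here is a small simulation in plain Java (no Hadoop classes) of the shuffle and reduce steps. It is only a sketch: `ShuffleKeyDemo` is a hypothetical name, the records are made-up sample data, and the counting logic copies the reducer from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the shuffle and reduce phases; not part of the original project.
final class ShuffleKeyDemo {
    // Each record is {action_id, campaign_id}. The flag picks which field the "mapper" emits as key.
    static List<String> run(int[][] records, boolean keyByCampaign) {
        // Shuffle: group the emitted values by key, sorted by key.
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int[] r : records) {
            int key = keyByCampaign ? r[1] : r[0];
            int value = keyByCampaign ? r[0] : r[1];
            List<Integer> values = groups.get(key);
            if (values == null) {
                values = new ArrayList<>();
                groups.put(key, values);
            }
            values.add(value);
        }
        // Reduce: same counting as the question's reducer, treating every value as an action_id.
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> e : groups.entrySet()) {
            int views = 0;
            int clicks = 0;
            for (int action : e.getValue()) {
                if (action == 1) {
                    views++;
                } else if (action == 2) {
                    clicks++;
                }
            }
            out.add(e.getKey() + "," + views + "," + clicks);
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] records = { {1, 19}, {2, 19}, {1, 12}, {1, 19} }; // made-up sample data
        System.out.println("keyed by action_id:   " + run(records, false));
        System.out.println("keyed by campaign_id: " + run(records, true));
    }
}
```

With the key set to action_id, the reducer is called once per action and sees campaign ids as values, so its `action == 1` and `action == 2` checks compare against the wrong field. With the key set to campaign_id, each call receives the actions for one campaign, which is what the counting loop assumes.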