我有一个Adcampaign驱动程序,映射器和减速器类。前两个课程运行得很好。 reducer类也运行正常,但结果不正确。这是我从互联网上下载到实践mapreduce程序的示例项目。
该计划的简要说明: 问题陈述:
对于这篇文章,让我们假装我们正在运营一家在线广告公司。我们为客户(如百事可乐,索尼)开展广告活动,广告在新闻网站(CNN,Fox)和社交媒体网站(Facebook)等热门网站上展示。为了跟踪广告活动的效果,我们会跟踪我们投放的广告和用户点击的广告。
方案
以下是事件的顺序: 1.我们向用户提供广告 2.如果广告出现在用户浏览器上,则用户也会看到广告。我们将此事件视为VIEWED_EVENT 3.如果用户点击广告,我们会将此事件跟踪为CLICKED_EVENT
示例数据:
293868800864,319248,1,flickr.com,12
1293868801728,625828,1,npr.org,19
1293868802592,522177,2,wikipedia.org,16
1293868803456,535052,2,cnn.com,20
1293868804320,287430,2,sfgate.com,2
1293868805184,616809,2,sfgate.com,1
1293868806048,704032,1,nytimes.com,7
1293868806912,631825,2,amazon.com,11
1293868807776,610228,2,npr.org,6
1293868808640,454108,2,twitter.com,18
Input Log files format and description:
Log Files: The log files are in the following format:
times- tamp, user_id, view/click, domain, campaign_id.
E.g: 1262332801728, 899523, 1, npr.org, 19
◾timestamp : unix time stamp in milliseconds
◾user_id : each user has a unique id
◾action_id : 1=view, 2=click
◾domain : which domain the ad was served
◾campaign_id: identifies the campaign the ad was part of
减速机的预期输出为: campaignid,总观看次数,总点击次数 例如:
12,3,2 13,100,23 14,23,12
我查看了Mapper的日志。输出很好。但是Reducer的最终输出并不好。
减速器类:
public class AdcampaignReducer extends Reducer<IntWritable, IntWritable, IntWritable, Text>
{
// Key/value : IntWritable/List of IntWritables for every campaign, we are getting all actions for that
// campaign as an iterable list. We are iterating through action_ids and calculating views and click
// Once we are done calculating, we write out the results. This is possible because all actions for a campaign are grouped and sent to one reducer.
//Text k= new Text();
public void reduce(IntWritable key, Iterable<IntWritable> results, Context context) throws IOException, InterruptedException
{
int campaign = key.get();
//k = key.get();
int clicks = 0;
int views = 0;
for(IntWritable i:results)
{
int action = i.get();
if (action ==1)
views = views+1;
else if (action == 2)
clicks = clicks + 1;
}
String statistics = "Total Clicks =" +clicks + "and Views =" + views;
context.write(new IntWritable(campaign), new Text(statistics));
}
}
Mapper类:
public class AdcampaignMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
private long numRecords = 0;
@Override
public void map(LongWritable key, Text record, Context context) throws IOException, InterruptedException {
String[] tokens = record.toString().split(",");
if (tokens.length !=5)
{
System.out.println("*** invalid record : " + record);
}
String actionStr = tokens[2];
String campaignStr = tokens[4];
try{
//System.out.println("during parseint"); //used to debug
System.out.println("actionStr =" + actionStr + "and campaign str = " + campaignStr);
int actionid = Integer.parseInt(actionStr.trim());
int campaignid = Integer.parseInt(campaignStr.trim());
//System.out.println("during intwritable"); //used to debug
IntWritable outputKeyFromMapper = new IntWritable(actionid);
IntWritable outputValueFromMapper = new IntWritable(campaignid);
context.write(outputKeyFromMapper, outputValueFromMapper);
}
catch(Exception e){
System.out.println("*** there is exception");
e.printStackTrace();
}
numRecords = numRecords+1;
}
}
驱动程序:
public class Adcampaign {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.err.println("Usage: MaxClosePrice <input path> <output path>");
System.exit(-1);
}
//reads the default configuration of cluster from the configuration xml files
// https://www.quora.com/What-is-the-use-of-a-configuration-class-and-object-in-Hadoop-MapReduce-code
Configuration conf = new Configuration();
//Initializing the job with the default configuration of the cluster
Job job = new Job(conf, "Adcampaign");
//first argument is job itself
//second argument is location of the input dataset
FileInputFormat.addInputPath(job, new Path(args[0]));
//first argument is the job itself
//second argument is the location of the output path
FileOutputFormat.setOutputPath(job, new Path(args[1]));
//Defining input Format class which is responsible to parse the dataset into a key value pair
//Configuring the input/output path from the filesystem into the job
// InputFormat is responsible for 3 main tasks.
// a. Validate inputs - meaning the dataset exists in the location specified.
// b. Split up the input files into logical input splits. Each input split will be assigned a mapper.
// c. Recordreader implementation to extract logical records
job.setInputFormatClass(TextInputFormat.class);
//Defining output Format class which is responsible to parse the final key-value output from MR framework to a text file into the hard disk
//OutputFomat does 2 mains things
// a. Validate output specifications. Like if the output directory already exists? If the directory exist, it will throw an error.
// b. Recordwriter implementation to write output files of the job
//Hadoop comes with several output format implemenations.
job.setOutputFormatClass(TextOutputFormat.class);
//Assigning the driver class name
job.setJarByClass(Adcampaign.class);
//Defining the mapper class name
job.setMapperClass(AdcampaignMapper.class);
//Defining the Reducer class name
job.setReducerClass(AdcampaignReducer.class);
//setting the second argument as a path in a path variable
Path outputPath = new Path(args[1]);
//deleting the output path automatically from hdfs so that we don't have delete it explicitly
outputPath.getFileSystem(conf).delete(outputPath);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(IntWritable.class);
///exiting the job only if the flag value becomes false
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
答案 0 :(得分:0)
答案 1 :(得分:0)
你的mapper和reducer看起来很好。 将以下行添加到您的Driver类并尝试:
job.setOutputKeyClass( IntWritable.class );
job.setOutputValueClass( Text.class );
答案 2 :(得分:0)
您希望根据campaign_id输出。所以Campaign_id shud是映射器代码的关键。然后在reducer代码中,您将检查它是视图还是单击。
String actionStr = tokens[2];
String campaignStr = tokens[4];
int actionid = Integer.parseInt(actionStr.trim());
int campaignid = Integer.parseInt(campaignStr.trim());
IntWritable outputKeyFromMapper = new IntWritable(actionid);
IntWritable outputValueFromMapper = new IntWritable(campaignid);
Here outputKeyFromMapper should be campaignid as the sorting will be done on campaignid.
请让我知道它是否有用。