我正在尝试分析零售商店数据,我希望按城市解决销售细分,这是我的数据
Date Time City Product-Cat Sale-Value Payment-Mode
2012-01-01 09:20 Fort Worth Women's Clothing 153.57 Visa
2012-01-01 09:00 San Jose Mens Clothing 214.05 Rupee
2012-01-01 09:00 San Diego Music 76.43 Amex
2012-01-01 09:00 New York Cameras 45.76 Visa
现在我想计算所有商店的产品类别的销售细分
这是Mapper和reducer以及主类
public class RetailDataAnalysis {
public static class RetailDataAnalysisMapper extends Mapper<Text,Text,Text,Text>{
// when trying with LongWritable Key
public void map(LongWritable key,Text Value,Context context) throws IOException, InterruptedException{
String analyser [] = Value.toString().split(",");
Text productCategory = new Text(analyser[3]);
Text salesPrice = new Text(analyser[4]);
context.write(productCategory, salesPrice);
}
// When trying with Text key
public void map(Text key,Text Value,Context context) throws IOException, InterruptedException{
String analyser [] = Value.toString().split(",");
Text productCategory = new Text(analyser[3]);
Text salesPrice = new Text(analyser[4]);
context.write(productCategory, salesPrice);
}
}
public static class RetailDataAnalysisReducer extends Reducer<Text,Text,Text,Text>{
protected void reduce(Text key,Iterable<Text> values,Context context)throws IOException, InterruptedException{
String csv ="";
for(Text value:values){
if(csv.length()>0){
csv+= ",";
}
csv+=value.toString();
}
context.write(key, new Text(csv));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String [] otherArgs = new GenericOptionsParser(conf,args).getRemainingArgs();
if(otherArgs.length<2){
System.out.println("Usage Retail Data ");
System.exit(2);
}
Job job= new Job(conf,"Retail Data Analysis");
job.setJarByClass(RetailDataAnalysis.class);
job.setMapperClass(RetailDataAnalysisMapper.class);
job.setCombinerClass(RetailDataAnalysisReducer.class);
job.setReducerClass(RetailDataAnalysisReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
for(int i=0;i<otherArgs.length-1;++i){
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length-1]));
System.exit(job.waitForCompletion(true)?0:1);
}
}
我得到的例外是使用LongWritable Key,
18/04/11 09:15:40 INFO mapreduce.Job: Task Id : attempt_1523355254827_0008_m_000000_2, Status : FAILED
Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1069)
尝试使用文本键时我遇到异常
Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1069)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
请帮我解决这个问题,我对hadoop很新。
答案 0 :(得分:0)
您可能需要不同的输入格式类。默认使用的是TextInputFormat,它会逐行拆分文件,并将行号设为LongWritable
,行号为Text
。
您可以这样指定输入格式类:
job.setInputFormatClass(TextInputFormat.class);
在您的情况下,如果您不需要密钥,只需要值,则可以使用LongWritable
作为密钥:
public static class RetailDataAnalysisMapper extends Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text Value, Context context) throws IOException, InterruptedException {
//...
}
}
修改强>
以下是使用LongWritable
作为关键词的整个代码:
public class RetailDataAnalysis {
public static class RetailDataAnalysisMapper extends Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text Value, Context context) throws IOException, InterruptedException {
String analyser[] = Value.toString().split(",");
Text productCategory = new Text(analyser[3]);
Text salesPrice = new Text(analyser[4]);
context.write(productCategory, salesPrice);
}
}
public static class RetailDataAnalysisReducer extends Reducer<Text, Text, Text, Text> {
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
String csv = "";
for (Text value : values) {
if (csv.length() > 0) {
csv += ",";
}
csv += value.toString();
}
context.write(key, new Text(csv));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length < 2) {
System.out.println("Usage Retail Data ");
System.exit(2);
}
Job job = new Job(conf, "Retail Data Analysis");
job.setJarByClass(RetailDataAnalysis.class);
job.setMapperClass(RetailDataAnalysisMapper.class);
job.setCombinerClass(RetailDataAnalysisReducer.class);
job.setReducerClass(RetailDataAnalysisReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
for (int i = 0; i < otherArgs.length - 1; ++i) {
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
另外,如果您按,
拆分数据,那么您的数据应该是csv,如下所示:
2012-01-01 09:20,Fort Worth,Women's Clothing,153.57,Visa
2012-01-01 09:00,San Jose,Mens Clothing,214.05,Rupee
2012-01-01 09:00,San Diego,Music,76.43,Amex
2012-01-01 09:00,New York,Cameras,5.76,Visa
您在问题中指定的空格不是空格。
答案 1 :(得分:0)
当您使用Map Reduce读取文件时,文件输入格式(默认值)读取整行并以格式将其发送到映射器,因此映射器的输入变为: -
public static class RetailDataAnalysisMapper extends Mapper<LongWritable,Text,Text,Text>
如果您需要阅读
public static class RetailDataAnalysisMapper extends Mapper<Text,Text,Text,Text>
您需要更改文件输入格式并使用自定义文件输入格式以及自定义记录阅读器。 然后,您需要在驱动程序代码中添加以下行。
job.setInputFormatClass("your custom input format".class);
Hadoop以形式理解一切
因此,当您读取文件时,偏移量将成为LongWritable键,读取的值将成为值。
因此,您需要使用Mapper<LongWritable,Text, <anything>,<anything> >