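I'm having trouble writing reducer code that outputs the top 10 (key, value) pairs.
My current output has the format ((year, market), total amount). What I want is the top 10 total amounts for each year. Right now my code outputs the total for every market in every year.
Any suggestions would be greatly appreciated!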
Mapper:
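import java.io.IOException;
import java.io.StringReader;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import au.com.bytecode.opencsv.CSVReader; // assumed: opencsv 2.x (newer releases use com.opencsv)

public class FundingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final Text outKey = new Text();
    private final IntWritable outValue = new IntWritable();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Parse one CSV record: column 3 = market, 14 = funded year, 15 = raised amount
        CSVReader reader = new CSVReader(new StringReader(value.toString()));
        String[] array = reader.readNext();
        reader.close();
        if (array == null || array.length < 16) {
            return; // skip blank or truncated lines
        }

        // "4,742,648" -> "4742648"; rows without a usable amount are skipped
        String amountString = array[15].replaceAll("[^0-9]", "");
        int amount;
        try {
            amount = Integer.parseInt(amountString);
        } catch (NumberFormatException nfe) {
            return;
        }

        // Composite key "year market", e.g. "2014 Biotechnology"
        outKey.set(array[14] + " " + array[3]);
        outValue.set(amount);
        context.write(outKey, outValue);
    }
}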
Reducer:
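import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FundingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum every funding round for this (year, market) key
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}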
Sample data:
/organization/contravir-pharmaceuticals ContraVir Pharmaceuticals |Biotechnology| Biotechnology USA NY New York City New York /funding-round/9a7cc724deba554585e2b79c14605866 post_ipo_equity 8/22/14 2014-08 2014-Q3 2014 4,742,648
/organization/contravir-pharmaceuticals ContraVir Pharmaceuticals |Biotechnology| Biotechnology USA NY New York City New York /funding-round/04a7ec54417a0f9a6c99cf8db2eac819 venture A 10/15/14 2014-10 2014-Q4 2014 9,000,000
/organization/contravir-pharmaceuticals ContraVir Pharmaceuticals |Biotechnology| Biotechnology USA NY New York City New York /funding-round/328384053df3a992ca6d5da55ca0420e venture 2/14/14 2014-02 2014-Q1 2014 3,225,000
/organization/contrib-com contrib.com |Entrepreneur|Technology|Domains|Education|Social Media| Social Media USA FL Palm Beaches Delray Beach /funding-round/fea112ed22657c1456820aa26af3ab17 seed 6/17/14 2014-06 2014-Q2 2014 300,000
Sample output:
2014 Biotechnology 16967648
2014 Social Media 300000
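To check the arithmetic of the current job without a cluster, here is a plain-Java sketch (no Hadoop) that applies the same rules as the mapper and reducer to the four sample rows: strip non-digits from the amount, then sum per "year market" key. The rows are cut down to the three columns the mapper reads, and the class name `FundingSumSketch` is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class FundingSumSketch {
    // Only the columns the mapper reads: {market (col 3), year (col 14), amount (col 15)}
    static final String[][] ROWS = {
        {"Biotechnology", "2014", "4,742,648"},
        {"Biotechnology", "2014", "9,000,000"},
        {"Biotechnology", "2014", "3,225,000"},
        {"Social Media", "2014", "300,000"},
    };

    // Mirrors FundingMapper + FundingReducer: key = "year market", value = summed amount
    static Map<String, Long> sumByYearAndMarket(String[][] rows) {
        Map<String, Long> totals = new TreeMap<>(); // sorted keys, like reducer output order
        for (String[] row : rows) {
            String digits = row[2].replaceAll("[^0-9]", "");
            if (digits.isEmpty()) {
                continue; // the mapper likewise drops rows with no digits in the amount
            }
            totals.merge(row[1] + " " + row[0], Long.parseLong(digits), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        sumByYearAndMarket(ROWS).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

Running `main` prints `2014 Biotechnology` with 16967648 and `2014 Social Media` with 300000, the same totals as the sample output (4,742,648 + 9,000,000 + 3,225,000 = 16,967,648).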
Answer (score: 0)
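You need to make the year alone the map output key. That way the reducer receives all of a year's records in a single reduce() call. Those records are individual funding rounds, not per-market totals, so the reducer first has to sum the amounts per market, then sort the markets by total and emit only the 10 largest. See the code below.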
// Mapper<LongWritable, Text, Text, Text>: map output is now (year, "amount\tmarket")
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    CSVReader reader = new CSVReader(new StringReader(value.toString()));
    String[] array = reader.readNext();
    reader.close();
    if (array == null || array.length < 16) {
        return; // skip blank or truncated lines
    }
    String amountString = array[15].replaceAll("[^0-9]", "");
    int amount;
    try {
        amount = Integer.parseInt(amountString);
    } catch (NumberFormatException nfe) {
        return; // skip rows without a usable amount
    }
    // Key on the year only, so one reduce() call sees every market for that year.
    // A tab separates the fields because market names can contain spaces ("Social Media").
    context.write(new Text(array[14]), new Text(amount + "\t" + array[3]));
}
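// Reducer<Text, Text, Text, LongWritable>; needs java.util.{ArrayList, HashMap, List, Map}.
// In the driver: job.setMapOutputValueClass(Text.class) and job.setOutputValueClass(LongWritable.class).
@Override
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException,
        InterruptedException {
    // 1. Sum the round amounts per market for this year (long avoids int overflow)
    Map<String, Long> totals = new HashMap<>();
    for (Text value : values) {
        String[] parts = value.toString().split("\t", 2); // [amount, market]
        totals.merge(parts[1], Long.parseLong(parts[0]), Long::sum);
    }

    // 2. Sort markets by total, largest first, and emit at most 10 of them
    List<Map.Entry<String, Long>> sorted = new ArrayList<>(totals.entrySet());
    sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());
    for (Map.Entry<String, Long> entry : sorted.subList(0, Math.min(10, sorted.size()))) {
        context.write(new Text(key + " " + entry.getKey()), new LongWritable(entry.getValue()));
    }
}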
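Keeping only the 10 largest totals per year is a standard top-N problem. Instead of sorting every market, a common alternative is a min-heap capped at N entries: each total is offered to the heap, and whenever it grows past N the smallest entry is evicted. The plain-Java sketch below (no Hadoop; `TopNSketch` and `topN` are hypothetical names) takes the per-market totals for one year, i.e. what the reducer holds after summing, and returns the N largest, largest first. The "Software" figure in `main` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopNSketch {
    // Returns at most n entries with the largest totals, largest first.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> totals, int n) {
        // Min-heap ordered by total: the smallest of the current top n sits at the head
        PriorityQueue<Map.Entry<String, Long>> heap =
            new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> entry : totals.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest so at most n entries are kept
            }
        }
        // The heap is only partially ordered, so sort the survivors for output
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> totals = new HashMap<>();
        totals.put("Biotechnology", 16967648L);
        totals.put("Social Media", 300000L);
        totals.put("Software", 5000000L); // made-up value, not from the sample data
        for (Map.Entry<String, Long> e : topN(totals, 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

With these inputs, `topN(totals, 2)` keeps Biotechnology (16967648) and then Software (5000000). The heap holds at most N entries at a time, which makes little difference for a few hundred markets per year but keeps memory bounded if a single key has many values.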