I'm trying to run a Hadoop reduce job that writes its output to a table in Cassandra. My reduce job and job configuration look like this:
public static class RankReducer extends Reducer<IntWritable, WidgetHits, ByteBuffer, List<Mutation>> {
    private static MultipleOutputs<ByteBuffer, List<Mutation>> output;

    public void reduce(IntWritable key, Iterable<WidgetHits> values, Context context)
            throws IOException, InterruptedException {
        output = new MultipleOutputs<ByteBuffer, List<Mutation>>(context);
        ArrayList<WidgetHits> ranking = new ArrayList<WidgetHits>();
        for (WidgetHits val : values) {
            for (int i = 0; i < 10; i++) {
                if (i == ranking.size() || val.getHits() > ranking.get(i).getHits()) {
                    ranking.add(i, new WidgetHits(val.getWidget(), val.getHits()));
                    break;
                }
            }
        }
        for (int i = 0; i < ranking.size() && i < 10; i++) {
            List<ByteBuffer> rankByteList = new ArrayList<ByteBuffer>();
            rankByteList.add(ByteBufferUtil.bytes(i + 1));
            ByteBuffer airportBytes = ByteBufferUtil.bytes(ranking.get(i).getWidget());
            output.write(tableName, airportBytes, rankByteList);
        }
    }

    private ByteBuffer bytes(String val) {
        return ByteBufferUtil.bytes(val.toString());
    }
}
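To show what the reducer is meant to do, here is the top-10 insertion logic in isolation (with a minimal stand-in for the WidgetHits writable, which isn't shown above): each value is inserted before the first ranked entry with fewer hits, or appended while the list is still short.

```java
import java.util.ArrayList;
import java.util.List;

public class TopTenSketch {
    // Minimal stand-in for the WidgetHits writable (for illustration only).
    static class WidgetHits {
        final String widget;
        final int hits;
        WidgetHits(String widget, int hits) { this.widget = widget; this.hits = hits; }
    }

    // Same insertion logic as the reducer: walk the ranking and insert the value
    // before the first entry with fewer hits, or append while the list is short.
    // Entries pushed past index 10 are simply ignored by the output loop.
    static List<WidgetHits> rank(Iterable<WidgetHits> values) {
        List<WidgetHits> ranking = new ArrayList<WidgetHits>();
        for (WidgetHits val : values) {
            for (int i = 0; i < 10; i++) {
                if (i == ranking.size() || val.hits > ranking.get(i).hits) {
                    ranking.add(i, val);
                    break;
                }
            }
        }
        return ranking;
    }

    public static void main(String[] args) {
        List<WidgetHits> in = new ArrayList<WidgetHits>();
        in.add(new WidgetHits("a", 5));
        in.add(new WidgetHits("b", 9));
        in.add(new WidgetHits("c", 1));
        // Prints the widgets in descending hit order: b, a, c
        for (WidgetHits w : rank(in)) {
            System.out.println(w.widget + " " + w.hits);
        }
    }
}
```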
Job configuration:
Job rankJob = Job.getInstance(conf, "Widget Ranking Ranker");
rankJob.setJarByClass(WidgetRanking.class);
rankJob.setMapperClass(RankMapper.class);
rankJob.setReducerClass(RankReducer.class);
rankJob.setInputFormatClass(SequenceFileInputFormat.class);
rankJob.setMapOutputKeyClass(IntWritable.class);
rankJob.setMapOutputValueClass(WidgetHits.class);
rankJob.setOutputKeyClass(ByteBuffer.class);
rankJob.setOutputValueClass(List.class);
rankJob.setOutputFormatClass(CqlBulkOutputFormat.class);
FileInputFormat.addInputPath(rankJob, new Path("temp/" + outputCode + "/"));
ConfigHelper.setOutputRpcPort(rankJob.getConfiguration(), "9160");
ConfigHelper.setOutputInitialAddress(rankJob.getConfiguration(), "localhost");
ConfigHelper.setOutputColumnFamily(rankJob.getConfiguration(), "widgetspace", "widgetRanking");
ConfigHelper.setOutputPartitioner(rankJob.getConfiguration(), "Murmur3Partitioner");
The code above is still not well tested and may contain bugs.
I'm running this in pseudo-distributed mode on a single machine, with the intention of deploying it to a real cluster later. HDFS and YARN are brought up following the instructions here.
When I run it, I get this error:
org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in variable: [cassandra.config]. Found[cassandra.yaml]. Please prefix the file with [file:///] for local files and [file://<server>/] for remote files. If you are executing this from an external tool, it needs to set Config.setClientMode(true) to avoid loading configuration.
at org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:80)
Some obvious things I've tried putting in my own code:
System.setProperty("cassandra.config", "file:///home/[user]/apache-cassandra-3.9/conf/cassandra.yaml");
Config.setClientMode(true);
rankJob.getConfiguration().set("cassandra.config", "file:///home/[user]/apache-cassandra-3.9/conf/cassandra.yaml");
But none of these seem to do anything; it still says "Found [cassandra.yaml]".
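I also tried the following, on the assumption (which I'm not sure about) that the property has to reach the YARN task JVMs, since the reducer runs in its own container where a System.setProperty made in the driver would not be visible:

```java
// Assumption: pass cassandra.config as a JVM system property to the reduce
// task containers via mapreduce.reduce.java.opts, so that Cassandra's
// YamlConfigurationLoader (which reads System.getProperty("cassandra.config"))
// can see it inside the task JVM.
Configuration jobConf = rankJob.getConfiguration();
jobConf.set("mapreduce.reduce.java.opts",
    "-Dcassandra.config=file:///home/[user]/apache-cassandra-3.9/conf/cassandra.yaml");
```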
I'm currently running Hadoop 2.9.1 and Cassandra 3.9. (With Cassandra 3.11.3 I got a NullPointerException on the Config object that failed to load, so essentially the same error, just described less clearly in the output.)
Do I need to supply the Cassandra configuration path somewhere else?