I'm trying to run a Hadoop reduce job that writes its output to a table in Cassandra. My reduce job and job configuration look like this:
public static class RankReducer extends Reducer<IntWritable, WidgetHits, ByteBuffer, List<Mutation>> {
    private static MultipleOutputs<ByteBuffer, List<Mutation>> output;

    public void reduce(IntWritable key, Iterable<WidgetHits> values, Context context)
            throws IOException, InterruptedException {
        output = new MultipleOutputs<ByteBuffer, List<Mutation>>(context);
        ArrayList<WidgetHits> ranking = new ArrayList<WidgetHits>();
        for (WidgetHits val : values) {
            for (int i = 0; i < 10; i++) {
                if (i == ranking.size() || val.getHits() > ranking.get(i).getHits()) {
                    ranking.add(i, new WidgetHits(val.getWidget(), val.getHits()));
                    break;
                }
            }
        }
        for (int i = 0; i < ranking.size() && i < 10; i++) {
            List<ByteBuffer> rankByteList = new ArrayList<ByteBuffer>();
            rankByteList.add(ByteBufferUtil.bytes(i + 1));
            ByteBuffer airportBytes = ByteBufferUtil.bytes(ranking.get(i).getWidget());
            output.write(tableName, airportBytes, rankByteList);
        }
    }

    private ByteBuffer bytes(String val) {
        return ByteBufferUtil.bytes(val.toString());
    }
}
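To show what the reducer is meant to do, here is the top-10 insertion logic in isolation (with a minimal stand-in for the WidgetHits writable, which isn't shown above): each value is inserted before the first ranked entry with fewer hits, or appended while the list is still short.

```java
import java.util.ArrayList;
import java.util.List;

public class TopTenSketch {
    // Minimal stand-in for the WidgetHits writable (for illustration only).
    static class WidgetHits {
        final String widget;
        final int hits;
        WidgetHits(String widget, int hits) { this.widget = widget; this.hits = hits; }
    }

    // Same insertion logic as the reducer: walk the ranking and insert the value
    // before the first entry with fewer hits, or append while the list is short.
    // Entries pushed past index 10 are simply ignored by the output loop.
    static List<WidgetHits> rank(Iterable<WidgetHits> values) {
        List<WidgetHits> ranking = new ArrayList<WidgetHits>();
        for (WidgetHits val : values) {
            for (int i = 0; i < 10; i++) {
                if (i == ranking.size() || val.hits > ranking.get(i).hits) {
                    ranking.add(i, val);
                    break;
                }
            }
        }
        return ranking;
    }

    public static void main(String[] args) {
        List<WidgetHits> in = new ArrayList<WidgetHits>();
        in.add(new WidgetHits("a", 5));
        in.add(new WidgetHits("b", 9));
        in.add(new WidgetHits("c", 1));
        // Prints the widgets in descending hit order: b, a, c
        for (WidgetHits w : rank(in)) {
            System.out.println(w.widget + " " + w.hits);
        }
    }
}
```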
Job configuration:
Job rankJob = Job.getInstance(conf, "Widget Ranking Ranker");
rankJob.setJarByClass(WidgetRanking.class);
rankJob.setMapperClass(RankMapper.class);
rankJob.setReducerClass(RankReducer.class);
rankJob.setInputFormatClass(SequenceFileInputFormat.class);
rankJob.setMapOutputKeyClass(IntWritable.class);
rankJob.setMapOutputValueClass(WidgetHits.class);
rankJob.setOutputKeyClass(ByteBuffer.class);
rankJob.setOutputValueClass(List.class);
rankJob.setOutputFormatClass(CqlBulkOutputFormat.class);
FileInputFormat.addInputPath(rankJob, new Path("temp/" + outputCode + "/"));
ConfigHelper.setOutputRpcPort(rankJob.getConfiguration(), "9160");
ConfigHelper.setOutputInitialAddress(rankJob.getConfiguration(), "localhost");
ConfigHelper.setOutputColumnFamily(rankJob.getConfiguration(), "widgetspace", "widgetRanking");
ConfigHelper.setOutputPartitioner(rankJob.getConfiguration(), "Murmur3Partitioner");
The code above is still not well tested and may contain bugs.
I'm running this in pseudo-distributed mode on a single machine, with the intention of deploying it to a real cluster later. HDFS and YARN are brought up following the instructions here.
When I run it, I get this error:
org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in variable: [cassandra.config]. Found[cassandra.yaml]. Please prefix the file with [file:///] for local files and [file://<server>/] for remote files. If you are executing this from an external tool, it needs to set Config.setClientMode(true) to avoid loading configuration.
at org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:80)
Some obvious things I've tried putting in my own code:
System.setProperty("cassandra.config", "file:///home/[user]/apache-cassandra-3.9/conf/cassandra.yaml");
Config.setClientMode(true);
rankJob.getConfiguration().set("cassandra.config", "file:///home/[user]/apache-cassandra-3.9/conf/cassandra.yaml");
But none of these seem to do anything; it still says "Found [cassandra.yaml]".
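I also tried the following, on the assumption (which I'm not sure about) that the property has to reach the YARN task JVMs, since the reducer runs in its own container where a System.setProperty made in the driver would not be visible:

```java
// Assumption: pass cassandra.config as a JVM system property to the reduce
// task containers via mapreduce.reduce.java.opts, so that Cassandra's
// YamlConfigurationLoader (which reads System.getProperty("cassandra.config"))
// can see it inside the task JVM.
Configuration jobConf = rankJob.getConfiguration();
jobConf.set("mapreduce.reduce.java.opts",
    "-Dcassandra.config=file:///home/[user]/apache-cassandra-3.9/conf/cassandra.yaml");
```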
I'm currently running Hadoop 2.9.1 and Cassandra 3.9. (With Cassandra 3.11.3 I got a NullPointerException on the Config object that failed to load, so essentially the same error, just described less clearly in the output.)
Do I need to supply the Cassandra configuration path somewhere else?