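I am currently evaluating Apache Crunch. I followed a simple WordCount MapReduce job example and then tried to save the results to a standalone HBase instance. HBase is running (checked with jps and the HBase shell), set up as described here: http://hbase.apache.org/book/quickstart.html
Now I am adapting the example to write to HBase: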
Pipeline pipeline = new MRPipeline(WordCount.class, getConf());
PCollection<String> lines = pipeline.readTextFile(inputPath);
// noStopWords is a PCollection<String> derived from lines in steps not shown here
PTable<String, Long> counts = noStopWords.count();
pipeline.write(counts, new HBaseTarget("wordCountOutTable"));
PipelineResult result = pipeline.done();
I get an exception: "Exception: java.lang.IllegalArgumentException: HBaseTarget only supports Put and Delete"
Any clues as to what is going wrong?
Answer 0 (score: 3)
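A PTable is a PCollection (of Pair objects), but HBaseTarget can only handle Put or Delete objects. You therefore have to convert the PTable into a PCollection in which every element is either a Put or a Delete, and pass that collection to pipeline.write instead of counts. Have a look at the Crunch-Examples, where this is done.
An example conversion could look like the following. Note that it takes a PTable<String, String>; for the PTable<String, Long> from the question, the value type has to be changed accordingly (Bytes.toBytes also has an overload for long):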
// COLUMN_FAMILY_TARGET and COLUMN_QUALIFIER_TARGET_TEXT are byte[] constants
// assumed to be defined elsewhere in the class.
public PCollection<Put> createPut(final PTable<String, String> counts) {
    return counts.parallelDo("Convert to puts", new DoFn<Pair<String, String>, Put>() {
        @Override
        public void process(final Pair<String, String> input, final Emitter<Put> emitter) {
            // input.first() is used as the row key
            final Put put = new Put(Bytes.toBytes(input.first()));
            // the value (input.second()) is stored under the given family and qualifier
            put.add(COLUMN_FAMILY_TARGET, COLUMN_QUALIFIER_TARGET_TEXT, Bytes.toBytes(input.second()));
            emitter.emit(put);
        }
    }, Writables.writables(Put.class));
}
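Since the counts in the question are Long values rather than Strings, it may help to see what the per-element mapping produces at the byte level. The following is only a dependency-free sketch of that step, not Crunch or HBase code: RowWrite is a hypothetical stand-in for Put, and the two encode methods mirror what Bytes.toBytes(String) (UTF-8) and Bytes.toBytes(long) (8 bytes, big-endian) do, using only the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CountToCellSketch {

    /** Hypothetical stand-in for an HBase Put: one row key plus one cell value. */
    static final class RowWrite {
        final byte[] rowKey;
        final byte[] value;

        RowWrite(byte[] rowKey, byte[] value) {
            this.rowKey = rowKey;
            this.value = value;
        }
    }

    /** Encodes the word as UTF-8 bytes, used here as the row key. */
    static byte[] encodeWord(String word) {
        return word.getBytes(StandardCharsets.UTF_8);
    }

    /** Encodes the count as 8 big-endian bytes (ByteBuffer's default byte order). */
    static byte[] encodeCount(long count) {
        return ByteBuffer.allocate(Long.BYTES).putLong(count).array();
    }

    /** Reverses encodeCount, e.g. when reading the cell back. */
    static long decodeCount(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    /** Maps every (word, count) pair to one write, analogous to the DoFn emitting one Put per Pair. */
    static List<RowWrite> toRowWrites(Map<String, Long> counts) {
        List<RowWrite> writes = new ArrayList<>();
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            writes.add(new RowWrite(encodeWord(entry.getKey()), encodeCount(entry.getValue())));
        }
        return writes;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("crunch", 3L);
        counts.put("hbase", 1L);
        for (RowWrite write : toRowWrites(counts)) {
            String word = new String(write.rowKey, StandardCharsets.UTF_8);
            System.out.println(word + " -> " + decodeCount(write.value));
        }
    }
}
```

One consequence of this encoding: a count stored as a binary long is not human-readable text, so anything that reads the cell back has to decode it as a long. If readable values matter more, storing the count as a String is the alternative.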