Question

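I want to insert the output of my map-reduce job into an HBase table using the HBase bulk load API LoadIncrementalHFiles.doBulkLoad(new Path(), hTable).

I emit the KeyValue data type from my mapper and then use HFileOutputFormat, with its default reducer, to prepare my HFiles.

When I run the map-reduce job, it completes without any errors and creates the output file. However, the final step, loading the HFiles into HBase, does not happen. After the map-reduce job completes I get the following warnings: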

13/09/08 03:39:51 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://localhost:54310/user/xx.xx/output/_SUCCESS
13/09/08 03:39:51 WARN mapreduce.LoadIncrementalHFiles: Bulk load operation did not find any files to load in directory output/.  Does it contain files in subdirectories that correspond to column family names?

However, I can see that the output directory contains:

1. _SUCCESS
2. _logs
3. _0/2aa96255f7f5446a8ea7f82aa2bd299e file (which contains my data)

I have no idea why my bulk loader is not picking up the files from the output directory.

Below is the code of my Map-Reduce driver class:

public static void main(String[] args) throws Exception{

    String inputFile = args[0];
    String tableName = args[1];
    String outFile = args[2];
    Path inputPath = new Path(inputFile);
    Path outPath = new Path(outFile);

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    //set the configurations
    conf.set("mapred.job.tracker", "localhost:54311");

    //Input data to HTable using Map Reduce
    Job job = new Job(conf, "MapReduce - Word Frequency Count");
    job.setJarByClass(MapReduce.class);

    job.setInputFormatClass(TextInputFormat.class);

    FileInputFormat.addInputPath(job, inputPath);

    fs.delete(outPath, true); // recursive delete; the single-argument delete(Path) is deprecated
    FileOutputFormat.setOutputPath(job, outPath);

    job.setMapperClass(MapReduce.MyMap.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);

    HTable hTable = new HTable(conf, tableName.toUpperCase());

    // Auto configure partitioner and reducer
    HFileOutputFormat.configureIncrementalLoad(job, hTable);

    job.waitForCompletion(true);

    // Load generated HFiles into table
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(new Path(outFile), hTable);

}

How can I figure out what is going wrong here and preventing my data from being inserted into HBase?

Answer 1

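Finally, I figured out why my HFiles were not being loaded into HBase. Here are the details:

My CREATE statement DDL did not specify a default column family name, so my guess is that Phoenix created the default column family "_0". I was able to see this column family in my HDFS /hbase directory.

However, when I used HBase's LoadIncrementalHFiles API to pick up the files from my output directory, it did not pick the directory named after the column family ("_0" in my case). I debugged the LoadIncrementalHFiles API code and found that it skips every directory in the output path whose name starts with "_" (for example "_logs").

I tried the same thing again, this time specifying a column family explicitly, and everything worked fine. I can now query the data using Phoenix SQL.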
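
As a rough illustration of the skipping behaviour described above, here is a minimal sketch in plain Java. It is not the actual HBase source: the class FamilyDirFilter and its methods are hypothetical names made up for this example, and the only rule it models is "ignore directories whose names start with an underscore".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FamilyDirFilter {

    // Hypothetical helper (not an HBase API): a directory is treated as a
    // column-family directory only if its name does not start with "_".
    public static boolean isCandidateFamilyDir(String dirName) {
        return !dirName.startsWith("_");
    }

    // Returns the directory names that survive the underscore filter.
    public static List<String> candidateFamilyDirs(List<String> dirNames) {
        List<String> kept = new ArrayList<>();
        for (String name : dirNames) {
            if (isCandidateFamilyDir(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Output layout from the question: "_0" is dropped along with "_logs",
        // so nothing is left to load.
        System.out.println(candidateFamilyDirs(Arrays.asList("_logs", "_0"))); // prints []

        // With an explicitly named column family ("cf" is an assumed name),
        // one directory remains.
        System.out.println(candidateFamilyDirs(Arrays.asList("_logs", "cf"))); // prints [cf]
    }
}
```

Under this rule a column family directory named "_0" is indistinguishable from housekeeping directories such as "_logs", which would explain the "did not find any files to load" warning.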
