Importing CSV into HBase using Pig

Time: 2015-06-11 20:42:35

Tags: hbase apache-pig bigdata

I want to use Pig to import the following sample data (tab-separated) into HBase:
1       2       3
4       5       6
7       8       9

I am using the following commands to do this:

grunt> A = LOAD '/idn/home/mvenk9/Test' USING PigStorage('\t') as (id:int, id1:int, id2:int);

 STORE A INTO 'hbase://mydata' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycf:intdata');

When the second line runs, I get the exception below. I don't know why it fails, and I'm new to all of these tools.

2015-06-11 13:34:37,125 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2015-06-11 13:34:37,126 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2015-06-11 13:34:37,442 [main] INFO  org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - The identifier of this process is 29965@lppbd0030.gso.aexp.com
2015-06-11 13:34:37,554 [main] INFO  org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table instance for mydata
2015-06-11 13:34:37,557 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-06-11 13:34:37,559 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-06-11 13:34:37,559 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-06-11 13:34:37,561 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2015-06-11 13:34:37,562 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-06-11 13:34:37,563 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2235913801538823778.jar
2015-06-11 13:34:40,868 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2235913801538823778.jar created
2015-06-11 13:34:40,882 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-06-11 13:34:40,885 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
Details at logfile: /idn/home/mvenk9/pig_1434054848332.log

And from the log file:

Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias A
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1635)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:541)
        at org.apache.pig.Main.main(Main.java:156)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:861)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:296)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:192)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1322)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1307)
        at org.apache.pig.PigServer.execute(PigServer.java:1297)
        at org.apache.pig.PigServer.access$400(PigServer.java:122)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1630)
        ... 13 more
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hbase://mydata_logs
        at org.apache.hadoop.fs.Path.initialize(Path.java:155)
        at org.apache.hadoop.fs.Path.<init>(Path.java:74)
        at org.apache.hadoop.fs.Path.<init>(Path.java:48)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:613)
        ... 20 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hbase://mydata_logs
        at java.net.URI.checkPath(URI.java:1804)
        at java.net.URI.<init>(URI.java:752)
        at org.apache.hadoop.fs.Path.initialize(Path.java:152)
        ... 23 more
================================================================================

Any help is greatly appreciated.

Thanks in advance.

1 Answer:

Answer 0: (score: 1)

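Pass the individual HBase column names as arguments to HBaseStorage. You specified only a single column, mycf:intdata, while the relation has more fields than that. See the examples here and here.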
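
A minimal sketch of what the answer suggests, assuming the table `mydata` and the column family `mycf` already exist in HBase. HBaseStorage uses the first field of each tuple as the row key and maps the remaining fields, in order, to the columns listed in its constructor argument. The qualifiers `id1` and `id2` are hypothetical names chosen for illustration, not taken from the question. Whether this change alone clears the `URISyntaxException` in the trace is not confirmed here, since that may also depend on the Pig and HBase versions in use.

```pig
-- Load the tab-separated sample; the first field (id) becomes the HBase row key.
A = LOAD '/idn/home/mvenk9/Test' USING PigStorage('\t') AS (id:int, id1:int, id2:int);

-- List one HBase column per remaining field (id1, id2), separated by spaces.
-- The qualifiers 'id1' and 'id2' are hypothetical; use whatever names fit your table.
STORE A INTO 'hbase://mydata' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycf:id1 mycf:id2');
```

With the sample data, the first input row would be stored under row key 1, with the value 2 in `mycf:id1` and 3 in `mycf:id2`.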