How can I tune uploading data from (non-EMR) Hive to S3?

Date: 2016-06-15 17:41:36

Tags: hadoop amazon-s3 hive

I want to copy data from a Hive table on a bare-metal Hadoop cluster to an Amazon S3 bucket.

I know I can do this:

create external table my_table
(
  `column1` string,
  `column2` string,
  ....
  `columnX` string)
row format delimited fields terminated by ','
lines terminated by '\n'
stored as textfile
location 's3n://my_bucket/my_folder_path/';

insert into table my_table select * from source_db.source_table;

This works for small datasets. But when I try it with a larger dataset, the job fails with the stack trace shown below.

I'm looking for help tuning this process, or for other options.

Thanks in advance.

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
        ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: n must be positive
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:577)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:675)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:102)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:138)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:117)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:167)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
        ... 9 more

1 answer:

Answer 0 (score: 0)

Hive is telling you exactly what the problem is. Look at these two lines:

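  • Hive Runtime Error while processing row
  • java.lang.IllegalArgumentException: n must be positive

So this is not a question of small versus large datasets. Rather, your large dataset contains some rows that Hive cannot process.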

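However, it is hard to pin down the exact cause from the information you posted. I suggest breaking the large dataset into smaller chunks and loading them one at a time to narrow down which rows trigger the error.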
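Splitting the data into smaller chunks can be automated by bisection. The following is a minimal Python sketch under stated assumptions, not a Hive API: `load_range(lo, hi)` is a hypothetical callback you would supply, for example one that runs something like `INSERT OVERWRITE TABLE <scratch_table> SELECT * FROM source_db.source_table WHERE <key> >= lo AND <key> < hi` through your Hive client and raises if the job fails. It assumes the source table has a numeric key column to split on. Writing probes to a scratch table keeps successful probes from duplicating rows in `my_table`.

```python
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # half-open key range [lo, hi)


def find_failing_ranges(lo: int, hi: int,
                        load_range: Callable[[int, int], None],
                        min_size: int = 1) -> List[Range]:
    """Bisect [lo, hi) and return the smallest key ranges whose load fails.

    load_range is a HYPOTHETICAL callback: it should load the rows whose
    key falls in [lo, hi) (e.g. into a scratch table) and raise on failure.
    """
    if hi <= lo:
        return []  # empty range: nothing to load
    try:
        load_range(lo, hi)
        return []  # the whole range loaded fine
    except Exception:
        if hi - lo <= min_size:
            return [(lo, hi)]  # too small to split further; report it
        mid = (lo + hi) // 2
        left = find_failing_ranges(lo, mid, load_range, min_size)
        right = find_failing_ranges(mid, hi, load_range, min_size)
        if not left and not right:
            # Each half loads on its own, so the failure is not tied to
            # specific rows; report the parent range instead.
            return [(lo, hi)]
        return left + right
```

If a range is reported even though both of its halves load successfully, the failure depends on chunk size rather than on particular rows; if tiny ranges are reported, those keys point at the rows Hive cannot process.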