Sqoop import fails for a bucketed Hive ORC table

Date: 2017-12-22 09:28:44

Tags: hadoop hive sqoop orc

I created a bucketed ORC table in Hive with the following DDL:

create table Employee( EmpID STRING , EmpName STRING) 
clustered by (EmpID) into 10 buckets 
stored as orc 
TBLPROPERTIES('transactional'='true');

Then I ran the Sqoop import:

sqoop import --verbose \
--connect 'RDBMS_JDBC_URL' \
--driver JDBC_DRIVER \
--table Employee  \
--null-string '\\N' \
--null-non-string '\\N' \
--username USER \
--password PASSWORD \
--hcatalog-database hive_test_trans \
--hcatalog-table Employee  \
--hcatalog-storage-stanza \
"stored as orc" -m 1

It fails with the following exception:

22/12/17 03:28:59 ERROR tool.ImportTool: Encountered IOException running import job:
org.apache.hive.hcatalog.common.HCatException : 2016 : Error operation not supported : Store into a partition with bucket definition from Pig/Mapreduce is not supported
          at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:109)
          at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:70)
          at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:339)
          at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:753)
          at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
          at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:240)
          at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:665)
          at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
          at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)

We can work around this by creating a temporary (staging) table, but I don't want to add another step.

Can I import data directly from Oracle into the bucketed ORC table without using a staging table?

1 answer:

Answer 0 (score: 0)

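Writing into transactional (and therefore bucketed) Hive tables through HCatalog, which is what Sqoop uses here, is still not supported, so you need a workaround.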

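Here is the link to the open JIRA ticket tracking the fix. Until it is resolved, you have to write the data into Hive through some intermediate step; the staging-table option you mentioned in the question is a good choice.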
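The staging-table workaround can be sketched as a small Bash script. This is a minimal sketch under stated assumptions, not an official recipe: the staging table name employee_staging, the HIVE_JDBC_URL and PASSWORD_FILE variables with their placeholder defaults, and the DRY_RUN switch are all hypothetical. It also assumes Hive 1.x/2.x, where a plain CREATE TABLE produces a non-transactional table. Sqoop only ever writes to the unbucketed staging table, which HCatalog accepts; the bucketed, transactional Employee table is then filled by Hive itself with INSERT ... SELECT.

```shell
#!/usr/bin/env bash
# Minimal sketch of the staging-table workaround.
# All table names, variables and defaults below are hypothetical placeholders.
set -euo pipefail

# When DRY_RUN=1, print each command instead of executing it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

import_via_staging() {
  local db="hive_test_trans"
  local target="Employee"            # bucketed, transactional ORC table
  local staging="employee_staging"   # hypothetical plain (unbucketed) table
  local hive_url="${HIVE_JDBC_URL:-jdbc:hive2://HIVE_HOST:10000}"

  # Step 1: create an unbucketed ORC staging table with the same columns.
  run beeline -u "$hive_url" -e \
    "CREATE TABLE IF NOT EXISTS ${db}.${staging} (EmpID STRING, EmpName STRING) STORED AS ORC;"

  # Step 2: Sqoop writes only to the staging table, so HCatOutputFormat
  # never sees a bucket definition.
  run sqoop import \
    --connect "${JDBC_URL:-RDBMS_JDBC_URL}" \
    --driver "${JDBC_DRIVER:-JDBC_DRIVER}" \
    --table Employee \
    --null-string '\\N' \
    --null-non-string '\\N' \
    --username "${DB_USER:-USER}" \
    --password-file "${PASSWORD_FILE:-/user/USER/.sqoop.password}" \
    --hcatalog-database "$db" \
    --hcatalog-table "$staging" \
    -m 1

  # Step 3: Hive itself writes into the bucketed table, then the staging
  # table is emptied for the next run.
  # (On Hive 1.x you may also need: SET hive.enforce.bucketing=true;)
  run beeline -u "$hive_url" -e \
    "INSERT INTO TABLE ${db}.${target} SELECT EmpID, EmpName FROM ${db}.${staging}; TRUNCATE TABLE ${db}.${staging};"
}
```

After sourcing the script, running `DRY_RUN=1 import_via_staging` prints the three commands instead of executing them, which is a cheap way to review them before touching the cluster. The extra step still exists, but it is scripted once rather than run by hand.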