Hive Execution Error

Date: 2016-09-04 09:17:11

Tags: hadoop avro apache-hive

I am new to Avro and Hive and got a bit confused while learning them. I am using

tblproperties('avro.schema.url'='somewhereinHDFS/categories.avsc')

If I run this create command, for example:

create table categories (id Int , dep_Id Int , name String) 
stored as avrofile  
tblproperties('avro.schema.url'=
'hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc')

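But why should I have to list the columns (id Int, dep_Id Int, ...) in the command above at all, when the avsc file I provide already contains the full schema? If I leave them out: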

create table categories stored as avrofile
tblproperties('avro/schema.url'=
'hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc')
  

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException 
Encountered AvroSerdeException determining schema. 
Returning signal schema to indicate problem: 
Neither avro.schema.literal nor avro.schema.url specified, 
can't determine table schema)
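The error names the two table properties AvroSerDe looks for: avro.schema.literal and avro.schema.url. In the second command the key is spelled 'avro/schema.url' (a slash instead of a dot), so neither property is found. A minimal sketch of the same statement with the key spelled 'avro.schema.url' and no column list, reusing the question's HDFS path; it assumes Hive 0.14 or later (where STORED AS AVRO is available) and that the file at that path is a valid Avro schema.

```sql
-- No column list: AvroSerDe derives the columns from the schema file.
-- The property key must be exactly 'avro.schema.url' (with a dot).
CREATE TABLE categories
  STORED AS AVRO
  TBLPROPERTIES (
    'avro.schema.url'='hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc'
  );
```

Afterwards, SHOW CREATE TABLE categories; should list the columns read from categories.avsc, in the same way as the orders_sqoop output further down.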

Why does Hive still require the schema to be specified when the avsc file exists and already contains it?

2 answers:

Answer 0 (score: 1)

Could you try this?

CREATE TABLE categories
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.url'='http://schema.avsc');

More information here: https://cwiki.apache.org/confluence/display/Hive/AvroSerDe
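The other property named in the error message, avro.schema.literal, embeds the schema JSON directly in the DDL instead of pointing to a file. A sketch in the same SerDe/InputFormat/OutputFormat style as above; the table name categories_literal, the record name, and the field names are hypothetical (the fields mirror the question's first command, lowercased) and should be replaced with whatever categories.avsc actually declares.

```sql
-- Hypothetical table and schema: adjust the record and field names to match categories.avsc.
CREATE TABLE categories_literal
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    -- The schema itself, inline, instead of avro.schema.url
    'avro.schema.literal'='{
      "type": "record",
      "name": "categories",
      "fields": [
        {"name": "id", "type": "int"},
        {"name": "dep_id", "type": "int"},
        {"name": "name", "type": "string"}
      ]
    }');
```

The literal form avoids an HDFS lookup at DDL time, at the cost of having to update the table definition whenever the schema changes; the URL form keeps the schema in one shared file.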

Answer 1 (score: 0)

Create the external Hive table orders_sqoop from the given Avro schema file and Avro data files:

 hive> create external table if not exists orders_sqoop
        stored as avro
        location '/user/hive/warehouse/retail_stage.db/orders'
        tblproperties('avro.schema.url'='/user/hive/warehouse/retail_stage.db/orders_schema/orders.avsc');

The create table command above ran successfully and created the orders_sqoop table.

Verify the table structure:

hive> show create table orders_sqoop;
OK
CREATE EXTERNAL TABLE `orders_sqoop`(
  `order_id` int COMMENT '', 
  `order_date` bigint COMMENT '', 
  `order_customer_id` int COMMENT '', 
  `order_status` string COMMENT '')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
  'hdfs://quickstart.cloudera:8020/user/hive/warehouse/retail_stage.db/orders'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false', 
  'avro.schema.url'='/user/hive/warehouse/retail_stage.db/orders_schema/orders.avsc', 
  'numFiles'='2', 
  'numRows'='-1', 
  'rawDataSize'='-1', 
  'totalSize'='660906', 
  'transient_lastDdlTime'='1563093902')
Time taken: 0.125 seconds, Fetched: 21 row(s)

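The table was created as expected: the CREATE statement listed no columns, yet Hive filled them in from orders.avsc via avro.schema.url.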
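To check that the derived columns also line up with the Avro data files under the table's LOCATION, a quick follow-up in the same Hive shell could look like this (column names taken from the SHOW CREATE TABLE output above):

```sql
-- Column names and types as Hive derived them from orders.avsc
DESCRIBE orders_sqoop;

-- Read a few rows through AvroSerDe to confirm the data files are readable with this schema
SELECT order_id, order_date, order_customer_id, order_status
FROM orders_sqoop
LIMIT 5;
```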