I am new to Avro and Hive, and one thing confused me while learning them: the use of
tblproperties('avro.schema.url'='somewhereinHDFS/categories.avsc')
Suppose I run a create command such as:
create table categories (id Int , dep_Id Int , name String)
stored as avrofile
tblproperties('avro.schema.url'=
'hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc')
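Why should I have to list id Int, dep_Id Int in the command above when I am already providing an avsc file that contains the complete schema?
When I leave the column list out, the command fails: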
create table categories stored as avrofile
tblproperties('avro/schema.url'=
'hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc')
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException
Encountered AvroSerdeException determining schema.
Returning signal schema to indicate problem:
Neither avro.schema.literal nor avro.schema.url specified,
can't determine table schema)
Why does Hive need the schema to be specified even though the avsc file exists and already contains the schema?
Answer 0: (score: 1)
Can you try it this way?
CREATE TABLE categories
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.url'='http://schema.avsc');
More information here: https://cwiki.apache.org/confluence/display/Hive/AvroSerDe
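One detail worth checking in the failing statement from the question: the property key there is written as 'avro/schema.url' (with a slash), while the error message reports that neither avro.schema.literal nor avro.schema.url was specified. A minimal sketch of the same statement with the key spelled 'avro.schema.url' follows. It assumes Hive 0.14 or later (where STORED AS AVRO is available) and that the .avsc file is readable at the HDFS path given in the question; it has not been run against that cluster.

```sql
-- Same statement as in the question, but with the property key spelled
-- 'avro.schema.url' (dot) instead of 'avro/schema.url' (slash).
-- No column list is given; the columns are expected to come from the .avsc file.
CREATE TABLE categories
STORED AS AVRO
TBLPROPERTIES (
  'avro.schema.url'='hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc');

-- Inspect the columns Hive derived from the schema file.
DESCRIBE categories;
```

If DESCRIBE lists the fields from categories.avsc, the explicit column list in the first command was not needed.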
Answer 1: (score: 0)
Create the external Hive table orders_sqoop from the given Avro schema file and Avro data files:
hive> create external table if not exists orders_sqoop
stored as avro
location '/user/hive/warehouse/retail_stage.db/orders'
tblproperties('avro.schema.url'='/user/hive/warehouse/retail_stage.db/orders_schema/orders.avsc');
The create table command above ran successfully and created the orders_sqoop table.
Verify the table structure as follows:
hive> show create table orders_sqoop;
OK
CREATE EXTERNAL TABLE `orders_sqoop`(
`order_id` int COMMENT '',
`order_date` bigint COMMENT '',
`order_customer_id` int COMMENT '',
`order_status` string COMMENT '')
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
'hdfs://quickstart.cloudera:8020/user/hive/warehouse/retail_stage.db/orders'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='false',
'avro.schema.url'='/user/hive/warehouse/retail_stage.db/orders_schema/orders.avsc',
'numFiles'='2',
'numRows'='-1',
'rawDataSize'='-1',
'totalSize'='660906',
'transient_lastDdlTime'='1563093902')
Time taken: 0.125 seconds, Fetched: 21 row(s)
The table was created as expected, with its columns taken from the schema file.
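As a rough illustration of how the columns above relate to an Avro schema, the same column layout can also be declared inline with avro.schema.literal, the other property named in the error message from the question. The table name orders_literal, the record name, and the nullable union types are assumptions made for this sketch; the actual contents of orders.avsc are not shown here and may differ.

```sql
-- Hypothetical table: declares the schema inline instead of pointing to a file.
-- Field names and types mirror the SHOW CREATE TABLE output above
-- (Avro "long" corresponds to Hive bigint); the ["null", T] unions are an assumption.
CREATE TABLE orders_literal
STORED AS AVRO
TBLPROPERTIES (
  'avro.schema.literal'='{
    "type": "record",
    "name": "orders",
    "fields": [
      {"name": "order_id",          "type": ["null", "int"],    "default": null},
      {"name": "order_date",        "type": ["null", "long"],   "default": null},
      {"name": "order_customer_id", "type": ["null", "int"],    "default": null},
      {"name": "order_status",      "type": ["null", "string"], "default": null}
    ]
  }');

-- The derived columns can then be compared with those of orders_sqoop.
DESCRIBE orders_literal;
```

Either property gives the AvroSerDe a schema to work from, which is why no column list appears in the create statement.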