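Cannot query records from Hive when the data is stored in Avro format; returns an "error_error ..." exception

Time: 2016-09-26 09:41:16

Tags: hive sqoop avro

We followed the steps below:

  1. Imported the table from MySQL into the HDFS location user/hive/warehouse/orders/. The table schema is: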

    mysql> describe orders;
    +-------------------+-------------+------+-----+---------+-------+
    | Field             | Type        | Null | Key | Default | Extra |
    +-------------------+-------------+------+-----+---------+-------+
    | order_id          | int(11)     | YES  |     | NULL    |       |
    | order_date        | varchar(30) | YES  |     | NULL    |       |
    | order_customer_id | int(11)     | YES  |     | NULL    |       |
    | order_items       | varchar(30) | YES  |     | NULL    |       |
    +-------------------+-------------+------+-----+---------+-------+
    
  2. Created an external table in Hive on top of the data imported in (1).

    CREATE EXTERNAL TABLE orders
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION 'hdfs:///user/hive/warehouse/retail_stage.db/orders'
    TBLPROPERTIES ('avro.schema.url'='hdfs://host_name//tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc');
    

    Sqoop command:

     sqoop import \
      --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
      --username=root \
      --password=cloudera \
      --table orders \
      --target-dir /user/hive/warehouse/retail_stage.db/orders \
      --as-avrodatafile \
      --split-by order_id
    
  3. Describing the orders table returns an error. We tried many combinations, but none of them worked.

    hive> describe orders;
    OK
    error_error_error_error_error_error_error   string                  from deserializer   
    cannot_determine_schema string                  from deserializer   
    check                   string                  from deserializer   
    schema                  string                  from deserializer   
    url                     string                  from deserializer   
    and                     string                  from deserializer   
    literal                 string                  from deserializer   
    Time taken: 1.15 seconds, Fetched: 7 row(s)
    
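  4. The same steps work with --as-textfile; the error occurs only with --as-avrodatafile.

    We looked at several Stack Overflow posts but could not resolve the issue. Any ideas?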

1 answer:

Answer 0: (score: 0)

I think you should check the reference to the Avro schema file in TBLPROPERTIES.

Does the following path resolve?

hdfs dfs -cat hdfs://host_name//tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc
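
If that command reports that the file does not exist, the schema file is probably not in HDFS at that path. The sketch below shows one possible way to check and fix this, assuming that Sqoop left orders.avsc on the local filesystem of the machine where the import ran. The function name `ensure_schema_in_hdfs`, the local path, and the HDFS destination `/user/cloudera/orders.avsc` are hypothetical placeholders, not values taken from the question.

```shell
#!/bin/sh
# Sketch: make sure an Avro schema file exists in HDFS, uploading a local copy if not.
# Both arguments are placeholders supplied by the caller:
#   $1 = local .avsc file (ASSUMED to be where Sqoop wrote the schema)
#   $2 = HDFS path that avro.schema.url will point to
ensure_schema_in_hdfs() {
    local_avsc=$1
    hdfs_avsc=$2

    # `hdfs dfs -test -e` exits with status 0 when the path exists in HDFS.
    if hdfs dfs -test -e "$hdfs_avsc"; then
        echo "schema already in HDFS: $hdfs_avsc"
        return 0
    fi

    # Nothing to upload if the local copy is missing as well.
    if [ ! -f "$local_avsc" ]; then
        echo "local schema not found: $local_avsc" >&2
        return 1
    fi

    # Copy the local schema file into HDFS.
    hdfs dfs -put "$local_avsc" "$hdfs_avsc" || return 1
    echo "uploaded schema to: $hdfs_avsc"
}

# Example call (hypothetical paths):
# ensure_schema_in_hdfs ./orders.avsc /user/cloudera/orders.avsc
```

Once the file is in HDFS, `avro.schema.url` in TBLPROPERTIES can reference that HDFS path.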

I was able to reproduce the exact scenario and select from the Hive table.

hive> CREATE EXTERNAL TABLE sqoop_test
    > COMMENT "A table backed by Avro data with the Avro schema stored in HDFS"
    > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    > STORED AS
    >    INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    >    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    > LOCATION '/user/cloudera/categories/'
    > TBLPROPERTIES
    >  ('avro.schema.url'='hdfs:///user/cloudera/categories.avsc')
    > ;

OK
Time taken: 1.471 seconds

 hive> select * from sqoop_test;
 OK
 1  2   Football
 2  2   Soccer
 3  2   Baseball & Softball
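
The same idea could be applied to the orders table from the question: point the table at a schema file that is known to exist in HDFS, then describe it again. The following is only a sketch. The helper names `schema_fix_sql` and `schema_failed` and the schema location `hdfs:///user/cloudera/orders.avsc` are hypothetical, and the `hive -e` call is left commented out because it needs a working Hive CLI.

```shell
#!/bin/sh
# Sketch: print HiveQL that repoints a table at a schema file and re-describes it.
#   $1 = table name, $2 = schema URL (both supplied by the caller)
schema_fix_sql() {
    printf "ALTER TABLE %s SET TBLPROPERTIES ('avro.schema.url'='%s');\n" "$1" "$2"
    printf "DESCRIBE %s;\n" "$1"
}

# Detect the placeholder columns that appear in the question's DESCRIBE output.
# Reads DESCRIBE output on stdin; exit status 0 means the schema was not loaded.
schema_failed() {
    grep -q 'cannot_determine_schema'
}

# Example (hypothetical schema location; requires the hive CLI):
# hive -e "$(schema_fix_sql orders hdfs:///user/cloudera/orders.avsc)" | schema_failed \
#     && echo "schema still not readable"
```

If `schema_failed` still matches after the change, the URL is most likely still not readable from Hive.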