We have followed the steps below. The table was imported from MySQL into the HDFS location user/hive/warehouse/orders/, and the table schema is:
mysql> describe orders;
+-------------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+-------------+------+-----+---------+-------+
| order_id | int(11) | YES | | NULL | |
| order_date | varchar(30) | YES | | NULL | |
| order_customer_id | int(11) | YES | | NULL | |
| order_items | varchar(30) | YES | | NULL | |
+-------------------+-------------+------+-----+---------+-------+
An external table was created in Hive on the same data as in (1):
CREATE EXTERNAL TABLE orders
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///user/hive/warehouse/retail_stage.db/orders'
TBLPROPERTIES ('avro.schema.url'='hdfs://host_name//tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc');
Sqoop command:
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=root \
--password=cloudera \
--table orders \
--target-dir /user/hive/warehouse/retail_stage.db/orders \
--as-avrodatafile \
--split-by order_id
describe formatted orders returns an error; I have tried many combinations but they all fail:
hive> describe orders;
OK
error_error_error_error_error_error_error string from deserializer
cannot_determine_schema string from deserializer
check string from deserializer
schema string from deserializer
url string from deserializer
and string from deserializer
literal string from deserializer
Time taken: 1.15 seconds, Fetched: 7 row(s)
The same works with --as-textfile; the error is thrown only in the case of --as-avrodatafile.
I went through a few Stack Overflow posts but could not resolve it. Any ideas?
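For reference, the Avro data files written by the import can be inspected like this (the part-file name is illustrative, and the avro-tools command is an assumption; it ships with the CDH quickstart VM):

hdfs dfs -ls /user/hive/warehouse/retail_stage.db/orders
hdfs dfs -get /user/hive/warehouse/retail_stage.db/orders/part-m-00000.avro /tmp/orders-part.avro
avro-tools getschema /tmp/orders-part.avro    # prints the Avro schema embedded in the data file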
Answer 0 (score: 0)
I think the reference to the Avro schema file in TBLPROPERTIES should be checked.
Does the following resolve it?
hdfs dfs -cat hdfs://host_name//tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc
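If that file cannot be read, the .avsc generated by Sqoop most likely exists only in the local /tmp/sqoop-cloudera/compile/... directory on the node where Sqoop ran, not in HDFS. A possible fix, sketched here with an assumed target path /user/cloudera/schemas, is to copy the schema into HDFS and repoint the table:

hdfs dfs -mkdir -p /user/cloudera/schemas
hdfs dfs -put /tmp/sqoop-cloudera/compile/bb8e849c53ab9ceb0ddec7441115125d/orders.avsc /user/cloudera/schemas/
hive -e "ALTER TABLE orders SET TBLPROPERTIES ('avro.schema.url'='hdfs:///user/cloudera/schemas/orders.avsc');"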
I was able to create the exact scenario and select from the Hive table:
hive> CREATE EXTERNAL TABLE sqoop_test
> COMMENT "A table backed by Avro data with the Avro schema stored in HDFS"
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION '/user/cloudera/categories/'
> TBLPROPERTIES
> ('avro.schema.url'='hdfs:///user/cloudera/categories.avsc')
> ;
OK
Time taken: 1.471 seconds
hive> select * from sqoop_test;
OK
1 2 Football
2 2 Soccer
3 2 Baseball & Softball
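As a comparison point, the orders.avsc referenced by avro.schema.url should contain a record schema for the four columns shown in the MySQL describe output, roughly like this (a simplified sketch; the exact layout and extra attributes vary by Sqoop version):

{
  "type" : "record",
  "name" : "orders",
  "fields" : [
    { "name" : "order_id",          "type" : [ "null", "int" ] },
    { "name" : "order_date",        "type" : [ "null", "string" ] },
    { "name" : "order_customer_id", "type" : [ "null", "int" ] },
    { "name" : "order_items",       "type" : [ "null", "string" ] }
  ]
}

If the file at avro.schema.url is missing or is not a valid JSON record schema, the AvroSerDe reports exactly the error_error_error... / cannot_determine_schema pseudo-columns shown in the question.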