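I am trying to work out how to explode complex types in Hive. I have the following Avro file that I want to use for my tests, and I have built a Hive external table on top of it.
This is my test data.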
{"order_id":123456,"customer_id":987654,"total":305,"order_details":[{"quantity":5,"total":55,"product_detail":{"product_id":1000,"product_name":"Hugo Boss XY","product_description": {"string": "Hugo Xy Men 100 ml"}, "product_status": "AVAILABLE", "product_category":["fragrance","perfume"],"price":10.35,"product_hash":"XY123"}},{"quantity":5,"total":250,"product_detail":{"product_id":2000,"product_name":"Cherokee Polo T Shirt","product_description": {"string": "Cherokee Medium Blue Polo T Shirt"}, "product_status": "AVAILABLE", "product_category":["T-shirts","V-Neck","Cotton", "Medium"],"price":50.00,"product_hash":"XY789"}}]}
{"order_id":789012,"customer_id":4567324,"total":220,"order_details":[{"quantity":10,"total":120,"product_detail":{"product_id":1001,"product_name":"Hugo Men Red","product_description": {"string": "Hugo Men Red 150 ml"}, "product_status": "ONLY_FEW_LEFT", "product_category":["fragrance","perfume"],"price":12.99,"product_hash":"XY456"}},{"quantity":10,"total":100,"product_detail":{"product_id":2001,"product_name":"Ruggers Smart","product_description": {"string": "Ruggers Smart White Small Polo T Shirt"}, "product_status": "ONLY_FEW_LEFT", "product_category":["T-shirts","Round-Neck","Woolen", "Small"],"price":9.99,"product_hash":"XY987"}}]}
Avro schema
{
"namespace":"com.treselle.db.model",
"type":"record",
"doc":"This Schema describes about Order",
"name":"Order",
"fields":[
{"name":"order_id","type": "long"},
{"name":"customer_id","type": "long"},
{"name":"total","type": "float"},
{"name":"order_details","type":{
"type":"array",
"items": {
"namespace":"com.treselle.db.model",
"name":"OrderDetail",
"type":"record",
"fields": [
{"name":"quantity","type": "int"},
{"name":"total","type": "float"},
{"name":"product_detail","type":{
"namespace":"com.treselle.db.model",
"type":"record",
"name":"Product",
"fields":[
{"name":"product_id","type": "long"},
{"name":"product_name","type": "string","doc":"This is the name of the product"},
{"name":"product_description","type": ["string", "null"], "default": ""},
{"name":"product_status","type": {"name":"product_status", "type": "enum", "symbols": ["AVAILABLE", "OUT_OF_STOCK", "ONLY_FEW_LEFT"]}, "default":"AVAILABLE"},
{"name":"product_category","type":{"type": "array", "items": "string"}, "doc": "This contains array of categories"},
{"name":"price","type": "float"},
{"name": "product_hash", "type": {"type": "fixed", "name": "product_hash", "size": 5}}
]
}
}
]
}
}
}
]
}
My Hive DDL
CREATE EXTERNAL TABLE orders (
order_id bigint,
customer_id bigint,
total float,
order_items array<
struct<
quantity:int,
total:float,
product_detail:struct<
product_id:bigint,
product_name:string,
product_description:string,
product_status:string,
product_caretogy:array<string>,
price:float,
product_hash:binary
>
>
>
)
STORED AS AVRO
LOCATION '/user/hive/test/orders';
Query
SELECT order_id, customer_id FROM orders;
This works fine and returns 2 rows, as expected.
But when I try to use LATERAL VIEW explode, I run into a problem.
SELECT
order_id,
customer_id,
ord_dets.quantity as line_qty,
ord_dets.total as line_total
FROM
orders
LATERAL VIEW explode(order_items) exploded_table as ord_dets;
This query runs without error but returns no rows.
Any pointers on what is wrong here?
Answer 0 (score: 0)
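The reason is that your Hive DDL defines the column as order_items, but in the data and in the Avro schema the field is called order_details. Hive looks for order_items, treats it as a field that does not exist in the file, and defaults it to null. Exploding a null array produces no rows, which is why the query returns nothing.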
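As a minimal sketch of that fix, assuming the only change needed is to align the column names with the Avro field names: the DDL below renames order_items to order_details and also spells product_category as it appears in the Avro schema (the DDL in the question has product_caretogy). The table name and location are taken from the question. This addresses only the name mismatch; whether the enum and fixed fields read cleanly through a column-list DDL may depend on the Hive and Avro versions in use.

```sql
-- Sketch only: recreate the table with column names matching the Avro schema.
-- For an external table, DROP normally removes just the metadata, not the files.
DROP TABLE IF EXISTS orders;

CREATE EXTERNAL TABLE orders (
  order_id bigint,
  customer_id bigint,
  total float,
  -- was order_items; the Avro field is order_details
  order_details array<
    struct<
      quantity:int,
      total:float,
      product_detail:struct<
        product_id:bigint,
        product_name:string,
        product_description:string,
        product_status:string,
        -- was product_caretogy in the question's DDL
        product_category:array<string>,
        price:float,
        product_hash:binary
      >
    >
  >
)
STORED AS AVRO
LOCATION '/user/hive/test/orders';

-- Same lateral view as in the question, now exploding the renamed column.
SELECT
  order_id,
  customer_id,
  ord_dets.quantity AS line_qty,
  ord_dets.total AS line_total
FROM orders
LATERAL VIEW explode(order_details) exploded_table AS ord_dets;
```

With the two-order test file from the question, each order has two entries in order_details, so the lateral view would be expected to yield one row per order line once the column resolves.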
Answer 1 (score: 0)
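Thanks for the pointer.
After I corrected the name, I got an error when querying: Failed with exception java.io.IOException: org.apache.avro.AvroTypeException: Found com.treselle.db.model.order_details, expecting union
After further analysis, I found that both the enum type and the fixed type in the Avro file cause the "expecting union" error. After removing those columns, I was able to query the Hive table successfully.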
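A possible alternative to dropping the enum and fixed columns, offered as an untested sketch rather than something confirmed in this thread: let the AvroSerDe take the table schema directly from the Avro schema file through the avro.schema.url table property, so that Hive does not build its own reader schema from a column list. The table name orders_from_schema and the schema path /user/hive/test/schemas/order.avsc are hypothetical; the schema file would need to contain the Avro schema shown in the question.

```sql
-- Sketch only: no column list, the columns come from the Avro schema file.
-- orders_from_schema and the .avsc path are hypothetical names.
CREATE EXTERNAL TABLE orders_from_schema
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/hive/test/orders'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/hive/test/schemas/order.avsc');

-- Inspect the columns Hive derived from the Avro schema.
DESCRIBE orders_from_schema;

-- The array column keeps its Avro name, order_details.
SELECT
  order_id,
  ord_dets.quantity AS line_qty,
  ord_dets.product_detail.product_status AS product_status
FROM orders_from_schema
LATERAL VIEW explode(order_details) exploded_table AS ord_dets;
```

The AvroSerDe documentation describes enum as mapping to a Hive string and fixed to binary, so this route may keep product_status and product_hash available instead of removing them.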