Loading a Hive table as Parquet

Time: 2018-08-24 19:22:08

Tags: hive parquet

I have the following input file. I need to load it into Hive tables in both ORC and Parquet format.

productID,productCode,name,quantity,price,supplierid
1001,PEN,Pen Red,5000,1.23,501
1002,PEN,Pen Blue,8000,1.25,501

I have pasted my code at the bottom. I am able to create and load the ORC Hive table successfully, but not the Parquet one.

After creating and loading the Parquet table, I see only NULL values in every field when I query it. Is there something I am missing?

// Read the CSV, skip the header line, and build a DataFrame with the header's column names
val productsupplies = sc.textFile("/user/cloudera/product.csv")
val productfirst = productsupplies.first
val product = productsupplies.filter(f => f != productfirst).map(x => {
  val a = x.split(",")
  (a(0).toInt, a(1), a(2), a(3), a(4).toFloat, a(5))
}).toDF("productID", "productCode", "name", "quantity", "price", "supplierid")

// Write the DataFrame out in both ORC and Parquet format
product.write.orc("/user/cloudera/productsupp.orc")
product.write.parquet("/user/cloudera/productsupp.parquet")
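
As a side note (not part of the original post): a quick way to confirm which field names actually ended up inside the Parquet files is to read them back and print the schema. This assumes the sqlContext that the Spark 1.x shell on the Cloudera VM predefines:

// Hypothetical check, not in the original question: the field names printed here
// (productID, productCode, name, quantity, price, supplierid) are the ones a
// Hive table created on top of this location has to use.
val writtenParquet = sqlContext.read.parquet("/user/cloudera/productsupp.parquet")
writtenParquet.printSchema()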


// Hive context used for the DDL statements and queries below
val hc = new org.apache.spark.sql.hive.HiveContext(sc)

hc.sql("create table product_supp_orc ( " + 
"product_id int, " + 
"product_code string, " + 
"product_name string, " + 
"product_quatity string, " + 
"product_price float, " + 
"product_supplier_id string) stored as orc " + 
"location \"/user/cloudera/productsupp.orc \" ")

hc.sql("create table product_supp_parquet ( " + 
"product_id int, " + 
"product_code string, " + 
"product_name string, " + 
"product_quatity string, " + 
"product_price float, " + 
"product_supplier_id string) stored as parquet " + 
"location \"/user/cloudera/productsupp.parquet\" ")

hc.sql("select * from product_supp_parquet")

1 Answer:

Answer 0 (score: 0):

Try:

hc.sql("create table product_supp_parquet ( " + 
"productid int, " + 
"productcode string, " + 
"name string, " + 
"quantity string, " + 
"price float, " + 
"supplierid string) stored as parquet " + 
"location \"/user/cloudera/products.parquet\" ")

Basically, the column names in the table definition have to match the field names used when the file was written (the names passed to toDF). Hive's Parquet reader resolves columns by name, so a mismatch comes back as NULL for every field, whereas the ORC table still worked because its columns are matched by position.
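
An alternative, offered as a sketch of my own rather than part of the original answer: instead of adapting the DDL to the DataFrame, rename the DataFrame columns to the names you want the Hive table to expose before writing, or let Spark create and register the Hive table itself so the two schemas cannot drift apart. The output path and table name below are made up for illustration:

// Sketch (assumed names and paths): give the DataFrame the column names the
// Hive table should expose, then write it out again.
val renamed = product.toDF("product_id", "product_code", "product_name",
  "product_quantity", "product_price", "product_supplier_id")
renamed.write.parquet("/user/cloudera/productsupp_renamed.parquet")

// Or have Spark create the Hive table directly from the DataFrame schema.
renamed.write.format("parquet").saveAsTable("product_supp_parquet2")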