I am trying to load data from a temporary table in Hive into a regular table. Below are the first few rows of the temporary table.
ProdNo,ProdName,ProdMfg,ProdQOH,ProdPrice,ProdNextShipDate
P0036566,17 inch Color Monitor,ColorMeg Inc.,12,$169.00,2013-02-20
P0036577,19 inch Color Monitor,ColorMeg Inc.,10,$319.00,2013-02-20
P1114590,R3000 Color Laser Printer,Connex,5,$699.00,2013-01-22
I use the following code to do this:
insert overwrite table product
SELECT
regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) ProdNo,
regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) ProdName,
regexp_extract(col_value, '^(?:([^,]*),?){3}', 1) ProdMfg,
regexp_extract(col_value, '^(?:([^,]*),?){4}', 1) ProdQOH,
regexp_extract(col_value, '^(?:([^,]*),?){5}', 1) ProdPrice,
regexp_extract(col_value, '^(?:([^,]*),?){6}', 1) ProdNextShipDate
from product_temp;
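The pattern `^(?:([^,]*),?){n}` repeats "a run of non-comma characters followed by an optional comma" n times. Capture group 1 keeps only the text from the last repetition, so index n selects the n-th comma-separated field. The Python sketch below mimics this with `re`. `regexp_extract` and `nth_field` are illustrative stand-ins, not Hive's own code, and the sketch assumes Python's `re` handles this simple pattern the same way as the Java regex engine that Hive uses. Under that assumption, the regex returns `$169.00` for field 5, so the NULL probably comes from converting that value rather than from the extraction itself.

```python
import re
from typing import List

ROW = "P0036566,17 inch Color Monitor,ColorMeg Inc.,12,$169.00,2013-02-20"


def regexp_extract(subject: str, pattern: str, index: int) -> str:
    # Illustrative stand-in for Hive's regexp_extract: return the requested
    # capture group of the first match, or '' if nothing matches.
    match = re.search(pattern, subject)
    if match is None or match.group(index) is None:
        return ""
    return match.group(index)


def nth_field(line: str, n: int) -> str:
    # Same pattern as the query: "field, optional comma" repeated n times;
    # group 1 holds the capture from the n-th repetition.
    return regexp_extract(line, r"^(?:([^,]*),?){%d}" % n, 1)


fields: List[str] = [nth_field(ROW, n) for n in range(1, 7)]
print(fields)
# ['P0036566', '17 inch Color Monitor', 'ColorMeg Inc.', '12', '$169.00', '2013-02-20']
```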
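After running the code above, every column in the regular table is loaded correctly except ProdPrice, whose values are all NULL. How can I extract the price from the temporary table without the $ sign and load it into the regular table? Below is the current output, with ProdPrice as null.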
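+----------+-----------------------+---------------+---------+-----------+------------+
| ProdNo   | ProdName              | ProdMfg       | ProdQOH | ProdPrice | date       |
+----------+-----------------------+---------------+---------+-----------+------------+
| P0036566 | 17 inch Color Monitor | ColorMeg Inc. | 12      | null      | 2013-02-20 |
| P0036577 | 19 inch Color Monitor | ColorMeg Inc. | 10      | null      | 2013-02-20 |
+----------+-----------------------+---------------+---------+-----------+------------+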
Here is the structure of the product table:
CREATE TABLE `product`(
`prodno` string,
`prodname` string,
`prodmfg` string,
`prodqoh` int,
`prodprice` string,
`prodnextshipdate` date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
LOCATION
'hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/sales_db.db/product'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'last_modified_by'='maria_dev',
'last_modified_time'='1488149236',
'numFiles'='1',
'numRows'='11',
'rawDataSize'='516',
'totalSize'='650',
'transient_lastDdlTime'='1488149304')
Answer 0 (score: 1)
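You are trying to insert literals such as $169.00 into a numeric field.
Hive handles this mismatch by inserting NULL values.
Either change ProdPrice to a string, or remove the '$' sign (if other currencies can occur, keep the currency symbol in an additional column).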
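The answer's point can be checked outside Hive with a small Python sketch. It takes the answer's claim as given: a string that cannot be parsed as a number becomes NULL when converted to a numeric column. `cast_to_decimal`, `strip_currency` and `split_currency` are hypothetical helpers that only model that behaviour and the "remove the $" and "keep the symbol in another column" suggestions. They are not Hive's implementation.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple


def cast_to_decimal(value: str) -> Optional[Decimal]:
    # Models the behaviour described in the answer: a string that is not a
    # valid number becomes NULL (None here) instead of raising an error.
    try:
        return Decimal(value)
    except InvalidOperation:
        return None


def strip_currency(value: str) -> str:
    # Removes every literal '$', in the spirit of regexp_replace(value, '[$]', '').
    return re.sub(r"[$]", "", value)


def split_currency(value: str) -> Tuple[str, str]:
    # Hypothetical helper for the "keep the symbol in an extra column" idea:
    # leading characters that cannot start a number are taken as the symbol.
    match = re.match(r"^([^\d.+-]*)(.*)$", value)
    return match.group(1), match.group(2)


raw_price = "$169.00"
print(cast_to_decimal(raw_price))                  # None (stored as NULL)
print(cast_to_decimal(strip_currency(raw_price)))  # 169.00
print(split_currency(raw_price))                   # ('$', '169.00')
```

In the question's query, this would presumably mean wrapping the fifth expression as `regexp_replace(regexp_extract(col_value, '^(?:([^,]*),?){5}', 1), '[$]', '')`. The character class `[$]` avoids having to escape `$` inside a Hive string literal.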
Answer 1 (score: 0)
insert overwrite table product
select val[0],val[1],val[2],val[3],val[4],val[5]
from (select split (col_value,',') as val from product_temp) t
;
create table product_temp (col_value string);
insert into product_temp values
('P0036566,17 inch Color Monitor,ColorMeg Inc.,12,$169.00,2013-02-20')
,('P0036577,19 inch Color Monitor,ColorMeg Inc.,10,$319.00,2013-02-20')
,('P1114590,R3000 Color Laser Printer,Connex,5,$699.00,2013-01-22' )
;
select val[0] as ProdNo
,val[1] as ProdName
,val[2] as ProdMfg
,val[3] as ProdQOH
,val[4] as ProdPrice
,val[5] as ProdNextShipDate
from (select split (col_value,',') as val
from product_temp
) t
;
+----------+---------------------------+---------------+---------+-----------+------------------+
| prodno | prodname | prodmfg | prodqoh | prodprice | prodnextshipdate |
+----------+---------------------------+---------------+---------+-----------+------------------+
| P0036566 | 17 inch Color Monitor | ColorMeg Inc. | 12 | $169.00 | 2013-02-20 |
| P0036577 | 19 inch Color Monitor | ColorMeg Inc. | 10 | $319.00 | 2013-02-20 |
| P1114590 | R3000 Color Laser Printer | Connex | 5 | $699.00 | 2013-01-22 |
+----------+---------------------------+---------------+---------+-----------+------------------+
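With `split`, `val[i]` is simply element i of the resulting array, so `val[4]` is the raw price string `$169.00`, which fits a `string` column as the output above shows. The Python sketch below reproduces that indexing. Python's `str.split` stands in for Hive's `split`: the two agree for a plain `','` delimiter, although Hive treats the delimiter as a regular expression. The sketch also shows a caveat that applies equally to the `regexp_extract` query: a field that contains a comma shifts every later index. The last row is a made-up example, not data from the question.

```python
from typing import List

ROWS = [
    "P0036566,17 inch Color Monitor,ColorMeg Inc.,12,$169.00,2013-02-20",
    "P0036577,19 inch Color Monitor,ColorMeg Inc.,10,$319.00,2013-02-20",
    "P1114590,R3000 Color Laser Printer,Connex,5,$699.00,2013-01-22",
]


def split_row(line: str) -> List[str]:
    # Stand-in for split(col_value, ','); val[i] is element i of the result.
    return line.split(",")


prices = [split_row(row)[4] for row in ROWS]
print(prices)  # ['$169.00', '$319.00', '$699.00']

# Hypothetical row whose manufacturer contains a comma: the price moves to
# index 5 and val[4] now holds the quantity instead.
shifted = split_row("P9999999,Monitor,ColorMeg, Inc.,12,$169.00,2013-02-20")
print(len(shifted), shifted[4])  # 7 12
```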