How to ignore the beginning of each input field in a Hive insert query

Date: 2017-07-11 20:32:07

Tags: python mysql hadoop hive impala

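I have tab-delimited data in this format: State:California City:California Population:1M

I want to create a database, and when I insert the data I need to ignore the "State:", "City:" and "Population:" prefixes. The state and its population should go into a state table, and the city and its population into a city table.

So there will be 2 tables: one with state and population, the other with city and population.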

CREATE EXTERNAL TABLE IF NOT EXISTS CSP.original 
(
    st STRING COMMENT 'State', 
    ct STRING COMMENT 'City', 
    po STRING COMMENT 'Population'
) 
COMMENT 'Original Table' 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 

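This did not work. It added the comments, but the prefixes were not ignored. I also still need to create the 2 tables for state and city. Can someone help me?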

1 Answer:

Answer 0: (score: 0)

You have to create an external table first.

Step 1:

-- `date` is backquoted because DATE is a reserved word in Hive 1.2.0 and later.
CREATE EXTERNAL TABLE all_info (state STRING, population INT)
PARTITIONED BY (`date` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
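
Step 3 reads from the partition `date = '20170706'`, so that partition has to be registered on the external table before any rows are visible. A minimal sketch, assuming the tab-delimited files for that day sit in an HDFS directory; the path below is a made-up placeholder, not something from the original post:

```sql
-- Hypothetical HDFS path; replace it with the directory that holds the day's files.
-- `date` is backquoted because DATE is a reserved word in recent Hive versions.
ALTER TABLE all_info ADD IF NOT EXISTS
  PARTITION (`date` = '20170706')
  LOCATION '/data/all_info/20170706';
```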

Step 2:

CREATE TABLE IF NOT EXISTS state (state STRING, population INT) PARTITIONED BY (`date` STRING);
CREATE TABLE IF NOT EXISTS city (city STRING, population INT) PARTITIONED BY (`date` STRING);

Step 3:

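-- Select only the two data columns: SELECT * would also return the
-- partition column, which does not fit a static-partition insert.
INSERT OVERWRITE TABLE state
PARTITION (`date` = '20170706')
SELECT state, population
FROM all_info
WHERE `date` = '20170706' AND
      instr(state, 'state:') = 1;  -- keep rows whose first field starts with 'state:'

INSERT OVERWRITE TABLE city
PARTITION (`date` = '20170706')
SELECT state, population
FROM all_info
WHERE `date` = '20170706' AND
      instr(state, 'city:') = 1;   -- keep rows whose first field starts with 'city:'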
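
The filters above select rows by prefix but leave the label in the stored value. One possible way to drop the labels, assuming every line looks like the sample in the question (three tab-separated `Label:value` fields) and is readable through the asker's `CSP.original` table, is to strip everything up to the first colon with `regexp_replace`. The target tables `state_pop` and `city_pop` are hypothetical names chosen for this sketch, and population stays a STRING because a value such as `1M` is not an integer:

```sql
-- Hypothetical target tables for this sketch (not part of the original answer).
CREATE TABLE IF NOT EXISTS state_pop (state STRING, population STRING);
CREATE TABLE IF NOT EXISTS city_pop  (city  STRING, population STRING);

-- Hive multi-insert: scan CSP.original once and fill both tables.
-- regexp_replace(col, '^[^:]*:', '') removes the leading "Label:" part.
FROM CSP.original o
INSERT OVERWRITE TABLE state_pop
  SELECT regexp_replace(o.st, '^[^:]*:', ''),
         regexp_replace(o.po, '^[^:]*:', '')
INSERT OVERWRITE TABLE city_pop
  SELECT regexp_replace(o.ct, '^[^:]*:', ''),
         regexp_replace(o.po, '^[^:]*:', '');
```

For the sample row, this would store `California` and `1M` in both tables. The `COMMENT` clauses in the question's DDL only document the columns; they have no effect on how field values are parsed, which is why the prefixes have to be removed in the SELECT.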