Unable to load data into a Hive partitioned table

Time: 2016-08-07 16:36:15

Tags: hadoop hive

I created a table in Hive with the following query:

create table if not exists employee(CASE_NUMBER String,
                                         CASE_STATUS String,
                                         CASE_RECEIVED_DATE DATE,
                                         DECISION_DATE  DATE,
                                         EMPLOYER_NAME STRING,
                                         PREVAILING_WAGE_PER_YEAR BIGINT,
                                         PAID_WAGE_PER_YEAR BIGINT,
                                         order_n int) partitioned by (JOB_TITLE_SUBGROUP STRING) row format delimited fields terminated by ',';

I then tried to load data into the newly created table with the following query:

LOAD DATA INPATH '/salary_data.csv' overwrite into table employee  partition (JOB_TITLE_SUBGROUP);

Because the table is partitioned, I also set the following configuration:

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;

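But I get an error when executing the load query:

Your query has the following error(s):

Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Invalid partition key & values; keys [job_title_subgroup, ], values [])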

Please help.

2 Answers:

Answer 0: (score: 2)

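If you want to load data into a Hive partition with LOAD DATA, you must provide the value of the partition itself in the query. In this case, your query would look like this: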

LOAD DATA INPATH '/salary_data.csv' overwrite into table employee partition (JOB_TITLE_SUBGROUP="Value");

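Here "Value" is the name of the partition you want to load the data into. Hive uses "Value" to create the directory in which the .csv is stored, like this: .../employee/JOB_TITLE_SUBGROUP=Value. I hope this helps.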

Check the documentation for details on the LOAD DATA syntax.
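
To make the static-partition form concrete, here is a minimal sketch. The value 'software_engineer' is a hypothetical placeholder, not something taken from the question's data, and the sketch assumes the file holds only rows that belong to that one subgroup, since a load with an explicit partition spec places the whole file under that partition without looking at its contents.

```sql
-- Static-partition load: the partition value is fixed in the statement itself.
-- 'software_engineer' is a hypothetical value used only for illustration.
LOAD DATA INPATH '/salary_data.csv'
OVERWRITE INTO TABLE employee
PARTITION (JOB_TITLE_SUBGROUP = 'software_engineer');

-- List the partitions registered for the table to confirm the load.
SHOW PARTITIONS employee;
```

If the file mixes several subgroups, this form is not enough on its own, which is what the edit below addresses.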

EDITED

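Since the table uses dynamic partitioning, one solution is to load the .csv into an external table (for example, employee_external) and then run an INSERT command like this: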

INSERT OVERWRITE TABLE employee PARTITION(JOB_TITLE_SUBGROUP)
SELECT CASE_NUMBER, CASE_STATUS, (...), JOB_TITLE_SUBGROUP
FROM employee_external
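
For reference, the whole staging flow might look roughly like the sketch below, using the column list from the question. It rests on one assumption that the question does not confirm: that each line of the CSV carries the eight columns of employee in the same order, followed by JOB_TITLE_SUBGROUP as a ninth field. Adjust the staging definition if the file is laid out differently.

```sql
-- Staging table over the raw CSV.
-- ASSUMPTION: the file has the eight columns of `employee` in order,
-- followed by JOB_TITLE_SUBGROUP as the ninth field.
CREATE EXTERNAL TABLE IF NOT EXISTS employee_external (
  CASE_NUMBER STRING,
  CASE_STATUS STRING,
  CASE_RECEIVED_DATE DATE,
  DECISION_DATE DATE,
  EMPLOYER_NAME STRING,
  PREVAILING_WAGE_PER_YEAR BIGINT,
  PAID_WAGE_PER_YEAR BIGINT,
  order_n INT,
  JOB_TITLE_SUBGROUP STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- No PARTITION clause is needed here because the staging table is not partitioned.
LOAD DATA INPATH '/salary_data.csv' INTO TABLE employee_external;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE employee PARTITION (JOB_TITLE_SUBGROUP)
SELECT CASE_NUMBER, CASE_STATUS, CASE_RECEIVED_DATE, DECISION_DATE,
       EMPLOYER_NAME, PREVAILING_WAGE_PER_YEAR, PAID_WAGE_PER_YEAR,
       order_n, JOB_TITLE_SUBGROUP
FROM employee_external;
```

Hive then derives one partition per distinct JOB_TITLE_SUBGROUP value found in the staging table, so no partition value has to be written by hand.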

Answer 1: (score: 2)

I may be late in replying, but you can try the following steps:

  1. First set the following properties:

    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.exec.dynamic.partition=true;
    
  2. Create the temporary (staging) table:

    CREATE EXTERNAL TABLE IF NOT EXISTS employee_temp(
    ID STRING,
    Name STRING,
    Salary STRING)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    tblproperties ("skip.header.line.count"="1");
    
  3. Load the data into the temporary table:

    LOAD DATA INPATH 'filepath/employee.csv' OVERWRITE INTO TABLE employee_temp;
    
  4. Create the partitioned table:

    CREATE EXTERNAL TABLE IF NOT EXISTS employee_part(
    ID STRING,
    Name STRING)
    PARTITIONED BY (Salary STRING)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';
    -- No skip.header.line.count here: this table is filled by INSERT, so its files have no header row.
    
  5. Load the data from the intermediate/temporary table into the partitioned table:

    INSERT OVERWRITE TABLE employee_part PARTITION (Salary) SELECT * FROM employee_temp;
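
Once the insert in step 5 has finished, a quick check along the following lines should show whether the partitions were created as expected. The salary value '50000' is a hypothetical example, not a value known to be in the data.

```sql
-- One partition should be listed per distinct Salary value in the staging data.
SHOW PARTITIONS employee_part;

-- Read back a single partition; '50000' is a hypothetical value.
SELECT ID, Name
FROM employee_part
WHERE Salary = '50000';
```

SELECT * works in step 5 only because Salary is the last column of the staging table; dynamic partition columns are matched by position at the end of the SELECT list, not by name.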