Loading a CSV file into a Hive table

Time: 2016-07-04 16:07:44

Tags: csv hadoop hive hql

I have a CSV file with contents like this:

"DepartmentID","Name","GroupName","ModifiedDate"
"1","Engineering","Research and Development","2008-04-30 00:00:00"

I created the following table:

create external table if not exists AdventureWorks2014.Department
( 
    DepartmentID smallint , 
    Name string ,
   GroupName string, 
    rate_code string, 
    ModifiedDate timestamp 
)   
ROW FORMAT DELIMITED FIELDS TERMINATED BY '","' lines terminated by '\n'
STORED AS TEXTFILE LOCATION 'wasb:///ds/Department' TBLPROPERTIES('skip.header.line.count'='1');

After loading the data with

LOAD DATA INPATH 'wasb:///ds/Department.csv' INTO TABLE AdventureWorks2014.Department;

no data is loaded.

select * from AdventureWorks2014.Department;

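The select above returns nothing.

I think the double quotes around every field are the problem. Is there a way to load data from a file like this into a Hive table without having to strip the double quotes first?

3 Answers:

Answer 0: (score: 1)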

FIELDS TERMINATED BY '","' is incorrect. Your fields are terminated by a comma (,), not by ",". Change your DDL to FIELDS TERMINATED BY ','.
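As a rough sketch of what this suggestion implies: splitting on a plain comma leaves the surrounding double quotes inside every value, so a value such as "1" would probably not parse as smallint. One possible workaround is to read everything as string and strip the quotes at query time. The table name Department_raw is hypothetical, and the sketch assumes no field contains an embedded comma.

```sql
-- Hypothetical staging table (Department_raw is not from the question).
-- All columns are STRING because each value keeps its double quotes
-- after a plain ',' split.
CREATE EXTERNAL TABLE IF NOT EXISTS AdventureWorks2014.Department_raw (
    DepartmentID STRING,
    Name         STRING,
    GroupName    STRING,
    ModifiedDate STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'wasb:///ds/Department'
TBLPROPERTIES ('skip.header.line.count'='1');

-- Strip the quotes and cast when querying.
SELECT
    CAST(regexp_replace(DepartmentID, '"', '') AS SMALLINT)  AS DepartmentID,
    regexp_replace(Name, '"', '')                            AS Name,
    regexp_replace(GroupName, '"', '')                       AS GroupName,
    CAST(regexp_replace(ModifiedDate, '"', '') AS TIMESTAMP) AS ModifiedDate
FROM AdventureWorks2014.Department_raw;
```

If a field can contain a comma inside its quotes, this approach splits it in the wrong place, and a CSV-aware SerDe is the safer choice.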

Answer 1: (score: 1)

Try this (typed on my phone...)

create external table if not exists AdventureWorks2014.Department ( DepartmentID smallint , Name string , GroupName string, rate_code string, ModifiedDate timestamp )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'      
STORED AS TEXTFILE 
LOCATION 'wasb:///ds/Department';
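For comparison, here is a slightly fuller sketch of the same idea with the SerDe properties spelled out and the header row skipped. The table name Department_csv is hypothetical. The column list is reduced to the four columns that appear in the sample CSV, which is an assumption about what the asker wants. The property values shown are, as far as I know, the SerDe defaults, so listing them is optional.

```sql
-- Hypothetical table name; columns match the four headers in the sample file.
-- Declared as STRING because OpenCSVSerde returns every column as a string.
CREATE EXTERNAL TABLE IF NOT EXISTS AdventureWorks2014.Department_csv (
    DepartmentID STRING,
    Name         STRING,
    GroupName    STRING,
    ModifiedDate STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',   -- field separator
    'quoteChar'     = '"',   -- quotes are removed from the values
    'escapeChar'    = '\\'   -- backslash escapes a quote inside a value
)
STORED AS TEXTFILE
LOCATION 'wasb:///ds/Department'
TBLPROPERTIES ('skip.header.line.count'='1');
```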
  

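**Limitations**
  This SerDe treats all columns as type String. Even if you create a table with non-string column types using this SerDe, the DESCRIBE TABLE output will show string column types, because the type information is retrieved from the SerDe. To convert columns to the desired types, you can create a view over the table that CASTs each column to the desired type.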

https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
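The view described in the limitation could look roughly like the sketch below. The view name Department_typed is hypothetical, and the sketch assumes the table was created with OpenCSVSerde so that every column comes back as a string. It selects only the four columns present in the sample CSV.

```sql
-- Hypothetical view name; the underlying table is the OpenCSVSerde table,
-- whose columns are all read back as STRING.
CREATE VIEW IF NOT EXISTS AdventureWorks2014.Department_typed AS
SELECT
    CAST(DepartmentID AS SMALLINT)  AS DepartmentID,
    Name,
    GroupName,
    CAST(ModifiedDate AS TIMESTAMP) AS ModifiedDate
FROM AdventureWorks2014.Department;
```

Queries can then read from the view and receive typed columns while the raw table stays untouched.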

Answer 2: (score: 0)

LOAD DATA LOCAL INPATH '/home/hadoop/hive/log_2013805_16210.log' INTO TABLE table_name;