Question

Here is the data (you can also download it from here):

"Creation Date","Status","First 3 Chars of Postal Code","Intersection Street 1","Intersection Street 2","Ward","Service Request Type","Division","Section"
"2010-01-01 00:38:26.0000000","Closed","Intersection","High Park Blvd","Parkside Dr","Parkdale-High Park (13)","Road - Sanding / Salting Required","Transportation Services","Road Operations"
"2010-01-01 01:19:18.0000000","Closed","M4T","","","Toronto Centre-Rosedale (27)","Water Service Line-Turn On","Toronto Water","District Ops"

Here is my CREATE TABLE query:

CREATE TABLE sr.sr2013 ( 
creation_date STRING,   
status STRING,   
first_3_chars_of_postal_code STRING,   
intersection_street_1 STRING,   
intersection_street_2 STRING,   
ward STRING,   
service_request_type STRING,   
division STRING,   
section STRING ) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
WITH SERDEPROPERTIES (
'colelction.delim'='\u0002', 
'mapkey.delim'='\u0003', 
'serialization.format'=',', 
'field.delim'=',', 
'skip.header.line.count'='1',
'quoteChar'= "\"") ;

Here is the query that loads the data:

load data inpath '/user/rxie/SR2013.csv' into table sr2013;

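After loading the data, a check of the table shows that it still retains all the original quotation marks.

So there are at least two problems here:
1. The option 'skip.header.line.count'='1' in the table definition does not exclude the header row.
2. The double quotes are not removed when the data is loaded into the table, even though the option 'quoteChar'= "\"" is set.

Can anyone shed more light on this? It looks like a bug to me.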

Update 1:

In the Hue / Hive editor:

creation_date STRING,   
status STRING,   
first_3_chars_of_postal_code STRING,   
intersection_street_1 STRING,   
intersection_street_2 STRING,   
ward STRING,   
service_request_type STRING,   
division STRING,   
section STRING )                               
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES (                             
   'colelction.delim'='\u0002',                     
   'field.delim'=',',                               
   'mapkey.delim'='\u0003',                         
   'serialization.format'=',',
   'skip.header.line.count'='1',   
   'quoteChar'= "\"") 


   LOAD DATA LOCAL INPATH '/home/rxie/data/csv/SR2015.csv' INTO TABLE sr2015;

Error:

Error while compiling statement: FAILED: SemanticException Line 1:26 Invalid path ''/home/rxie/data/csv/SR2015.csv'': No files matching path file:/home/rxie/data/csv/SR2015.csv

Answer 1

Here is how I got the quotes excluded when loading the CSV:

In the Hive editor (I think beeline would work as well, although I have not tested it):

Create the Hive table:

CREATE EXTERNAL TABLE sr2015 (
creation_date STRING,
status STRING,
first_3_chars_of_postal_code STRING,
intersection_street_1 STRING,
intersection_street_2 STRING,
ward STRING,
service_request_type STRING,
division STRING,
section STRING )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   'colelction.delim'='\u0002',
   'field.delim'=',',
   'mapkey.delim'='\u0003',
   'serialization.format'=',',
   'skip.header.line.count'='1',
   'quoteChar'= "\"")

Load the data into the Hive table:

LOAD DATA INPATH "hdfs:///user/rxie/SR2015.csv" INTO TABLE sr2015;
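If the header row still shows up with the definition above, a variant may be worth trying. As far as I know, `skip.header.line.count` is read as a table property rather than a SerDe property, and the settings OpenCSVSerde documents for itself are `separatorChar`, `quoteChar` and `escapeChar`. The following is only a sketch under those assumptions, reusing the same table and column names; I have not run it against this data set.

```sql
-- Sketch only: same columns as above, with the CSV options passed to
-- OpenCSVSerde and the header skip moved to TBLPROPERTIES.
CREATE EXTERNAL TABLE sr2015 (
  creation_date STRING,
  status STRING,
  first_3_chars_of_postal_code STRING,
  intersection_street_1 STRING,
  intersection_street_2 STRING,
  ward STRING,
  service_request_type STRING,
  division STRING,
  section STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',   -- field delimiter
  'quoteChar'     = '"',   -- quotes around fields are stripped on read
  'escapeChar'    = '\\'   -- backslash escapes inside quoted fields
)
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count' = '1');  -- skip the header line
```

Note that OpenCSVSerde exposes every column as STRING, so a column such as creation_date would need an explicit cast in queries.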

Outstanding issue (to be discussed here): the table cannot be accessed in Impala.

How to remove double quotes when loading CSV into an external table in Impala?
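One possible workaround for the Impala side, assuming the cause is that Impala does not read tables defined with a custom SerDe such as OpenCSVSerde: copy the data from Hive into a format Impala reads natively. This is a sketch, not something verified on this cluster, and the table name `sr2015_parquet` is hypothetical.

```sql
-- Run in Hive: materialize the parsed rows (quotes already stripped by
-- OpenCSVSerde) into a Parquet table. sr2015_parquet is a hypothetical name.
CREATE TABLE sr2015_parquet
STORED AS PARQUET
AS SELECT * FROM sr2015;

-- Run in impala-shell afterwards so Impala picks up the new table.
INVALIDATE METADATA sr2015_parquet;
```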
