Here is the data (you can also download it from here):
"Creation Date","Status","First 3 Chars of Postal Code","Intersection Street 1","Intersection Street 2","Ward","Service Request Type","Division","Section"
"2010-01-01 00:38:26.0000000","Closed","Intersection","High Park Blvd","Parkside Dr","Parkdale-High Park (13)","Road - Sanding / Salting Required","Transportation Services","Road Operations"
"2010-01-01 01:19:18.0000000","Closed","M4T","","","Toronto Centre-Rosedale (27)","Water Service Line-Turn On","Toronto Water","District Ops"
Here is my create table query:
CREATE TABLE sr.sr2013 (
creation_date STRING,
status STRING,
first_3_chars_of_postal_code STRING,
intersection_street_1 STRING,
intersection_street_2 STRING,
ward STRING,
service_request_type STRING,
division STRING,
section STRING )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
WITH SERDEPROPERTIES (
'colelction.delim'='\u0002',
'mapkey.delim'='\u0003',
'serialization.format'=',',
'field.delim'=',',
'skip.header.line.count'='1',
'quoteChar'= "\"") ;
Here is the load data query:
load data inpath '/user/rxie/SR2013.csv' into table sr2013;
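After loading the data, checking the table shows that all the original quotes are retained:
So there are at least two problems here:
1. The option 'skip.header.line.count'='1' in the table creation does not exclude the header;
2. When the data is loaded into the table, the double quotes are not removed, despite the option 'quoteChar'= "\"".
Can anyone shed more light on this? It looks like a bug to me.
Update 1:
In the Hue / Hive editor: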
creation_date STRING,
status STRING,
first_3_chars_of_postal_code STRING,
intersection_street_1 STRING,
intersection_street_2 STRING,
ward STRING,
service_request_type STRING,
division STRING,
section STRING )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'colelction.delim'='\u0002',
'field.delim'=',',
'mapkey.delim'='\u0003',
'serialization.format'=',',
'skip.header.line.count'='1',
'quoteChar'= "\"")
LOAD DATA LOCAL INPATH '/home/rxie/data/csv/SR2015.csv' INTO TABLE sr2015;
Error:
Error while compiling statement: FAILED: SemanticException Line 1:26 Invalid path ''/home/rxie/data/csv/SR2015.csv'': No files matching path file:/home/rxie/data/csv/SR2015.csv
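Answer 0 (score: 0)
Here is how I got the quotes excluded when loading the CSV:
In the Hive editor (I think beeline should work as well, although I have not tested it):
Create the Hive table: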
CREATE EXTERNAL TABLE sr2015 (
creation_date STRING,
status STRING,
first_3_chars_of_postal_code STRING,
intersection_street_1 STRING,
intersection_street_2 STRING,
ward STRING,
service_request_type STRING,
division STRING,
section STRING )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'colelction.delim'='\u0002',
'field.delim'=',',
'mapkey.delim'='\u0003',
'serialization.format'=',',
'skip.header.line.count'='1',
'quoteChar'= "\"")
Load the data into the Hive table:
LOAD DATA INPATH "hdfs:///user/rxie/SR2015.csv" INTO TABLE sr2015;
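For comparison, here is a minimal sketch of how the same table might be declared using only the properties OpenCSVSerde documents (separatorChar, quoteChar, escapeChar). As far as I know, skip.header.line.count is a table property rather than a SerDe property, so in this sketch it goes in TBLPROPERTIES, which may be why the header was not skipped in the question. The table name sr2015_clean is hypothetical, and I have not run this against the poster's cluster. Note that OpenCSVSerde reads every column as STRING.

```sql
-- Sketch only: sr2015_clean is a hypothetical name; the columns are copied from the question.
CREATE EXTERNAL TABLE sr2015_clean (
  creation_date STRING,
  status STRING,
  first_3_chars_of_postal_code STRING,
  intersection_street_1 STRING,
  intersection_street_2 STRING,
  ward STRING,
  service_request_type STRING,
  division STRING,
  section STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',   -- field separator
  'quoteChar'     = '"',   -- quotes around each field are stripped on read
  'escapeChar'    = '\\'   -- escape character inside quoted fields
)
STORED AS TEXTFILE
-- Table-level property (assumes Hive 0.13 or later): skip the first line of each file.
TBLPROPERTIES ('skip.header.line.count' = '1');

-- Same HDFS path as in the statement above; LOAD DATA INPATH moves the file into the table's location.
LOAD DATA INPATH 'hdfs:///user/rxie/SR2015.csv' INTO TABLE sr2015_clean;
```

Loading from an HDFS path also sidesteps the "No files matching path" error from Update 1: when statements go through HiveServer2 (Hue or beeline), LOAD DATA LOCAL looks for the file on the server host, not on the machine where the editor is running.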
Outstanding issue (to be discussed here): the table cannot be accessed in Impala.