CREATE TABLE `rk_test22`(
`index` int,
`country` string,
`description` string,
`designation` string,
`points` int,
`price` int,
`province` string,
`region_1` string,
`region_2` string,
`taster_name` string,
`taster_twitter_handle` string,
`title` string,
`variety` string,
`winery` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'input.regex'=',(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://namever/user/hive/warehouse/robert.db/rk_test22'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'skip.header.line.count'='1',
'totalSize'='52796693',
'transient_lastDdlTime'='1516088117');
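I created the Hive table with the command above. Now I want to load the following row (from a CSV file) into the table with the LOAD DATA command. LOAD DATA reports status OK, but I can't see any data in the table.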
0,Italy,"Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity.",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
Answer (score: 0)
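If the CSV file you are loading contains only this one line, that line is skipped because of this property: 'skip.header.line.count'='1'. Hive treats the first line of every file in the table's location as a header and drops it.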
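If the file really has no header row, one possible fix, assuming the existing table from the question and a header-less CSV, is to turn header skipping off and query again. The file loaded earlier should still be under the table's LOCATION, so no reload should be needed:

```sql
-- Assumes the CSV file has no header row: stop skipping the first line of each file.
ALTER TABLE rk_test22 SET TBLPROPERTIES ('skip.header.line.count'='0');

-- The previously loaded file is still under the table's LOCATION, so its row should now be returned.
SELECT * FROM rk_test22 LIMIT 10;
```

Alternatively, keep the property and make sure the file actually starts with a header line before running LOAD DATA.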
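Also, the regex should contain one capturing group per column, as in this answer: https://stackoverflow.com/a/47944328/2700344. The regex in your DDL matches the separator commas (outside quotes) instead of capturing the fields. Note too that 'input.regex' is read by RegexSerDe (org.apache.hadoop.hive.serde2.RegexSerDe); OpenCSVSerde ignores it and is configured with 'separatorChar', 'quoteChar' and 'escapeChar' instead.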
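A minimal sketch of the one-group-per-column approach with RegexSerDe. The table name rk_test22_regex is hypothetical, every column is declared as string to keep the sketch simple, and each of the 14 groups matches either a double-quoted value (the quotes are kept in the column, and embedded "" escapes are not handled) or a run of non-comma characters:

```sql
-- Hypothetical table name. RegexSerDe maps capture group N to column N,
-- so the regex has exactly 14 groups, one per column, separated by literal commas.
-- Each group is ("[^"]*"|[^,]*): a double-quoted value (quotes kept) or a non-comma run (possibly empty).
CREATE TABLE `rk_test22_regex` (
  `index` string,
  `country` string,
  `description` string,
  `designation` string,
  `points` string,
  `price` string,
  `province` string,
  `region_1` string,
  `region_2` string,
  `taster_name` string,
  `taster_twitter_handle` string,
  `title` string,
  `variety` string,
  `winery` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex'='("[^"]*"|[^,]*),("[^"]*"|[^,]*),("[^"]*"|[^,]*),("[^"]*"|[^,]*),("[^"]*"|[^,]*),("[^"]*"|[^,]*),("[^"]*"|[^,]*),("[^"]*"|[^,]*),("[^"]*"|[^,]*),("[^"]*"|[^,]*),("[^"]*"|[^,]*),("[^"]*"|[^,]*),("[^"]*"|[^,]*),("[^"]*"|[^,]*)'
)
STORED AS TEXTFILE;
```

For the sample row, the quoted description (which contains commas) lands in the third group as one value, and the empty price and region_2 fields match as empty strings.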
Also, why do you specify these settings in the table DDL?
'COLUMN_STATS_ACCURATE'='true'
'numFiles'='1',
'totalSize'='52796693',
'transient_lastDdlTime'='1516088117'
Hive sets all of these automatically after the DDL runs and after ANALYZE, so they should not be specified by hand.
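In other words, the CREATE TABLE statement only needs properties that describe how to read the data (such as 'skip.header.line.count', if the file has a header); the statistics can be requested from Hive afterwards. A sketch against the table from the question:

```sql
-- Recompute basic table statistics (numFiles, totalSize, numRows, rawDataSize) from the actual data.
ANALYZE TABLE rk_test22 COMPUTE STATISTICS;

-- Show the table properties Hive now maintains.
SHOW TBLPROPERTIES rk_test22;
```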