These are the columns in my data source:
BibNum
Title
Author
ISBN
PublicationYear
Publisher
Subjects
ItemType
ItemCollection
FloatingItem
ItemLocation
ReportDate
ItemCount
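I only get values up to the Publisher column.
I have uploaded a screenshot. If you know the cause and how to fix it, please tell me; I would really appreciate it:
Below are the real values of the first row (I use // to mark the boundary between columns):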
3011076//
A tale of two friends / adapted by Ellie O'Ryan ; illustrated by Tom Caulfield, Frederick Gardner, Megan Petasky, and Allen Tam. //
O'Ryan, Ellie //
1481425730, 1481425749, 9781481425735, 9781481425742 //
2014 //
Simon Spotlight, //
Musicians Fiction, Bullfighters Fiction, Best friends Fiction, Friendship Fiction, Adventure and adventurers Fiction //
jcbk //
ncrdr //
Floating //
qna //
09/01/2017 //
1
These are the real values of the second row:
2248846 //
Naruto. Vol. 1, Uzumaki Naruto / story and art by Masashi Kishimoto ; [English adaptation by Jo Duffy]. //
Kishimoto, Masashi, 1974- //
1569319006 //
2003, c1999. //
Viz, //
Ninja Japan Comic books strips etc, Comic books strips etc Japan Translations into English, Graphic novels //
acbk//
nycomic//
NA//
lcy//
09/01/2017//
1
hive> select * from timesheet limit 3;
OK
NULL Title Author ISBN PublicationYear Publisher Subjects ItemType ItemCollection FloatingItem ItemLocation ReportDate ItemCount
3011076 "A tale of two friends / adapted by Ellie O'Ryan ; illustrated by Tom Caulfield Frederick Gardner Megan Petasky and Allen Tam." "O'Ryan Ellie" "1481425730 1481425749 9781481425735 9781481425742" 2014. "Simon Spotlight
2248846 "Naruto. Vol. 1 Uzumaki Naruto / story and art by Masashi Kishimoto ; [English adaptation by Jo Duffy]." "Kishimoto Masashi 1974-" 1569319006 "2003 c1999." "Viz " "Ninja Japan Comic books strips etc Comic books strips etc Japan Translations into English
Time taken: 0.149 seconds
hive> desc timesheet
> ;
OK
bibnum bigint
title string
author string
isbn string
publication string
publisher string
subjects string
itemtype string
itemcollection string
floatingitem string
itemlocation string
reportdate string
itemcount string
Time taken: 0.21 seconds
BibNum,Title,Author,ISBN,PublicationYear,Publisher,Subjects,ItemType,ItemCollection,FloatingItem,ItemLocation,ReportDate,ItemCount | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | |
3011076,"A tale of two friends / adapted by Ellie O'Ryan ; illustrated by Tom Caulfield, Frederick Gardner, Megan Petasky, and Allen Tam.","O'Ryan, Ellie","1481425730, 1481425749, 9781481425735, 9781481425742",2014.,"Simon Spotlight,","Musicians Fiction, Bullfighters Fiction, Best friends Fiction, Friendship Fiction, Adventure and adventurers Fiction",jcbk,ncrdr,Floating,qna,09/01/2017,1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
Answer 0 (score: 0)
Apache Hive on its own cannot handle data like this CSV, but a SerDe (Serializer/Deserializer) can help with it.
With the CSV SerDe built into Hive v0.14+, the default separator is a comma (,), so it should work for your CSV.
If any column contains unescaped quotes, you will have to go in manually and work out which value belongs to which column...
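A minimal sketch of what such a table definition could look like, assuming the built-in SerDe meant here is org.apache.hadoop.hive.serde2.OpenCSVSerde (shipped with Hive since 0.14). The table name library_items is hypothetical, the column names are taken from the question, and the SERDEPROPERTIES only spell out that SerDe's defaults. The skip.header.line.count table property is an extra assumption, added so the header line is not read as a data row.

```sql
-- Hypothetical table name; columns follow the list in the question.
-- OpenCSVSerde reads every column as STRING regardless of the declared type.
CREATE TABLE library_items (
  bibnum          STRING,
  title           STRING,
  author          STRING,
  isbn            STRING,
  publicationyear STRING,
  publisher       STRING,
  subjects        STRING,
  itemtype        STRING,
  itemcollection  STRING,
  floatingitem    STRING,
  itemlocation    STRING,
  reportdate      STRING,
  itemcount       STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",   -- field separator (the default)
  "quoteChar"     = "\"",  -- commas inside double quotes stay in one field
  "escapeChar"    = "\\"   -- default escape character
)
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count" = "1");  -- skip the CSV header line
```

Because this SerDe returns strings only, numeric columns would need a cast at query time, for example CAST(itemcount AS INT).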
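Answer 1 (score: 0)
Because the CSV file is comma-separated, if you simply declare the columns as strings, the whole line is loaded into a single column. So when you create the table, specify that the values in each row are separated by commas.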
create table table_name (
....
) row format delimited fields terminated by ',' lines terminated by '\n';
Then load the CSV file with:
load data local inpath 'path_to_file' into table table_name;
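As a rough sketch, loading and then checking the result might look like this. The path /tmp/library_inventory.csv is hypothetical, table_name is the placeholder from this answer, and the table is assumed to have been created with the column names from the question. The final query is only a sanity check and is not part of the original answer.

```sql
-- Hypothetical local path; replace it with the real location of the CSV file.
LOAD DATA LOCAL INPATH '/tmp/library_inventory.csv' INTO TABLE table_name;

-- Sanity check: compare a few early and late columns against the source file.
SELECT bibnum, publisher, itemlocation, itemcount
FROM table_name
LIMIT 3;
```

With plain delimited fields Hive splits on every comma, including the ones inside quoted values such as the title, so if the later columns look shifted or come back NULL, the quoted fields are the likely cause.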
Hope this helps :)