Everything is imported as string when using com.bizo.hive.serde.csv.CSVSerde

Time: 2016-08-15 10:03:41

Tags: hadoop hive

I downloaded the Stack Overflow users dump so I could get familiar with Hive, and I have converted the XML to a CSV file. I am using the following:

add jar /home/cloudera/csv-serde.jar;
drop table stackoverflow_users;

CREATE external TABLE IF NOT EXISTS stackoverflow_users (CreationDate timestamp, Views BIGINT,
  AccountId BIGINT, AboutMe string,
  WebsiteUrl string, LastAccessDate timestamp, upvotes bigint,
  ProfileImageUrl string, DisplayName string,
  Id BigInt, Reputation BIGINT, DownVotes bigint,
  Age int, Location String)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
location '/user/cloudera/users';

The lines of the file have the following format:

"2008-08-01T12:09:11.010","1347","14","","http://some.url","2016-01-15T01:44:05.733","369","","User name","20","6943","38","","Some location"
"2008-08-01T12:11:11.897","830","15","","http://some.url","2016-06-11T01:38:09.770","191","","User name","22","8727","5","30","Some location"

However, if I run desc stackoverflow_users, I see the following:

+------------------+------------+--------------------+--+
|     col_name     | data_type  |      comment       |
+------------------+------------+--------------------+--+
| creationdate     | string     | from deserializer  |
| views            | string     | from deserializer  |
| accountid        | string     | from deserializer  |
| aboutme          | string     | from deserializer  |
| websiteurl       | string     | from deserializer  |
| lastaccessdate   | string     | from deserializer  |
| upvotes          | string     | from deserializer  |
| profileimageurl  | string     | from deserializer  |
| displayname      | string     | from deserializer  |
| id               | string     | from deserializer  |
| reputation       | string     | from deserializer  |
| downvotes        | string     | from deserializer  |
| age              | string     | from deserializer  |
| location         | string     | from deserializer  |
+------------------+------------+--------------------+--+

Why is everything a string?

1 answer:

Answer 0: (score: 0)

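The problem lies in the SerDe being used. It exposes every column as a string regardless of the types declared in the CREATE TABLE statement, which is exactly what the desc output shows. It has also been reported here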
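
One possible workaround, offered as a minimal sketch rather than a confirmed fix, is to leave the external table as all strings and put a view on top of it that casts each column to the intended type. The view name stackoverflow_users_typed is hypothetical. The sketch also assumes that the 'T' separator in the timestamps has to be replaced with a space before Hive will cast them, and that empty strings such as a missing Age come out as NULL after the cast.

```sql
-- Hypothetical view name; the underlying table keeps its string columns.
CREATE VIEW IF NOT EXISTS stackoverflow_users_typed AS
SELECT
  -- Assumption: Hive expects 'yyyy-MM-dd HH:mm:ss.fff', so swap the 'T' for a space.
  CAST(regexp_replace(creationdate, 'T', ' ') AS TIMESTAMP)   AS creationdate,
  CAST(`views` AS BIGINT)                                     AS `views`,
  CAST(accountid AS BIGINT)                                   AS accountid,
  aboutme,
  websiteurl,
  CAST(regexp_replace(lastaccessdate, 'T', ' ') AS TIMESTAMP) AS lastaccessdate,
  CAST(upvotes AS BIGINT)                                     AS upvotes,
  profileimageurl,
  displayname,
  CAST(id AS BIGINT)                                          AS id,
  CAST(reputation AS BIGINT)                                  AS reputation,
  CAST(downvotes AS BIGINT)                                   AS downvotes,
  -- Assumption: an empty Age field becomes NULL when cast to INT.
  CAST(age AS INT)                                            AS age,
  `location`
FROM stackoverflow_users;
```

Running desc stackoverflow_users_typed should then report the cast types, and queries can go against the view instead of the raw table. The casts are evaluated at query time, so if that cost matters, the same SELECT could instead populate a separate table stored in another format.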