Question

I'm trying to load a CSV file into Hive where one of the fields is an array of strings.

Here is the CSV file:

48,Snacks that Power Up Weight Loss,Aidan B. Prince,[Health&Fitness,Travel]
99,Snacks that Power Up Weight Loss,Aidan B. Prince,[Photo,Travel]

I tried creating the table like this:

CREATE TABLE IF NOT EXISTS Article
(
ARTICLE_ID INT,
ARTICLE_NAME STRING,
ARTICLE_AUTHOR STRING,
ARTICLE_GENRE ARRAY<STRING>
);
LOAD DATA INPATH '/tmp/pinterest/article.csv' OVERWRITE INTO TABLE Article;
select * from Article;

Here is the result I get:

article.article_id  article.article_name    article.article_author  article.article_genre
48  Snacks that Power Up Weight Loss    Aidan B. Prince ["[Health&Fitness"]
99  Snacks that Power Up Weight Loss    Aidan B. Prince ["[Photo"]

Only a single value ends up in the last field, article_genre.

Can someone point out what is wrong here?

Answer 1

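A few things:
You are missing the definition of the collection items delimiter. Also, I assume you want your select * from article statement to return the following: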

48  Snacks that Power Up Weight Loss    Aidan B. Prince ["Health&Fitness","Travel"]
99  Snacks that Power Up Weight Loss    Aidan B. Prince ["Photo","Travel"]

I can give you an example; you can play around with the rest. Here is my table definition:

create table article (
  id int,
  name string,
  author string,
  genre array<string>
)
row format delimited
fields terminated by ','
collection items terminated by '|';

Here is the data:

48,Snacks that Power Up Weight Loss,Aidan B. Prince,Health&Fitness|Travel
99,Snacks that Power Up Weight Loss,Aidan B. Prince,Photo|Travel
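If your source file is still in the question's bracketed format, a small preprocessing step can rewrite it into the pipe-separated layout above. This is a minimal Python sketch, assuming the array is always the last column, is wrapped in [ and ], and that no other field contains a comma; to_pipe_format and convert_file are hypothetical names, not part of Hive.

```python
import re

# Matches a trailing "[item1,item2,...]" at the very end of a line.
BRACKETED_TAIL = re.compile(r"\[([^\]]*)\]$")


def to_pipe_format(line: str) -> str:
    """Hypothetical helper: turn a trailing [a,b,...] into a|b|... ."""
    line = line.rstrip("\r\n")
    match = BRACKETED_TAIL.search(line)
    if match is None:
        return line  # nothing bracketed at the end: leave the row unchanged
    items = match.group(1).split(",")
    return line[:match.start()] + "|".join(items)


def convert_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: rewrite every row of src_path into dst_path."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for raw in src:
            dst.write(to_pipe_format(raw) + "\n")


print(to_pipe_format("99,Snacks that Power Up Weight Loss,Aidan B. Prince,[Photo,Travel]"))
# 99,Snacks that Power Up Weight Loss,Aidan B. Prince,Photo|Travel
```

For example, convert_file("article.csv", "article_fixed.csv") could be run before the LOAD DATA statement; both file names are placeholders.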

Now do a load as follows:

LOAD DATA local INPATH '/path' OVERWRITE INTO TABLE article;

and run a select statement to check the result.

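The most important points:
Define the delimiter for collection items, and don't impose the bracketed array syntax you would use in normal programming. Also, try to make the field delimiter different from the collection items delimiter to avoid confusion and unexpected results.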
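To see why the question's table produced ["[Health&Fitness"], here is a simplified Python model of delimited row parsing. It is not Hive's SerDe code; it assumes fields terminated by ',' (as the question's output suggests), Hive's default collection items delimiter '\x02' (Ctrl-B) when none is declared, that fields beyond the declared columns are dropped, and that the array is the last column. parse_delimited is a hypothetical name.

```python
from typing import List, Union


# Hypothetical helper, NOT Hive's SerDe code: a simplified model of how a
# delimited text row is cut into columns when the last column is ARRAY<STRING>.
def parse_delimited(line: str, n_columns: int,
                    field_delim: str, collection_delim: str) -> List[Union[str, List[str]]]:
    # 1. Split the row into fields first; fields past n_columns are dropped.
    fields = line.rstrip("\n").split(field_delim)[:n_columns]
    # 2. Only the last kept field is split on the collection delimiter.
    scalars, last = fields[:-1], fields[-1]
    return [*scalars, last.split(collection_delim)]


question_row = "48,Snacks that Power Up Weight Loss,Aidan B. Prince,[Health&Fitness,Travel]"
answer_row = "48,Snacks that Power Up Weight Loss,Aidan B. Prince,Health&Fitness|Travel"

# No collection delimiter declared: assume Hive's default '\x02' (Ctrl-B).
print(parse_delimited(question_row, 4, ",", "\x02")[3])  # ['[Health&Fitness']
# Same delimiter for fields and items: the array still gets one element.
print(parse_delimited(question_row, 4, ",", ",")[3])     # ['[Health&Fitness']
# Answer 1's layout: ',' between fields, '|' between items.
print(parse_delimited(answer_row, 4, ",", "|")[3])       # ['Health&Fitness', 'Travel']
```

Because the row is split into fields before the array column is split into items, reusing ',' for both cannot recover "Travel"; a distinct item delimiter is what makes the array work.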

Answer 2

To insert an array of strings into a Hive table, keep the following points in mind.

 1. When creating the Hive table, collection items should be terminated by "," ('colelction.delim'=','). Note that Hive's property name really is spelled colelction.delim.
 2. The data in the CSV file should look like this:
  48  Snacks that Power Up Weight Loss    Aidan B. Prince Health&Fitness,Travel
    You can modify the file by running the following sed commands, in this order:
     - sed -i 's/\[\"//g' filename
     - sed -i 's/\"\]//g' filename
     - sed -i 's/"//g' filename
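The three sed substitutions above can be mirrored in a short Python function, which makes it easier to see what each step removes. This is only an illustration (strip_array_brackets is a hypothetical name), assuming the array column looks like Hive's own output, e.g. ["Photo","Travel"]. The first two steps only match a bracket next to a double quote, so a quote-free value like the question's [Photo,Travel] keeps its brackets.

```python
def strip_array_brackets(line: str) -> str:
    """Apply the same three substitutions as the sed commands, in the same order."""
    line = line.replace('["', "")  # like sed 's/\[\"//g'
    line = line.replace('"]', "")  # like sed 's/\"\]//g'
    return line.replace('"', "")   # like sed 's/"//g'


print(strip_array_brackets('["Photo","Travel"]'))  # Photo,Travel
print(strip_array_brackets("[Photo,Travel]"))      # [Photo,Travel]
```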
