Hive - Create an external table from text files using a record delimiter other than the line terminator

Time: 2017-03-23 12:16:10

Tags: hadoop hive hdfs hiveql

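I want to create an external table from a set of text files stored in HDFS. Each text file should become one row. A sample text file is shown below; there can be multiple such files.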

thanking 
you 
for 
the 
participation 

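Lines are terminated by \n. I want to create an external table from text files like the one above, with the entire content of each file in a single row (one cell).

I tried the following CREATE TABLE statement.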

Create External table if not exists sample_email(
  email STRING
)
STORED AS TEXTFILE
LOCATION '/tmp/txt/sample/';

It produces a table like the following.

+--------------------------------------+
+   email                              +
+--------------------------------------+
+ thanking                             +
+--------------------------------------+
+ you                                  +
+--------------------------------------+
+ for                                  +
+--------------------------------------+
+ the                                  +
+--------------------------------------+
+ participation                        +
+--------------------------------------+
+ please                               +
+--------------------------------------+
+ find                                 +
+--------------------------------------+
+ the                                  +
+--------------------------------------+
+ discussed                            +
+--------------------------------------+
+ points                               +
+--------------------------------------+

But I want the following instead.

+--------------------------------------+
+   email                              +
+--------------------------------------+
+ thanking you for the participation   +
+--------------------------------------+
+ please find the discussed points     +
+--------------------------------------+

How can I solve this? Thanks in advance.

3 answers:

Answer 0: (score: 1)

select      concat_ws(' ',collect_list(email))  as emails
from        sample_email
group by    input__file__name
+------------------------------------+
|               emails               |
+------------------------------------+
| thanking you for the participation |
| please find the discussed points   |
+------------------------------------+

Answer 1: (score: 0)

Use tr to remove the \n characters from the file.

hadoop fs -cat file.txt |  tr -d '\n' | hadoop fs -put - new_file.txt
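
One thing to check with this approach: tr -d '\n' deletes the newlines outright, so the words only stay separated if each line already ends with a space. The following is a minimal local sketch (no HDFS involved) comparing deletion with replacement by a space. The file name sample.txt and the temporary directory are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical local copy of one of the question's files, written
# WITHOUT trailing spaces so the difference is visible.
tmpdir=$(mktemp -d)
printf 'thanking\nyou\nfor\nthe\nparticipation\n' > "$tmpdir/sample.txt"

# tr -d deletes every newline, so the words run together.
deleted=$(tr -d '\n' < "$tmpdir/sample.txt")

# tr '\n' ' ' replaces each newline with a space instead; sed then strips
# the trailing space left behind by the file's final newline.
replaced=$(tr '\n' ' ' < "$tmpdir/sample.txt" | sed 's/ *$//')

echo "$deleted"
echo "$replaced"

rm -rf "$tmpdir"
```

If the files do not have trailing spaces, the same substitution could be applied to the pipeline above by using tr '\n' ' ' in place of tr -d '\n', keeping the hadoop fs -cat and hadoop fs -put steps unchanged.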

Answer 2: (score: 0)

set textinputformat.record.delimiter='\0';
select  translate(email,'\n',' ') as emails 
from    sample_email
+-------------------------------------+
|               emails                |
+-------------------------------------+
| thanking you for the participation  |
| please find the discussed points    |
+-------------------------------------+

Unfortunately, I still don't know how to set textinputformat.record.delimiter back to newline within the same session.

How to reset textinputformat.record.delimiter to its default value within hive cli / beeline?