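I want to create an external table from a set of text files. Each row should hold one text file. A sample text file is shown below; there can be multiple text files. (The files are stored in HDFS.)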
thanking
you
for
the
participation
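Lines are terminated by \n. I want to create an external table from text files like the one above, with the full contents of each file in a single row (one cell).
I tried the following CREATE TABLE statement.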
Create External table if not exists sample_email(
email STRING
)
STORED AS TEXTFILE
LOCATION '/tmp/txt/sample/';
It produces the following table.
+--------------------------------------+
+ email                                +
+--------------------------------------+
+ thanking                             +
+--------------------------------------+
+ you                                  +
+--------------------------------------+
+ for                                  +
+--------------------------------------+
+ the                                  +
+--------------------------------------+
+ participation                        +
+--------------------------------------+
+ please                               +
+--------------------------------------+
+ find                                 +
+--------------------------------------+
+ the                                  +
+--------------------------------------+
+ discussed                            +
+--------------------------------------+
+ points                               +
+--------------------------------------+
But I want the following.
+--------------------------------------+
+ email                                +
+--------------------------------------+
+ thanking you for the participation   +
+--------------------------------------+
+ please find the discussed points     +
+--------------------------------------+
How can I solve this? Thanks in advance.
Answer 0 (score: 1)
select concat_ws(' ',collect_list(email)) as emails
from sample_email
group by input__file__name
+------------------------------------+
| emails                             |
+------------------------------------+
| thanking you for the participation |
| please find the discussed points   |
+------------------------------------+
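Answer 1 (score: 0)
Use tr to replace the \n characters in the file with spaces. (Deleting them outright with tr -d '\n' would run the words together.)
hadoop fs -cat file.txt | tr '\n' ' ' | hadoop fs -put - new_file.txt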
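Since the question mentions several files, the same idea can be applied per file. The following is only a sketch under stated assumptions: `flatten_lines` and `flatten_hdfs_dir` are hypothetical helper names, the source directory is assumed to contain only plain files, the target directory is assumed to exist already, and the installed Hadoop is assumed to support `hadoop fs -ls -C` (print paths only). It joins lines with `paste` so that words are separated by single spaces and no trailing space is left.

```shell
# Join every line read from stdin into one space-separated line.
flatten_lines() {
  paste -sd ' ' -
}

# Hypothetical helper: write a one-line copy of each file in src_dir to dst_dir.
# Assumes src_dir holds only plain files, dst_dir already exists, and the
# installed Hadoop supports `-ls -C` (print paths only).
flatten_hdfs_dir() {
  src_dir=$1
  dst_dir=$2
  hadoop fs -ls -C "$src_dir" | while IFS= read -r path; do
    # Redirect stdin so the cat command cannot consume the file list.
    hadoop fs -cat "$path" < /dev/null \
      | flatten_lines \
      | hadoop fs -put - "$dst_dir/$(basename "$path")"
  done
}
```

It could be called as `flatten_hdfs_dir /tmp/txt/sample /tmp/txt/sample_flat` (the second path is made up for illustration), after which the table's LOCATION would need to point at the new directory.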
Answer 2 (score: 0)
set textinputformat.record.delimiter='\0';
select translate(email,'\n',' ') as emails
from sample_email
+------------------------------------+
| emails                             |
+------------------------------------+
| thanking you for the participation |
| please find the discussed points   |
+------------------------------------+
Unfortunately, I still don't know how to set textinputformat.record.delimiter back to newline within the same session.
How to reset textinputformat.record.delimiter to its default value within hive cli / beeline?