Hadoop still treats commas as delimiters after a different character is explicitly declared

Time: 2015-03-26 18:20:10

Tags: hadoop hive biginsights

I am currently importing data into a Hive table. When we created the table, we used

CREATE EXTERNAL TABLE Customers
(
Code        string,
Company     string,
FirstName   string,
LastName    string,
DateOfBirth string,
PhoneNo     string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';

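because our data contains commas. However, we are now finding that commas are still being treated as field delimiters, in addition to the | that we use to separate fields. Is there any way to fix this? Do we have to remove every comma from the data, or is there a simpler way to set this up?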

Sample data

1|2|3|4
a|b|c|d
John|Joe|Bob, Jr|Alex

When loaded into the table, it shows up as

1 2 3 4
a b c d
John Joe Bob Jr

Jr takes up its own column and bumps Alex off the table.

1 answer:

Answer 0: (score: 0)

This works fine for me with your data. Hive version is 0.13.

hive> create external table foo(
    > first string,
    > second string,
    > third string,
    > forth string)
    > row format delimited fields terminated by '|' lines terminated by '\n';
OK
Time taken: 3.222 seconds
hive> load data inpath '/user/xilan/data.txt' overwrite into table foo;

hive> select third from foo;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1422157058628_0001, Tracking URL =    http://host:8088/proxy/application_1422157058628_0001/
Kill Command = /scratch/xilan/hadoop/bin/hadoop job  -kill job_1422157058628_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-03-27 07:05:41,901 Stage-1 map = 0%,  reduce = 0%
2015-03-27 07:05:50,190 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.24 sec
MapReduce Total cumulative CPU time: 1 seconds 240 msec
Ended Job = job_1422157058628_0001
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.24 sec   HDFS Read: 245 HDFS Write: 12     SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 240 msec
OK
3
c
Bob, Jr
Time taken: 18.853 seconds, Fetched: 3 row(s)
hive>
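
To make the difference between the two results concrete, here is a minimal sketch in plain Python. It is only a stand-in and not Hive's actual parser. The function names and the four-column width are assumptions chosen for illustration. Splitting on '|' alone keeps "Bob, Jr" in one field, as in the query output above, while treating ',' as a second delimiter reproduces the symptom from the question, where Jr takes the fourth column and Alex is dropped.

```python
import re

# Sample rows from the question, pipe-delimited.
rows = ["1|2|3|4", "a|b|c|d", "John|Joe|Bob, Jr|Alex"]


def split_pipe_only(line):
    """Split on '|' only, which is what the table definition asks for."""
    return line.split("|")


def split_pipe_and_comma(line, width=4):
    """Hypothetical misbehaviour: also split on ',' and keep `width` columns."""
    # Inside a character class, '|' is a literal pipe, not alternation.
    fields = [field.strip() for field in re.split(r"[|,]", line)]
    return fields[:width]


pipe_only = [split_pipe_only(row) for row in rows]
pipe_and_comma = [split_pipe_and_comma(row) for row in rows]

for expected, symptom in zip(pipe_only, pipe_and_comma):
    print(expected, symptom)
```

If the table shows the second pattern, it may be worth checking that the delimiter the table was actually created with is '|', and whether the tool displaying or exporting the rows applies its own comma handling.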