Thorn character delimiter not recognized in Hive

Time: 2015-05-14 18:58:49

Tags: hadoop encoding hive

As described in the post "Using the Icelandic Thorn character as a delimiter in Hive", the thorn character (þ) is not recognized as a delimiter in Hive.

Sample table

CREATE EXTERNAL TABLE IF NOT EXISTS zzzzz_raw (
  spot_id INT,
  activity_type_id INT,
  activity_type STRING,
  activity_id INT,
  activity_sub_type STRING,
  report_name STRING,
  tag_method_id INT
)
PARTITIONED BY ( dt DATE )
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\-2'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/raw/data/networkmatchtablesactivity/activity_cat';

Output

select * from activity_cat_raw limit 1;

4552126þ805759þeaasv101þ2275868þbfeaac01þBF_EA Access_Info Pageþ2       NULL    NULL    NULL    NULL    NULL    NULL    2015-03-24

Am I missing something?

1 answer:

Answer 0 (score: -1)

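I found the answer. Instead of '\-2' (the thorn delimiter), I used '\-61' as the delimiter and then used substring to remove the extra symbol, as shown below.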
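Why '\-61' works where '\-2' does not can be checked outside Hive. The following is a minimal sketch in Python (not HiveQL), assuming the data file is UTF-8 encoded, which the post does not state. The helper name `signed_bytes` is made up for illustration. It prints the bytes of þ as signed 8-bit values, the form in which the delimiter is written in the DDL.

```python
def signed_bytes(text, encoding):
    """Encode text and return each byte as a signed 8-bit value (-128..127)."""
    return [b - 256 if b > 127 else b for b in text.encode(encoding)]


THORN = "\u00fe"  # the thorn character, þ

# In Latin-1 the thorn is a single byte, 0xFE, which is -2 when signed.
latin1_values = signed_bytes(THORN, "latin-1")

# In UTF-8 the thorn is two bytes, 0xC3 0xBE, which are -61 and -66 when signed.
utf8_values = signed_bytes(THORN, "utf-8")

print(latin1_values)  # [-2]
print(utf8_values)    # [-61, -66]
```

Under that assumption the file contains no 0xFE byte at all, so '\-2' never matches, while '\-61' matches the first byte of each þ and leaves the second byte (0xBE) at the start of the following field.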

CREATE EXTERNAL TABLE IF NOT EXISTS SSSSSS (
  spot_id STRING,
  activity_type_id STRING,
  activity_type STRING,
  activity_id STRING,
  activity_sub_type STRING,
  report_name STRING,
  tag_method_id STRING
)
PARTITIONED BY ( dt STRING )
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\-61'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'SSSSSS';

Then use substring to remove the extra symbol:

INSERT OVERWRITE TABLE vvvvvv PARTITION (dt)
SELECT
  spot_id,
  substr(activity_type_id, 2),  -- drop the leftover leading character
  dt
FROM SSSSSS;
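The effect of this workaround can be imitated on the sample row from the question. This is only a sketch in Python, assuming UTF-8 input. It works on raw bytes, whereas Hive's substr works on characters, so it approximates the behaviour rather than reproducing Hive's internals. The function name `split_on_first_delim_byte` is hypothetical.

```python
# Sample row from the question, encoded as UTF-8 (an assumption about the file).
line = "4552126þ805759þeaasv101þ2275868þbfeaac01þBF_EA Access_Info Pageþ2".encode("utf-8")


def split_on_first_delim_byte(raw):
    """Split on 0xC3 (signed -61), then drop the stray 0xBE left on later fields."""
    parts = raw.split(b"\xc3")
    # The first field is clean; every later field still begins with 0xBE.
    return [parts[0]] + [part[1:] for part in parts[1:]]


fields = [field.decode("utf-8") for field in split_on_first_delim_byte(line)]
print(fields)
```

One caveat of splitting on a single byte: 0xC3 is also the first UTF-8 byte of other characters such as é (0xC3 0xA9), so a field containing one of them would be split in the wrong place.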

Hope that helps.