我正在尝试在s3上创建一个关于这个mahout推荐系统输出数据的表。
703209355938578 [18519:1.5216354,18468:1.5127649,17962:1.5094717,18317:1.5075916]
828667482548563 [18070:1.0,18641:1.0,18632:1.0,18770:1.0,17814:1.0,18095:1.0]
1705358040772485 [18783:1.0,17944:1.0,18632:1.0,18770:1.0,18914:1.0,18386:1.0]
使用此架构,
CREATE external table user_ad_reco (
userid bigint,
reco MAP<bigint , double>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LOCATION
's3://xxxxx/data/RS/output/m05/';
但是当我用hive读回数据时,
hive&gt;
select * from user_ad_reco limit 10;
正在提供这样的输出
703209355938578 {18519:1.5216354,18468:1.5127649,17962:null}
828667482548563 {18070:1.0,18641:1.0,18632:1.0,18770:1.0,17814:null}
1705358040772485 {18783:1.0,17944:1.0,18632:1.0,18770:1.0,18914:null}
所以,最后一个键:输出中缺少映射输入的值,在最后一个输出对中为null:(。
有人可以帮忙解决这个问题吗?
答案 0 :(得分:0)
空值的原因:
703209355938578 [ 18519:1.5216354,18468:1.5127649,17962:1.5094717,18317:1.5075916 ]
703209355938578 18519:1.5216354,18468:1.5127649,17962:1.5094717,18317:1.5075916
答案 1 :(得分:0)
谢谢@ramisetty,我是以间接方式完成的,首先从地图字符串中删除两个括号[,],然后在没有括号的字符串上创建模式。
CREATE EXTERNAL TABLE user_ad_reco_serde (
userid STRING,
reco_map STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([0-9]+)\\s\\[([^]]+)]"
)
STORED AS TEXTFILE
LOCATION
's3://xxxxxx/data/RS/output/6m/2014-01-2014-05/';
CREATE external table user_ad_reco_plain(
userid bigint,
reco string)
LOCATION
's3://xxxxx/data/RS/output/6m_plain/2014-01-2014-05/';
CREATE external table user_ad_reco (
userid bigint,
reco MAP<bigint , double>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LOCATION
's3://xxxxxx/data/RS/output/6m_plain/2014-01-2014-05/';
可能有一些更简单的方法。