Loading a map data type column in Hive using a Python script as the reducer

Date: 2013-03-27 08:34:33

Tags: python amazon-s3 hive emr

In one of the columns of a Hive table, I want to store key-value pairs. Hive's complex data type MAP supports this construct.

(This is just a toy example of what I want to do; I have many more columns I would like to condense this way.)

So I created a table like this:

hive>DESCRIBE transaction_detailed;
OK
id STRING
time STRING
Time taken: 0.181 seconds

hive>DROP TABLE IF EXISTS transactions;
hive>CREATE EXTERNAL TABLE transactions(
    id STRING,
    time_map MAP<STRING, INT>
    )
partitioned by (dt string) 
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ':'
  lines terminated by '\n'
location 's3://my_loaction/transactions/';
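Given those delimiters, each line of a data file holds the fields separated by tabs, map entries separated by commas, and keys from values by colons. A minimal Python sketch of what one serialized row would look like (the id and the numbers are made up for illustration):

```python
# Serialize one row in the table's delimited format:
# fields '\t', collection items ',', map keys ':'.
time_map = {"min": 12, "max": 95, "average": 40, "total": 160}  # hypothetical values
map_field = ",".join(f"{k}:{v}" for k, v in time_map.items())
row = "txn_001" + "\t" + map_field  # "txn_001" is a made-up id
print(row)
```

A line in this shape is what the delimited SerDe can parse back into `MAP<STRING, INT>` when Hive reads the file.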

Then I tried to load the map column using the reducer described in the code below. The structure of time_map should look like: {"min": time, "max": time, "average": time, "total": time}

hive>FROM( FROM transaction_detailed 
MAP transaction_detailed.id, transaction_detailed.time
USING "python unity mapper -- splits the same thing out as it takes it"
AS id, time
cluster by id) transaction_time_map
insert overwrite table transactions partition(dt="2013-27-03")
REDUCE transaction_time_map.id, transaction_time_map.time
USING "python reducer which takes time_stamp sequence for a single id and summarizes them using min, max, average and total and supposed to insert into map"
as id, time_map;
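The reducer script itself is not shown in the question; a minimal sketch of one is given below, assuming tab-separated `id<TAB>time` lines arrive on stdin already clustered by id, with integer times. It computes the four summaries and emits the map as a comma/colon-delimited string (all names here are hypothetical, not from the question):

```python
def summarize(times):
    """Aggregate a list of ints into the min/max/average/total map."""
    return {
        "min": min(times),
        "max": max(times),
        "average": sum(times) // len(times),  # integer average, to match MAP<STRING, INT>
        "total": sum(times),
    }

def reduce_stream(lines):
    """Group tab-separated (id, time) lines by id (equal ids are
    adjacent because of CLUSTER BY) and emit id + serialized map."""
    current_id, times = None, []
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        if key != current_id and current_id is not None:
            yield current_id + "\t" + ",".join(f"{k}:{v}" for k, v in summarize(times).items())
            times = []
        current_id = key
        times.append(int(value))
    if current_id is not None:
        yield current_id + "\t" + ",".join(f"{k}:{v}" for k, v in summarize(times).items())

# In the streaming job this would iterate over sys.stdin;
# here a few made-up rows stand in for Hive's input.
for out in reduce_stream(["a\t10\n", "a\t20\n", "b\t5\n"]):
    print(out)
```

Note that the columns a TRANSFORM/REDUCE script emits are strings as far as Hive is concerned, which is exactly why inserting them directly into a `MAP<STRING, INT>` column fails with the conversion error shown below.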

But I get an error like this:

FAILED: Error in semantic analysis: Line 6:23 Cannot insert into target table because column number/types are different "two_day": Cannot convert column 8 from string to map<string,int>.

How can I load the map column using my Python reducer?

1 Answer:

Answer 0 (score: 0)

I think the answer to the question above is to use Hive's str_to_map(text[, delimiter1, delimiter2]) function.
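Hive's str_to_map splits its input on delimiter1 (default ',') into entries and on delimiter2 (default ':') into key/value pairs, returning a map<string,string>. A pure-Python illustration of those semantics (this is not Hive code, just a model of what the function does):

```python
def str_to_map(text, delimiter1=",", delimiter2=":"):
    """Model of Hive's str_to_map: split entries on delimiter1,
    then each entry into key/value on delimiter2."""
    return dict(entry.split(delimiter2, 1) for entry in text.split(delimiter1))

print(str_to_map("min:10,max:20,average:15,total:30"))
# {'min': '10', 'max': '20', 'average': '15', 'total': '30'}
```

In practice the reducer would emit the map as a plain delimited string column, and the insert would select something like `str_to_map(time_map_str, ',', ':')` over it. One caveat: str_to_map returns map<string,string>, so the target column may need to be declared MAP<STRING, STRING> (or the values converted downstream) rather than MAP<STRING, INT>.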