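I want to insert a transformed copy of a source table into a target table without dumping the contents or creating new tables, views, and so on. So I started thinking about streaming the contents of the original table, modifying them on the fly, and writing them to the target table: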
INSERT OVERWRITE TABLE d SELECT TRANSFORM item USING 'python po.py' AS (item map<string,string>) FROM s;
where d is defined as
CREATE TABLE d (item map<string, string>)
and s is defined as
CREATE TABLE s (item map<string, string>)
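What should I print from the Python script so that the data is transformed and loaded into table d correctly?
I tried printing different representations from the Python script, but the resulting item always seems to end up malformed:
Something like this: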
{"item":{"representation":null}}
Answer 0: (score: 1)
You can have the script return a string in a specific format and convert it to a map with str_to_map. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
bash
cat>/tmp/myscript.sh
sed -r -e 's/\{(.*)\}/\1/' -e 's/"//g' -e 's/v(.)/v\100/g'
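Since the question asked for a Python script, here is a minimal sketch of a Python counterpart to the sed script above. It assumes, as the sed script does, that Hive hands the map column to the script as one JSON-like line per row, and it mirrors the sample transformation by appending "00" to every value. The function name transform_line is made up for illustration, and the script assumes keys and values are non-null and contain no ',' or ':' characters.

```python
import json
import sys


def transform_line(line):
    """Turn one JSON-like map row into 'k1:v1,k2:v2' text that str_to_map can parse."""
    item = json.loads(line)
    # Illustrative transformation only: append "00" to each value,
    # which matches the sed script's output on the sample data.
    # Assumes every value is a non-null string.
    pairs = [f"{key}:{value}00" for key, value in item.items()]
    return ",".join(pairs)


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    # Hive streams one row per line on stdin and reads one row per line from stdout.
    for line in stream_in:
        line = line.strip()
        if line:
            stream_out.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

Saved as po.py (the name used in the question), it would be registered with add file and invoked with using 'python po.py' in place of "myscript.sh"; the surrounding str_to_map query stays the same.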
hive
create table d (item map<string,string>);
create table s (item map<string,string>);
insert into s select map('k1','v1','k2','v2','k3','v3');
add file /tmp/myscript.sh;
insert into d
select str_to_map (result)
from (select transform (item) using "myscript.sh" as result
from s
) t
;
select * from d
;
+---------------------------------------+
| d.item                                |
+---------------------------------------+
| {"k1":"v100","k2":"v200","k3":"v300"} |
+---------------------------------------+
...and for clarity:
select * from s;
+---------------------------------+
| s.item                          |
+---------------------------------+
| {"k1":"v1","k2":"v2","k3":"v3"} |
+---------------------------------+
select result
,str_to_map (result) result_map
from (select transform (item) using "myscript.sh" as result
from s
) t
;
+-------------------------+---------------------------------------+
| result                  | result_map                            |
+-------------------------+---------------------------------------+
| k1:v100,k2:v200,k3:v300 | {"k1":"v100","k2":"v200","k3":"v300"} |
+-------------------------+---------------------------------------+
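To make the conversion from the result column to the map column concrete, here is a rough Python emulation of what str_to_map does with its default delimiters (',' between pairs, ':' between key and value). It is only an illustration of the string format the script has to produce, not Hive's implementation; in particular, Hive treats the delimiters as regular expressions, while this sketch treats them as literal strings.

```python
def str_to_map(text, pair_delim=",", kv_delim=":"):
    """Rough emulation of Hive's str_to_map with literal (non-regex) delimiters."""
    result = {}
    for pair in text.split(pair_delim):
        # Split on the first key/value delimiter only, so values may contain it.
        key, sep, value = pair.partition(kv_delim)
        # A pair without a key/value delimiter maps the whole token to None (NULL).
        result[key] = value if sep else None
    return result


print(str_to_map("k1:v100,k2:v200,k3:v300"))
```

Whatever the transform script prints therefore has to be a single line of key:value pairs joined by commas, which is exactly what the result column shows.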
hive> explain
> select str_to_map (result)
>
> from (select transform (item) using "myscript.sh" as result
> from s
> ) t
> ;
OK
Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: s
            Statistics: Num rows: 1 Data size: 17 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: item (type: map<string,string>)
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 17 Basic stats: COMPLETE Column stats: NONE
              Transform Operator
                command: myscript.sh
                output info:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                Statistics: Num rows: 1 Data size: 17 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: str_to_map(_col0) (type: map<string,string>)
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 17 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 17 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink