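I'm using a Pig UDF written in Python to read data from an HBase table, process and parse it, and finally insert the result into another HBase table.
However, when I try to insert the new values into the other HBase table, I run into two problems: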
Here is a snippet of my Pig script:
register 'parser.py' using org.apache.pig.scripting.jython.JythonScriptEngine as myParser;
data = LOAD 'hbase://source' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('values:*', '-loadKey true') AS
(rowkey:chararray, curves:map[]);
data_1 = FOREACH data GENERATE myParser.table_to_xml(rowkey,curves);
STORE data_1 INTO 'hbase://destination' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('a_column_family:curves:*');
My Python script takes (rowkey, curves) as input:
@outputSchema('tuple:(rowkey:chararray, curves:chararray)')
def table_to_xml(rowkey, curves):
    key = some_processing_which_is_correct
    for k in curves:
        content = some_processing_which_is_also_correct
    return (key, content)
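
In case it helps to reproduce the shape of the UDF outside Pig, here is a minimal sketch that runs under plain Python. It only illustrates the (rowkey, curves) in, (key, content) out contract. The fallback `outputSchema` decorator and the XML-building logic are hypothetical placeholders, not the actual processing, which is omitted above.

```python
from xml.sax.saxutils import escape, quoteattr

# Pig's Jython engine makes the outputSchema decorator available to the script.
# Assumption: when run as plain Python the name is missing, so a no-op stand-in
# is defined that just records the schema string on the function.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema('tuple:(rowkey:chararray, curves:chararray)')
def table_to_xml(rowkey, curves):
    # Placeholder: the real key processing is not shown in the question.
    key = rowkey
    parts = []
    # Placeholder: turn each (column, value) pair of the map into an XML element.
    for k in sorted(curves):
        parts.append('<curve name=%s>%s</curve>'
                     % (quoteattr(str(k)), escape(str(curves[k]))))
    content = '<curves>%s</curves>' % ''.join(parts)
    # A single two-field tuple is returned, matching the declared schema.
    return (key, content)
```

Calling `table_to_xml('r1', {'a': '1'})` locally returns one tuple holding the key and a single XML string, which is the same shape the Pig script receives from the UDF in `data_1`.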