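Earlier I asked about manipulating a data structure in Hive or Pig. I was able to get an answer in SQL, and from there I worked out the Hive answer. I'm still looking for a Pig solution.
I tried: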
myTable2 = FOREACH myTable GENERATE item, year,
'jan' AS month, jan AS value,
'feb' AS month, feb AS value,
'mar' AS month, mar AS value;
This more or less works in Hive, but Pig gives me:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108:
<line 2, column 35> Duplicate schema alias: month
Answer 0 (score: 0)
I figured it out, though I'd love to see a more concise version:
JAN = FOREACH myTable GENERATE item, year, 'jan' AS month, jan AS value;
FEB = FOREACH myTable GENERATE item, year, 'feb' AS month, feb AS value;
MAR = FOREACH myTable GENERATE item, year, 'mar' AS month, mar AS value;
myTable2 = union JAN, FEB, MAR;
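A more concise variant is possible, offered here only as a sketch. It assumes the same myTable relation and columns as above, and that jan, feb, and mar are doubles (the types in the AS clause are an assumption). Each month name is paired with its value using TOTUPLE, the pairs are collected with TOBAG, and FLATTEN produces one row per month, so the alias month is declared only once and the duplicate-alias error does not arise.

```pig
-- Sketch: unpivot in a single FOREACH instead of one relation per month.
-- Assumes jan, feb, mar are doubles; adjust the AS clause to the real types.
myTable2 = FOREACH myTable GENERATE item, year,
    FLATTEN(TOBAG(
        TOTUPLE('jan', jan),
        TOTUPLE('feb', feb),
        TOTUPLE('mar', mar)
    )) AS (month:chararray, value:double);
```

This relies on TOBAG placing tuple arguments into the bag as they are, so flattening the bag yields the two fields of each tuple.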
Answer 1 (score: 0)
Pig script:
data = LOAD '/pigsamples/sampledata' USING PigStorage(',')
AS (item:CHARARRAY, year:INT, jan:DOUBLE, feb:DOUBLE, mar:DOUBLE);
-- Concatenate each month name with its value so the pair stays together when the bag is flattened.
concat_data = FOREACH data GENERATE item, year, CONCAT('jan:', (CHARARRAY)jan) AS jan,
CONCAT('feb:', (CHARARRAY)feb) AS feb, CONCAT('mar:', (CHARARRAY)mar) AS mar;
--convert the month (name,value) pairs to a bag and flatten them
flatten_values = FOREACH concat_data GENERATE item, year, FLATTEN (TOBAG (jan, feb, mar)) AS month_values;
--split the string based on the delimiter that we used above to concat
split_flatten_values = FOREACH flatten_values GENERATE item, year, FLATTEN (STRSPLIT (month_values, ':')) AS (month:CHARARRAY, value:CHARARRAY);
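Note that value comes out of STRSPLIT as a CHARARRAY. If a numeric column is needed, one more step could cast it back. This is a sketch, and the relation name final_values is made up for illustration:

```pig
-- Hypothetical follow-up step: cast the split-out value back to DOUBLE.
final_values = FOREACH split_flatten_values GENERATE item, year, month, (DOUBLE)value AS value;
```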