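Here is my problem for today. After loading a relation from Hive, I built a new relation through several transformations. I want to store that final relation back into Hive once the analysis is done, but I can't. The code below should make this clear.
First, this is where I load from Hive and transform the result: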
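-- Load the source table through HCatalog (note the parentheses after HCatLoader)
july = LOAD 'POC.july' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- Keep the day of the month, the station, and the trip duration
july_cl = FOREACH july GENERATE GetDay(ToDate(start_date)) as day:int,start_station,duration;
-- Keep only trips that started on the 31st
jul_cl_fl = FILTER july_cl BY day==31;
july_gr = GROUP jul_cl_fl BY (day,start_station);
-- Aggregate per (day, start_station) group
july_result = FOREACH july_gr {
    total_dura = SUM(jul_cl_fl.duration);
    avg_dura = AVG(jul_cl_fl.duration);
    qty_trips = COUNT(jul_cl_fl);
    GENERATE FLATTEN(group),total_dura,avg_dura,qty_trips;
};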
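Now, when I try to store the relation july_result, I can't, because the schema has changed and I think it is no longer compatible with Hive:
STORE july_result INTO 'poc.july_analysis' USING org.apache.hive.hcatalog.pig.HCatStorer();
Even when I tried to declare an explicit schema for the final relation, I could not get it to work: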
july_result = FOREACH july_gr {
total_dura = SUM(jul_cl_fl.duration);
avg_dura = AVG(jul_cl_fl.duration);
qty_trips = COUNT(jul_cl_fl);
GENERATE FLATTEN(group) as (day:int),total_dura as (total_dura:int),avg_dura as (avg_dura:int),qty_trips as (qty_trips:int);
};
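One way to check what schema Pig has actually inferred is DESCRIBE. The sketch below is only illustrative: instead of the Hive table it assumes a hypothetical local file `trips.csv` with three comma-separated columns, and the relation names are made up, so that it can be tried in local mode (`pig -x local`).

```pig
-- Hypothetical input file, one trip per line: day,start_station,duration
trips = LOAD 'trips.csv' USING PigStorage(',')
        AS (day:int, start_station:chararray, duration:int);

trips_gr = GROUP trips BY (day, start_station);

-- No casts here: Pig picks the result types of the aggregates itself
trips_result = FOREACH trips_gr GENERATE
    FLATTEN(group),
    SUM(trips.duration) AS total_dura,  -- SUM over int yields long
    AVG(trips.duration) AS avg_dura,    -- AVG yields double
    COUNT(trips)        AS qty_trips;   -- COUNT yields long

-- Print the inferred schema of the aggregated relation
DESCRIBE trips_result;
```

If the target Hive table declares these columns as int or float, the inferred long and double types would not line up with it, which is a plausible reason for the store being rejected.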
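Answer 0 (score: 0)
After some research in the Hortonworks community, I found out how to define the output format for a grouped Pig relation. My new code is as follows: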
july_result = FOREACH july_gr {
total_dura = SUM(jul_cl_fl.duration);
avg_dura = AVG(jul_cl_fl.duration);
qty_trips = COUNT(jul_cl_fl);
GENERATE FLATTEN( group) AS (day, code_station),(int)total_dura as (total_dura:int),(float)avg_dura as (avg_dura:float),(int)qty_trips as (qty_trips:int);
};
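For completeness, here is a sketch of the whole script including the final store. It rests on assumptions that are not stated in the post: that the table poc.july_analysis already exists in Hive, that its columns are named day, code_station, total_dura, avg_dura and qty_trips with int/float types matching the casts, and that Pig is started with HCatalog support (`pig -useHCatalog`).

```pig
-- Run with: pig -useHCatalog <script>
july = LOAD 'POC.july' USING org.apache.hive.hcatalog.pig.HCatLoader();

july_cl = FOREACH july GENERATE
    GetDay(ToDate(start_date)) AS day:int, start_station, duration;

jul_cl_fl = FILTER july_cl BY day == 31;
july_gr = GROUP jul_cl_fl BY (day, start_station);

-- Rename the flattened group fields and cast the aggregates so that names
-- and types match the (assumed) columns of the target Hive table
july_result = FOREACH july_gr GENERATE
    FLATTEN(group) AS (day, code_station),
    (int)SUM(jul_cl_fl.duration)   AS total_dura,
    (float)AVG(jul_cl_fl.duration) AS avg_dura,
    (int)COUNT(jul_cl_fl)          AS qty_trips;

-- Check the schema before storing
DESCRIBE july_result;

-- Assumes poc.july_analysis was created in Hive beforehand
STORE july_result INTO 'poc.july_analysis'
    USING org.apache.hive.hcatalog.pig.HCatStorer();
```

Running DESCRIBE before the STORE makes it easy to compare the relation against the table definition in Hive.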
Thanks, everyone.