Pig - Storing a relation with a complex schema in a Hive table

Posted: 2017-08-10 00:04:41

Tags: hadoop hive apache-pig transform

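Here is my situation. After reading a relation from Hive, I created a new relation as the result of several transformations. The problem is that I want to store this final relation, the result of the analysis, back in Hive, but I can't. The code makes it clearer.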

The first snippet is where I load the data from Hive and transform it:

july = LOAD 'POC.july' USING org.apache.hive.hcatalog.pig.HCatLoader();
july_cl = FOREACH july GENERATE GetDay(ToDate(start_date)) as day:int, start_station, duration;
jul_cl_fl = FILTER july_cl BY day == 31;
july_gr = GROUP jul_cl_fl BY (day, start_station);
july_result = FOREACH july_gr {
           total_dura = SUM(jul_cl_fl.duration);
           avg_dura = AVG(jul_cl_fl.duration);
           qty_trips = COUNT(jul_cl_fl);
           GENERATE FLATTEN(group), total_dura, avg_dura, qty_trips;
 };

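So now, when I try to store the relation july_result, it fails, because the schema has changed and I think it is no longer compatible with the Hive table: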

STORE july_result INTO 'poc.july_analysis' USING org.apache.hive.hcatalog.pig.HCatStorer();

I also tried to declare an explicit schema for the final relation, but I couldn't get it to work:

july_result = FOREACH july_gr {
              total_dura = SUM(jul_cl_fl.duration);
              avg_dura = AVG(jul_cl_fl.duration);
              qty_trips = COUNT(jul_cl_fl);
              GENERATE FLATTEN(group) as (day:int),total_dura as (total_dura:int),avg_dura as (avg_dura:int),qty_trips as (qty_trips:int);
              };

1 answer:

Answer 0 (score: 0)

After some research in the Hortonworks community, I found out how to define the output schema of a Pig relation. My new code is:

july_result = FOREACH july_gr {
              total_dura = SUM(jul_cl_fl.duration);
              avg_dura = AVG(jul_cl_fl.duration);
              qty_trips = COUNT(jul_cl_fl);
              GENERATE FLATTEN(group) AS (day, code_station),
                       (int)total_dura as (total_dura:int),
                       (float)avg_dura as (avg_dura:float),
                       (int)qty_trips as (qty_trips:int);
              };
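A minimal sketch of the remaining step, assuming the target table poc.july_analysis already exists in Hive with columns matching the relation above. The column types in the comment (in particular code_station as STRING) are an assumption, since the question does not show the table definition. The casts in the answer presumably matter because Pig's SUM over an int field returns long, AVG returns double and COUNT returns long, while HCatStorer checks each field against the Hive column type. Running DESCRIBE before STORE is a quick way to compare the two schemas. Scripts using HCatLoader/HCatStorer are typically launched with `pig -useHCatalog`.

```pig
-- ASSUMED (hypothetical) Hive table definition, created beforehand in Hive:
--   CREATE TABLE poc.july_analysis (day INT, code_station STRING,
--     total_dura INT, avg_dura FLOAT, qty_trips INT);

-- Print the Pig schema of july_result to compare it with the Hive columns.
DESCRIBE july_result;

-- Write the relation into the existing Hive table through HCatalog.
STORE july_result INTO 'poc.july_analysis' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

If DESCRIBE still shows long or double fields, the STORE will most likely be rejected until the corresponding casts are added or the Hive column types are changed to BIGINT or DOUBLE.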

Thanks, everyone.