HIVE: multiple GROUP BYs, then an operation on the results

Time: 2016-04-25 07:37:49

Tags: hadoop hive

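I am trying to join these two SELECT statements on id = playerID and year = yearID (schemas below), and then subtract the aliases, HAB - EG. Both SELECT statements also group by year and id, so that the values are summed before the division and subtraction are performed further up the hierarchy. When I try this, the error message refers to grouping by G, which seems odd. I don't need to group by G, only by id and year: a player can have multiple entries in the table, so G, E, H and AB need to be summed before the calculation.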

Here is the query I tried:

SELECT
    a.playerID AS ID,
    a.yearID AS yearID,
    (b.HAB - a.EG) AS `HAB-EG`
FROM 
    (SELECT
        SUM(playerID),
        SUM(yearID),
        (E/G) AS EG
    FROM fielding
    WHERE (
            yearID > 2005
            AND yearID < 2009
            AND G > 20 
            )GROUP BY playerID,yearID
    ) AS a
JOIN
    (SELECT
        SUM(id),
        SUM(year),
        (hits/ab) AS HAB
    FROM batting
    WHERE( 
            year > 2005
            AND year < 2009 
            AND ab > 40 
            ) GROUP BY id,year

    ) AS b ON a.playerID = b.id AND a.yearID = b.year;

JUST SCHEMA

CREATE EXTERNAL TABLE IF NOT EXISTS fielding
(playerID STRING, yearID INT, teamID STRING, lgID STRING,
POS STRING, G INT, GS INT, InnOuts INT, PO INT, A INT, E INT,
DP INT, PB INT, WP INT, SB INT, CS INT, ZR INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/home/hduser/hivetest/fielding';

JUST THE SCHEMA

 CREATE EXTERNAL TABLE IF NOT EXISTS batting(id STRING, year INT, team STRING,
 league STRING, games INT, ab INT, runs INT, hits INT, doubles INT, triples
 INT, homeruns INT, rbi INT, sb INT, cs INT, walks INT, strikeouts INT, ibb
 INT, hbp INT, sh INT, sf INT, gidp INT) ROW FORMAT DELIMITED FIELDS
 TERMINATED BY ',' LOCATION '/home/hduser/hivetest/batting';

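1 answer:

Answer 0: (score: 0)

Try this. Select the grouping columns (playerID/yearID and id/year) directly instead of wrapping them in SUM(), and sum the columns being divided, so that E, G, hits and ab only appear inside aggregates and do not have to be listed in GROUP BY: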

SELECT
    a.playerID AS ID,
    a.yearID AS yearID,
    (b.HAB - a.EG) AS `HAB-EG`
FROM 
    (SELECT
        playerID,
        yearID,
        (SUM(E)/SUM(G)) AS EG
    FROM fielding
    WHERE (
            yearID > 2005
            AND yearID < 2009
            AND G > 20 
            )GROUP BY playerID,yearID
    ) AS a
JOIN
    (SELECT
        id,
        year,
        (SUM(hits)/SUM(ab)) AS HAB
    FROM batting
    WHERE( 
            year > 2005
            AND year < 2009 
            AND ab > 40 
            ) GROUP BY id,year

    ) AS b ON a.playerID = b.id AND a.yearID = b.year;
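
Note that in the query above the conditions G > 20 and ab > 40 sit in WHERE, so they filter individual rows before anything is summed. If the thresholds are instead meant to apply to the per-player, per-year totals (an assumption on my part; the question does not say which is intended), they could be moved into HAVING. A rough sketch of that variant, using the same tables and columns as above:

```sql
-- Variant sketch: thresholds applied to the summed totals, not to single rows.
-- Assumption: "G > 20" and "ab > 40" are meant per player and year after summing.
SELECT
    a.playerID AS ID,
    a.yearID AS yearID,
    (b.HAB - a.EG) AS `HAB-EG`
FROM
    (SELECT
        playerID,
        yearID,
        SUM(E) / SUM(G) AS EG          -- errors per game over all rows of the group
    FROM fielding
    WHERE yearID > 2005
      AND yearID < 2009                -- row-level filter, applied before grouping
    GROUP BY playerID, yearID
    HAVING SUM(G) > 20                 -- group-level filter, applied after summing
    ) a
JOIN
    (SELECT
        id,
        year,
        SUM(hits) / SUM(ab) AS HAB     -- hits per at-bat over all rows of the group
    FROM batting
    WHERE year > 2005
      AND year < 2009
    GROUP BY id, year
    HAVING SUM(ab) > 40
    ) b
ON a.playerID = b.id AND a.yearID = b.year;
```

The two versions can return different players: a player with two fielding rows of 15 games each in one year is dropped by WHERE G > 20 but kept by HAVING SUM(G) > 20.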