How to calculate this average in Impala

Time: 2017-11-30 19:56:36

Tags: sql impala

I have a table in Impala that looks like this:

product   Pgroup   testtype   result
A         1        length     2.0mm
B         1        length     4.0mm
C         1        weight     3.0gr
D         1        weight     1.0gr
E         2        weight     2.0gr
F         2        weight     2.0gr

I want to calculate the average result for each test type within each Pgroup. I would like the output to look like this:

Pgroup   testtype   averageresult
1        length     3.0mm
1        weight     2.0gr
2        weight     2.0gr

Can you help?

create table test_1 (product string, pgroup string, testtype string, result string);
insert into test_1 values ('A', '1', 'length','2.0mm'),
('B', '1', 'length','4.0mm'),
('C', '1', 'weight','3.0gr'),
('D', '1', 'weight','1.0gr'),
('E', '2', 'weight','2.0gr'),
('F', '2', 'weight','2.0gr');

2 Answers:

Answer 0: (score: 0)

A hacky solution:

select pgroup, testtype,
       -- pull the numeric part out of result, average it, then re-attach the unit
       concat(cast(avg(cast(regexp_extract(result, '[0-9.]+', 0) as double)) as string),
              case testtype when 'length' then 'mm' when 'weight' then 'gr' end) as averageresult
from test_1 group by pgroup, testtype;

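That said, I would definitely recommend creating a preprocessed table in which the value and the unit are split into two columns.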
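A minimal sketch of what such a preprocessed table could look like. The table name test_1_parsed and the column names result_value and result_unit are hypothetical, and the sketch assumes every result is a number immediately followed by a unit made of letters:

```sql
-- Hypothetical preprocessed table: numeric value and unit in separate columns.
-- Assumes result always has the form '<number><letters>', e.g. '2.0mm'.
CREATE TABLE test_1_parsed AS
SELECT product,
       pgroup,
       testtype,
       CAST(REGEXP_EXTRACT(result, '[0-9.]+', 0) AS DOUBLE) AS result_value,
       REGEXP_EXTRACT(result, '[a-zA-Z]+$', 0)              AS result_unit
FROM test_1;

-- With the value stored as a DOUBLE, the average needs no string handling.
SELECT pgroup, testtype, result_unit, AVG(result_value) AS averageresult
FROM test_1_parsed
GROUP BY pgroup, testtype, result_unit;
```

Grouping by result_unit as well keeps rows with different units from being averaged together if a test type ever mixes them.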

Answer 1: (score: 0)

I would suggest the following:

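SELECT pgroup, testtype
    -- strip the two-character unit suffix, average the number, then re-attach the unit
    , CONCAT(CAST(AVG(CAST(SUBSTRING(result, 1, LENGTH(result) - 2) AS DOUBLE)) AS STRING),
        CASE testtype WHEN 'length' THEN 'mm' WHEN 'weight' THEN 'gr' END) AS averageresult
FROM test_1
GROUP BY pgroup, testtype;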
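The CASE expression hard-codes one unit per test type. As a possible variant, and only under the assumption that the unit is always the last two characters of result, the unit can be read from the data with STRRIGHT instead (the aliases parsed, val, and unit are illustrative):

```sql
-- Sketch: take the unit from the last two characters rather than from a CASE.
-- Assumes every result ends in a two-character unit such as 'mm' or 'gr'.
SELECT pgroup,
       testtype,
       CONCAT(CAST(AVG(val) AS STRING), unit) AS averageresult
FROM (
    SELECT pgroup,
           testtype,
           CAST(SUBSTRING(result, 1, LENGTH(result) - 2) AS DOUBLE) AS val,
           STRRIGHT(result, 2) AS unit
    FROM test_1
) parsed
GROUP BY pgroup, testtype, unit;
```

This avoids editing the query when a new test type is added, but it breaks if a unit has a length other than two characters.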

As others have suggested, a better approach is to store the result value and the unit of measurement in separate columns.