I have a table in Impala that looks like this:
product  Pgroup  testtype  result
A        1       length    2.0mm
B        1       length    4.0mm
C        1       weight    3.0gr
D        1       weight    1.0gr
E        2       weight    2.0gr
F        2       weight    2.0gr
I would like to compute the average result for each Pgroup and test type. The output should look like this:
Pgroup  testtype  averageresult
1       length    3.0mm
1       weight    2.0gr
2       weight    2.0gr
Can you help?
create table test_1 (product string, pgroup string, testtype string, result string);
insert into test_1 values ('A', '1', 'length','2.0mm'),
('B', '1', 'length','4.0mm'),
('C', '1', 'weight','3.0gr'),
('D', '1', 'weight','1.0gr'),
('E', '2', 'weight','2.0gr'),
('F', '2', 'weight','2.0gr');
Answer 0 (score: 0)
A hacky solution:
-- Extract the numeric part of result, average it, then re-attach the unit.
select pgroup, testtype,
       concat(cast(avg(cast(regexp_extract(result, '[0-9.]+', 0) as double)) as string),
              case testtype when 'length' then 'mm' when 'weight' then 'gr' end) as averageresult
from test_1
group by pgroup, testtype;
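That said, I would definitely recommend creating a preprocessed table in which the value and the unit are split into two columns.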
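A minimal sketch of what such a preprocessed table might look like. The table name test_1_clean and the column names result_value and result_unit are made up for illustration, and it assumes every result is a number immediately followed by a letters-only unit:

```sql
-- Hypothetical table and column names (test_1_clean, result_value, result_unit).
-- Assumes each result is a number followed directly by a letters-only unit.
create table test_1_clean as
select product,
       pgroup,
       testtype,
       cast(regexp_extract(result, '[0-9.]+', 0) as double) as result_value,
       regexp_extract(result, '[a-zA-Z]+$', 0) as result_unit
from test_1;

-- The average is now a plain numeric aggregate; grouping by the unit
-- keeps it available in the output without any string handling.
select pgroup, testtype, avg(result_value) as averageresult, result_unit
from test_1_clean
group by pgroup, testtype, result_unit;
```

With the unit in its own column, the query no longer needs a CASE expression that hard-codes which unit belongs to which test type.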
Answer 1 (score: 0)
I suggest the following:
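-- Strip the two-character unit suffix, average the numbers, then re-attach the unit.
-- CONCAT takes string arguments, so the average is cast to STRING explicitly.
SELECT pgroup, testtype
, CONCAT(CAST(AVG(CAST(SUBSTRING(result, 1, LENGTH(result) - 2) AS DOUBLE)) AS STRING),
CASE testtype WHEN 'length' THEN 'mm' WHEN 'weight' THEN 'gr' END) AS averageresult
FROM test_1
GROUP BY pgroup, testtype;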
As others have suggested, a better approach is to store the result value and the unit of measurement in separate columns.