My code:
SELECT * FROM (SELECT ROW_NUMBER() over(ORDER BY sn desc,sn ) as Row,
sn as "sn",rsl_name as "rslName",mec_name as "mecName",app_name as "appName",app_ver as "appVer"
FROM
hive.bps.parameter_info) T
where T.Row between 1 and 20;
The query takes more than 10 seconds, and I want to get the data within 2 seconds. Here is the analysis:
presto> explain analyze SELECT * FROM (SELECT ROW_NUMBER() over(ORDER BY sn desc,sn ) as Row, sn as "sn",rsl_name as "rslName",mec_name as "mecName",app_name as "appName",app_ver as "appVer" FROM hive.bps.parameter_info) T where T.Row between 1 and 20;
Query Plan
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Fragment 1 [SINGLE]
CPU: 92.20ms, Scheduled: 211.99ms, Input: 21980 rows (2.16MB); per task: avg.: 21980.00 std.dev.: 0.00, Output: 20 rows (2.07kB)
Output layout: [rsl_name, mec_name, app_name, app_ver, sn, row_number]
Output partitioning: SINGLE []
Stage Execution Strategy: UNGROUPED_EXECUTION
- Filter[filterPredicate = row_number BETWEEN (BIGINT 1) AND (BIGINT 20)] => [rsl_name:varchar, mec_name:varchar, app_name:varchar, app_ver:varchar, sn:varchar, row_number:bigint]
CPU: 0.00ns (0.00%), Scheduled: 1.00ms (0.00%), Output: 20 rows (2.07kB)
Input avg.: 1.25 rows, Input std.dev.: 387.30%
- LocalExchange[ROUND_ROBIN] () => [rsl_name:varchar, mec_name:varchar, app_name:varchar, app_ver:varchar, sn:varchar, row_number:bigint]
CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 20 rows (2.07kB)
Input avg.: 20.00 rows, Input std.dev.: 0.00%
- TopNRowNumber[partition by (), order by (sn DESC_NULLS_LAST) limit 20] => [rsl_name:varchar, mec_name:varchar, app_name:varchar, app_ver:varchar, sn:varchar, row_number:bigint]
CPU: 14.00ms (0.05%), Scheduled: 49.00ms (0.06%), Output: 20 rows (2.07kB)
Input avg.: 21980.00 rows, Input std.dev.: 0.00%
row_number := row_number()
- LocalExchange[SINGLE] () => [rsl_name:varchar, mec_name:varchar, app_name:varchar, app_ver:varchar, sn:varchar]
CPU: 17.00ms (0.06%), Scheduled: 71.00ms (0.09%), Output: 21980 rows (2.16MB)
Input avg.: 1373.75 rows, Input std.dev.: 329.25%
- RemoteSource[2] => [rsl_name:varchar, mec_name:varchar, app_name:varchar, app_ver:varchar, sn:varchar]
CPU: 22.00ms (0.08%), Scheduled: 29.00ms (0.04%), Output: 21980 rows (2.16MB)
Input avg.: 1373.75 rows, Input std.dev.: 329.25%
Fragment 2 [SOURCE]
CPU: 27.16s, Scheduled: 1.19m, Input: 829722 rows (54.99MB); per task: avg.: 414861.00 std.dev.: 867.00, Output: 21980 rows (2.16MB)
Output layout: [rsl_name, mec_name, app_name, app_ver, sn]
Output partitioning: SINGLE []
Stage Execution Strategy: UNGROUPED_EXECUTION
- TopNRowNumber[partition by (), order by (sn DESC_NULLS_LAST) limit 20] => [rsl_name:varchar, mec_name:varchar, app_name:varchar, app_ver:varchar, sn:varchar]
CPU: 681.00ms (2.51%), Scheduled: 1.25s (1.51%), Output: 21980 rows (2.16MB)
Input avg.: 754.98 rows, Input std.dev.: 3.78%
row_number := row_number()
- TableScan[TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=bps, tableName=parameter_info, analyzePartitionValues=Optional.empty}', layout='Optional[bps.parameter_info{}]'}, gro
CPU: 26.45s (97.30%), Scheduled: 1.35m (98.30%), Output: 829722 rows (54.99MB)
Input avg.: 754.98 rows, Input std.dev.: 3.78%
LAYOUT: bps.parameter_info{}
rsl_name := rsl_name:string:2:REGULAR
mec_name := mec_name:string:3:REGULAR
sn := sn:string:8:REGULAR
app_name := app_name:string:4:REGULAR
app_ver := app_ver:string:5:REGULAR
Input: 829722 rows (54.99MB), Filtered: 0.00%
Other details: the table has 829,722 rows and 6,000 columns, and it is stored as ORC.
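When I extract a few important columns into a new table and query that table with Presto, it is much faster. ORC is a columnar format, so shouldn't reading only a few columns from the wide table already be fast?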
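For reference, the extraction into a narrower table could look roughly like the sketch below. The table name `parameter_info_narrow` is hypothetical, and the column list is assumed to be the five columns used in the query above; the statement actually used may differ.

```sql
-- Sketch only: copy the five queried columns into a narrow ORC table.
-- `parameter_info_narrow` is a hypothetical name, not a table from the original setup.
CREATE TABLE hive.bps.parameter_info_narrow
WITH (format = 'ORC')
AS
SELECT sn, rsl_name, mec_name, app_name, app_ver
FROM hive.bps.parameter_info;
```

The same paging query is then run against `hive.bps.parameter_info_narrow` instead of the 6,000-column table.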
What can I do to speed this up?