I have a query that counts the rows in a table. It works when vectorization is turned off, but fails when vectorization is turned on.
0: jdbc:hive2://localhost:10000> set hive.vectorized.execution.enabled=true;
No rows affected (0.002 seconds)
0: jdbc:hive2://localhost:10000> select count(fileID) from myTable where year=2007;
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
0: jdbc:hive2://localhost:10000> set hive.vectorized.execution.enabled=false;
No rows affected (0.002 seconds)
0: jdbc:hive2://localhost:10000> select count(fileID) from myTable where year=2007;
+----------+--+
| _c0 |
+----------+--+
| 1334706 |
+----------+--+
1 row selected (26.286 seconds)
Note that the data type of fileID is INT.
If I turn vectorization on and look at the EXPLAIN output, I get the following:
0: jdbc:hive2://localhost:10000> set hive.vectorized.execution.enabled=true;
No rows affected (0.003 seconds)
0: jdbc:hive2://localhost:10000> explain select count(fileID) from myTable where year=2007;
+----------------------------------------------------------------------------------------------------------+--+
| Explain |
+----------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: bcm40 |
| Statistics: Num rows: 1334706 Data size: 3784110 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: fileid (type: int) |
| outputColumnNames: fileid |
| Statistics: Num rows: 1334706 Data size: 3784110 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| aggregations: count(fileid) |
| mode: hash |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| sort order: |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| value expressions: _col0 (type: bigint) |
| Execution mode: vectorized |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: count(VALUE._col0) |
| mode: mergepartial |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: _col0 (type: bigint) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------------------------------------------------------------+--+
Can someone explain why this happens?