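I am trying to use the ANALYZE TABLE table_name PARTITION(partition-spec=) COMPUTE STATISTICS FOR COLUMNS command.
I can't understand the difference between the statistics it reports and the values computed by a SELECT statement over the same table, column, and partition.
For example, for a specific partition, I ran COMPUTE STATISTICS FOR COLUMNS and then ran DESCRIBE FORMATTED table_name.var_name PARTITION(partition-spec=), which shows the following result: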
+------------+-----------+--------------------+--------------------+-----------+----------------+-------------+-------------+-----------+------------+-------------------+
| col_name   | data_type | min                | max                | num_nulls | distinct_count | avg_col_len | max_col_len | num_trues | num_falses | comment           |
+------------+-----------+--------------------+--------------------+-----------+----------------+-------------+-------------+-----------+------------+-------------------+
| # col_name | data_type | min                | max                | num_nulls | distinct_count | avg_col_len | max_col_len | num_trues | num_falses | comment           |
|            | NULL      | NULL               | NULL               | NULL      | NULL           | NULL        | NULL        | NULL      | NULL       | NULL              |
| sub_id     | bigint    | 100000000003631773 | 112330000086219636 | 0         | 403024         |             |             |           |            | from deserializer |
+------------+-----------+--------------------+--------------------+-----------+----------------+-------------+-------------+-----------+------------+-------------------+
But running SELECT COUNT(DISTINCT SUB_ID) FROM table_name WHERE partition = yyyymmdd gives me the following result:
+---------+
| qid     |
+---------+
| 465001  |
+---------+
Does anyone know why distinct_count (403024) differs from the COUNT(DISTINCT) result (465001)?
Thanks!