My count(*) job ran for about 50 seconds, only to report that this Hive table contains about 5k records.
INFO : Ended Job = job_1537244839121_123016
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 7 Reduce: 1 Cumulative CPU: 60.8 sec HDFS Read: 2022641 HDFS Write: 104 SUCCESS
INFO : Total MapReduce CPU Time Spent: 1 minutes 0 seconds 800 msec
INFO : Completed executing command(queryId=hive_20180927135454_6de461ea-c02c-4229-b225-525244da7a8c); Time taken: 48.972 seconds
INFO : OK
+-------+--+
| _c0 |
+-------+--+
| 5628 |
+-------+--+
1 row selected (49.507 seconds)
Is there a way to scan the Parquet files and return this answer faster, using some other method in Hadoop?