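I recently set up Hive and created an external table to access a database in MongoDB. Now, if I run a query such as SELECT id FROM users LIMIT 1; it takes about 18 seconds to execute. It takes the same amount of time even when LIMIT is set to 10, 100, 1000, or 10000. The log contains entries like the following: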
2015-08-24 09:19:37,918 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min=null, max= { "_id" : { "$oid" : "55cdbffaa9ad1735c531a362"}}
2015-08-24 09:19:37,918 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdbffaa9ad1735c531a362"}}, max= { "_id" : { "$oid" : "55cdc000a9ad1735d5cb42ab"}}
2015-08-24 09:19:37,918 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc000a9ad1735d5cb42ab"}}, max= { "_id" : { "$oid" : "55cdc002a9ad1735d5cb56f9"}}
2015-08-24 09:19:37,918 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc002a9ad1735d5cb56f9"}}, max= { "_id" : { "$oid" : "55cdc008a9ad1735eaffb513"}}
2015-08-24 09:19:37,919 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc008a9ad1735eaffb513"}}, max= { "_id" : { "$oid" : "55cdc00ba9ad1735eaffc961"}}
2015-08-24 09:19:37,919 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc00ba9ad1735eaffc961"}}, max= { "_id" : { "$oid" : "55cdc012a9ad1735fab2a0dd"}}
2015-08-24 09:19:37,919 INFO [HiveServer2-Handler-Pool: Thread-29]: splitter.MongoCollectionSplitter (MongoCollectionSplitter.java:createSplitFromBounds(163)) - Created split: min={ "_id" : { "$oid" : "55cdc012a9ad1735fab2a0dd"}}, max= null
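There are actually many similar lines in between, which I have omitted. From the log I can only guess that, even with LIMIT 1, Hive fetches the whole collection from MongoDB and then picks 1 row to display. Is there a way to change this, so that with LIMIT 1 Hive fetches only 1 row?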
Answer 0 (score: 0)
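For Hive tables (and this probably applies to external tables as well), if you select only one specific field with LIMIT, a MapReduce job is launched (or a job on whatever execution engine you are using), whereas a plain SELECT * does not need MapReduce and is therefore much faster. This may be the reason for the slowness.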
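One way to check whether this is what is happening is to look at Hive's `hive.fetch.task.conversion` property, which controls which simple queries are served by a direct fetch task instead of a job. The following is only a sketch: it reuses the `users` table and `id` column from the question, and whether the setting changes anything for a table backed by the MongoDB storage handler is an assumption that has to be tested. The split computation shown in the log may still take place either way.

```sql
-- Print the current value (none, minimal, or more).
SET hive.fetch.task.conversion;

-- Session-level setting: let simple SELECT <columns> ... LIMIT queries
-- run as a fetch task rather than as a MapReduce job.
SET hive.fetch.task.conversion=more;

-- Inspect the plan: a plan consisting only of a Fetch Operator stage
-- means no MapReduce job will be launched for this query.
EXPLAIN SELECT id FROM users LIMIT 1;

-- Re-run the query from the question and compare the timing.
SELECT id FROM users LIMIT 1;
```

If the plan still contains a map-reduce stage, or the timing does not change, the delay is more likely spent elsewhere, for example in computing the input splits against MongoDB.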