Question

I'm running Hive 0.14 on an HDP2 cluster. My dataset was built with the Kite SDK and is registered in Hive as an external table.

See the table layout below:

My initial test query against this setup just fetches a single row of the dataset (I removed the actual output from the example):

hive> select * from hivetweets limit 1;
OK
Time taken: 103.726 seconds, Fetched: 1 row(s)

This query takes 104 seconds to complete.

It is probably not running distributed, so I tried a test with more data:

hive> select count(*) from hivetweets limit 100000;
Query ID = root_20150715132222_81e386ef-2990-4251-a61f-82ca8da4c48d
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.

Status: Running (Executing on YARN cluster with App id application_1436910684121_0006)

--------------------------------------------------------------------------------
VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED     19         19        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 567.52 s
--------------------------------------------------------------------------------
OK
197371741

Taking 10 minutes to count 100k records is far too long.
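As a rough sanity check on these numbers, here is a minimal Python sketch of the throughput implied by the Tez output above. It assumes the printed count (197371741) and the reported ELAPSED TIME (567.52 s) are accurate, and that all 19 map tasks ran concurrently, which the log does not confirm. The variable names are only illustrative.

```python
# Figures copied from the Hive/Tez console output in this question.
rows_counted = 197_371_741   # value printed by count(*)
elapsed_seconds = 567.52     # ELAPSED TIME reported by Tez
map_tasks = 19               # TOTAL for the "Map 1" vertex

# Overall scan rate for the whole job.
rows_per_second = rows_counted / elapsed_seconds

# Assumption: all map tasks ran in parallel for the full duration.
rows_per_second_per_task = rows_per_second / map_tasks

print(f"{rows_per_second:,.0f} rows/s overall")
print(f"{rows_per_second_per_task:,.0f} rows/s per map task (if fully parallel)")
```

Worth noting: the printed result is 197371741, so the query appears to have counted the whole table. As far as I understand, LIMIT here only caps the single result row of count(*), not the number of rows scanned.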

I would appreciate any suggestions on how to debug this.

Unacceptably slow Hive query

0 answers: