Hive query GROUP BY error; Invalid table alias or column reference

Date: 2014-06-24 14:07:00

Tags: hadoop hive hql

Kindest,

I'm trying to extend some working Hive queries, but without much luck. I just want to test the GROUP BY functionality, which is common to a number of the queries I need to complete. This is the query I'm trying to execute:

DROP table CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary;

CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary ( messageRowID STRING, payload_sensor INT, messagetimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT, payload_wattseconds INT ) 
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
    "cassandra.port" = "9160",
    "cassandra.ks.name" = "EVENT_KS",
    "cassandra.ks.username" = "admin",
    "cassandra.ks.password" = "admin",
    "cassandra.cf.name" = "currentcost_stream",
    "cassandra.columns.mapping" = ":key, payload_sensor, Timestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds" );

select messageRowID, payload_sensor, messagetimestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds, hour(from_unixtime(payload_timestamp)) AS hourly
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary 
WHERE payload_timestamp > unix_timestamp() - 3024*60*60
GROUP BY hourly;

This produces the following error:

ERROR: Error while executing Hive script. Query returned non-zero code: 10, cause: FAILED: Error in semantic analysis: Line 1:320 Invalid table alias or column reference 'hourly': (possible column names are: messagerowid, payload_sensor, messagetimestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds)

The intent is to eventually end up with a time-restricted query (say, the last 24 hours) summing payload_wattseconds and so on across the rest of the columns. To start breaking that down before creating the summary tables, I began by building a GROUP BY query that would derive the hourly anchor for the SELECT.

The problem, though, is the error above. Any pointers on what is going wrong here would be much appreciated.. I can't seem to find it myself, but then again I'm new to Hive.
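From what I've read so far, it might be that Hive doesn't let the GROUP BY clause reference a SELECT alias, so the hour() expression would have to be repeated there instead of referencing 'hourly'. A minimal sketch of just that part (not yet verified against this table):

-- sketch: repeat the expression in GROUP BY rather than using the alias
select hour(from_unixtime(payload_timestamp)) AS hourly
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
WHERE payload_timestamp > unix_timestamp() - 3024*60*60
GROUP BY hour(from_unixtime(payload_timestamp));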

Thanks in advance..

UPDATE: Tried amending the query. This is the query I just attempted to run:

DROP table CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary;

CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary ( messageRowID STRING, payload_sensor INT, messagetimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT, payload_wattseconds INT ) 
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
    "cassandra.port" = "9160",
    "cassandra.ks.name" = "EVENT_KS",
    "cassandra.ks.username" = "admin",
    "cassandra.ks.password" = "admin",
    "cassandra.cf.name" = "currentcost_stream",
    "cassandra.columns.mapping" = ":key, payload_sensor, Timestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds" );

select messageRowID, payload_sensor, messagetimestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds, hour(from_unixtime(payload_timestamp))
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary 
WHERE payload_timestamp > unix_timestamp() - 3024*60*60
GROUP BY hour(from_unixtime(payload_timestamp)); 

..but that produces another error, namely:

ERROR: Error while executing Hive script. Query returned non-zero code: 10, cause: FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP BY key 'messageRowID'

Thoughts?
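If I'm reading that second error right, every column in the SELECT list that isn't wrapped in an aggregate apparently has to appear in the GROUP BY, so listing the raw row columns next to the hourly grouping isn't valid. Presumably the valid shape keeps only the grouped expression plus aggregates, along these lines (sketch only; the aliases total_wattseconds and samples are just illustrative):

-- sketch: only the grouped expression plus aggregate functions in the SELECT list
select hour(from_unixtime(payload_timestamp)) AS hourly,
       sum(payload_wattseconds) AS total_wattseconds,
       count(*) AS samples
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
WHERE payload_timestamp > unix_timestamp() - 3024*60*60
GROUP BY hour(from_unixtime(payload_timestamp));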

UPDATE #2) Here is a quick dump of a few of the samples that end up in the EVENT_KS CF in WSO2 BAM. The last column is the calculated watt-seconds value (computed in a Perl daemon...), which will be used in the queries to work out totals summed into kWh; those are then dumped into a MySQL table to sync with the application holding the UI/UX layer..

[12:03:00] [jskogsta@enterprise ../Product Centric Opco Modelling]$ ~/local/apache-cassandra-2.0.8/bin/cqlsh localhost 9160 -u admin -p admin --cqlversion="3.0.5"
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 1.2.13 | CQL spec 3.0.5 | Thrift protocol 19.36.2]
Use HELP for help.
cqlsh> use "EVENT_KS";
cqlsh:EVENT_KS> select * from currentcost_stream limit 5;

 key                                       | Description               | Name               | Nick_Name            | StreamId                  | Timestamp     | Version | payload_sensor | payload_temp | payload_timestamp | payload_timestampmysql | payload_watt | payload_wattseconds
-------------------------------------------+---------------------------+--------------------+----------------------+---------------------------+---------------+---------+----------------+--------------+-------------------+------------------------+--------------+---------------------
  1403365575174::10.11.205.218::9443::9919 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403365575174 |  1.0.18 |              1 |        13.16 |        1403365575 |    2014-06-21 23:46:15 |         6631 |               19893
  1403354553932::10.11.205.218::9443::2663 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403354553932 |  1.0.18 |              1 |         14.1 |        1403354553 |    2014-06-21 20:42:33 |        28475 |                   0
 1403374113341::10.11.205.218::9443::11852 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403374113341 |  1.0.18 |              1 |        10.18 |        1403374113 |    2014-06-22 02:08:33 |        17188 |              154692
  1403354501924::10.11.205.218::9443::1894 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403354501924 |  1.0.18 |              1 |        10.17 |        1403354501 |    2014-06-21 20:41:41 |        26266 |                   0
 1403407054092::10.11.205.218::9443::15527 | Sample data from CC meter | currentcost.stream | Currentcost Realtime | currentcost.stream:1.0.18 | 1403407054092 |  1.0.18 |              1 |        17.16 |        1403407054 |    2014-06-22 11:17:34 |         6332 |                6332

(5 rows)

cqlsh:EVENT_KS>

What I'll be trying to do is issue queries against this table (actually aggregating multiple ones, depending on the various presentations required...) that provide views based on hourly sums, 10-minute sums, daily amounts, monthly amounts and so on. Depending on the query, the GROUP BY is meant to provide that 'index', so to speak. Just testing this for now... so we'll see how it ends up. Hope that makes sense?!

So I'm not trying to remove duplicates...
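As a rough sketch of what the other granularities might look like against the same external table (my assumptions, not tested yet; bucket_start_epoch and day_bucket are just illustrative aliases):

-- sketch: 10-minute buckets, rounding the epoch timestamp down to a 600-second boundary
select floor(payload_timestamp / 600) * 600 AS bucket_start_epoch,
       sum(payload_wattseconds) AS total_wattseconds
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
GROUP BY floor(payload_timestamp / 600) * 600;

-- sketch: daily buckets, keyed on the calendar date of each sample
select to_date(from_unixtime(payload_timestamp)) AS day_bucket,
       sum(payload_wattseconds) AS total_wattseconds
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
GROUP BY to_date(from_unixtime(payload_timestamp));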

UPDATE 3) Had this all wrong... and after a bit more thought on the hints given below, simply simplifying the whole query gives the correct result. The query below produces the total kWh per hour for the WHOLE dataset. With that in place I can create the various iterations of kWh spent over different time periods, such as:
  • hourly over the last 24 hours
  • daily over the last year
  • per minute over the last hour

..and so on.

Here is the query:

DROP table CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary;

CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary ( messageRowID STRING, payload_sensor INT, messagetimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT, payload_wattseconds INT ) 
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
    "cassandra.port" = "9160",
    "cassandra.ks.name" = "EVENT_KS",
    "cassandra.ks.username" = "admin",
    "cassandra.ks.password" = "admin",
    "cassandra.cf.name" = "currentcost_stream",
    "cassandra.columns.mapping" = ":key, payload_sensor, Timestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt, payload_wattseconds" );

select hour(from_unixtime(payload_timestamp)) AS hourly, (sum(payload_wattseconds)/(60*60)/1000)
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary 
GROUP BY hour(from_unixtime(payload_timestamp));
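(Unit check on that second expression: payload_wattseconds is in watt-seconds, so dividing the sum by 60*60 gives watt-hours, and dividing by 1000 again gives kWh. Working backwards from the hour-0 row below, that is roughly 60,896,537 watt-seconds / 3600 / 1000 ≈ 16.9157 kWh.)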

This query produces the following output against the sample data:

hourly  _c1
0   16.91570472222222
1   16.363228888888887
2   15.446414166666667
3   11.151388055555556
4   18.10564666666667
5   2.2734924999999997
6   17.370668055555555
7   17.991484444444446
8   38.632728888888884
9   16.001440555555554
10  15.887023888888889
11  12.709341944444445
12  23.052629722222225
13  14.986092222222222
14  16.182284722222224
15  5.881564999999999
18  2.8149172222222223
19  17.484405
20  15.888274166666665
21  15.387210833333333
22  16.088641666666668
23  16.49990916666667

Which is the total kWh per hourly timeframe across the whole dataset..
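To get the first bullet above (hourly over the last 24 hours), presumably the same WHERE pattern from the earlier attempts can be bolted onto this query, something like this untested sketch:

-- sketch: same hourly aggregation, restricted to samples from the last 24 hours
select hour(from_unixtime(payload_timestamp)) AS hourly,
       (sum(payload_wattseconds)/(60*60)/1000) AS kwh
FROM CurrentCostDataSamples_MySQL_Dump_Last_1_Hour_Summary
WHERE payload_timestamp > unix_timestamp() - 24*60*60
GROUP BY hour(from_unixtime(payload_timestamp));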

So, now on to the next issue. ;-)

0 Answers:

No answers.