I need some help with an AWS Kinesis Analytics query.
I have a stream containing this data:
hubId (Integer)
datetime (timestamp)
fid (varchar)
path (varchar)
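I want to aggregate this data into another stream, counting the rows per hour (page views) and the distinct fid values per hour (visitors), grouped by hubId.
Destination stream: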
profilesite_id(Integer) = hubId from source stream
datetime (timestamp)
visitors (Integer)
pageviews (Integer)
In MySQL, my query would look like this:
SELECT hubId, CONCAT_WS(':', SUBSTR(datetime, 1, 13), '00:00') datetime, COUNT(*) pageviews, COUNT(DISTINCT(fid)) visitors
FROM tableStream
WHERE timestamp >= CURDATE()
GROUP BY hubId, CONCAT_WS(':', SUBSTR(datetime, 1, 13), '00:00');
I tried to convert this query to Kinesis Analytics, but it is hard (it is my first time, sorry :)).
At the moment I have this Kinesis Analytics query:
CREATE OR REPLACE STREAM "bore_agg" (profilsite_id SMALLINT, datetime TIMESTAMP, visitors INT, pageviews INT);
-- Create pump to insert into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "bore_agg"
-- Select all columns from source stream
SELECT
SOURCE_SQL_STREAM_001."hubId" profilsite_id,
CHAR_TO_TIMESTAMP('yyyy-MM-DD hh:mm:ss', TIMESTAMP_TO_CHAR('YYYY-MM-DD HH:00:00', SOURCE_SQL_STREAM_001."datetime")) datetime,
COUNT(DISTINCT(SOURCE_SQL_STREAM_001."fid")) visitors,
COUNT(*) pageviews
FROM SOURCE_SQL_STREAM_001
WHERE SOURCE_SQL_STREAM_001."datetime" >= CHAR_TO_TIMESTAMP('yyyy-MM-DD hh:mm:ss', TIME_TO_CHAR('YYYY-MM-DD HH:00:00', CURRENT_TIME))
GROUP BY
SOURCE_SQL_STREAM_001."hubId",
CHAR_TO_TIMESTAMP('yyyy-MM-DD hh:mm:ss', TIMESTAMP_TO_CHAR('YYYY-MM-DD HH:00:00', SOURCE_SQL_STREAM_001."datetime"));
But I get this error, and I really don't understand what to do:
There is an error in your SQL code. There was an issue updating your application. Error message: Failed SQL command: CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "bore_agg" SELECT SOURCE_SQL_STREAM_001."hubId" profilsite_id, CHAR_TO_TIMESTAMP('yyyy-MM-DD hh:mm:ss', TIMESTAMP_TO_CHAR('YYYY-MM-DD HH:00:00', SOURCE_SQL_STREAM_001."datetime")) datetime, COUNT(DISTINCT(SOURCE_SQL_STREAM_001."fid")) visitors, COUNT(*) pageviews FROM SOURCE_SQL_STREAM_001 WHERE SOURCE_SQL_STREAM_001."datetime" >= CHAR_TO_TIMESTAMP('yyyy-MM-DD hh:mm:ss', TIME_TO_CHAR('YYYY-MM-DD HH:00:00', CURRENT_TIME)) GROUP BY SOURCE_SQL_STREAM_001."hubId", CHAR_TO_TIMESTAMP('yyyy-MM-DD hh:mm:ss', TIMESTAMP_TO_CHAR('YYYY-MM-DD HH:00:00', SOURCE_SQL_STREAM_001."datetime")). SQL error message: From line 9, column 1 to line 11, column 120: Cannot aggregate infinite stream: GROUP BY clause not specified or does not contain any monotonic expression.
Can anyone point me in the right direction?
Thanks in advance :)
Thomas
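Answer 0 (score: 0)
I know it has been a while since Adam's response here (the only one). So, just in case it helps someone: as Adam pointed out, if you think about it, a Data Analytics stream can be an "infinite" input. You therefore need to tell the query where to stop, i.e. "aggregate the last minute or hour of data in the stream". This is what the error means by a monotonic expression: the GROUP BY must include a time-based expression, such as STEP or FLOOR applied to ROWTIME, so that each window can close. In the example below, the query aggregates the stream's incoming data over each specified minute or hour.
Note: remember that you first need to create a STREAM with the same structure (number of columns and data types) as the result being returned, and then create a PUMP that runs the INSERT ... SELECT into the new stream. The pump is the process that scans the incoming data and inserts the results into the STREAM created in the first step.
Example: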
-- ** Aggregate (COUNT, AVG, etc.) + Tumbling Time Window **
-- Performs function on the aggregate rows over a 60 second tumbling window for a specified column.
-- .----------. .----------. .----------.
-- | SOURCE | | INSERT | | DESTIN. |
-- Source-->| STREAM |-->| & SELECT |-->| STREAM |-->Destination
-- | | | (PUMP) | | |
-- '----------' '----------' '----------'
-- STREAM (in-application): a continuously updated entity that you can SELECT from and INSERT into like a TABLE
-- PUMP: an entity used to continuously 'SELECT ... FROM' a source STREAM, and INSERT SQL results into an output STREAM
-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM (ingest_time TIMESTAMP, vendorid int, count_vs_time int);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
-- Query 1):
-- Group by VendorID over the last 60 seconds of the stream.
SELECT STREAM STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND) AS ingest_time, "vendorid", COUNT(*)
FROM "SOURCE_SQL_STREAM_001"
GROUP BY STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND), "vendorid";
-- Query 2) (alternative to Query 1: both reuse the same stream and pump names, so run one or the other)
-- Group by VendorID and count, over each hour of the stream.
CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM (hour_range TIMESTAMP, vendorid int, count_last_hr int);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO HOUR) AS hour_range, "vendorid", COUNT(*) as count_last_hr
FROM "SOURCE_SQL_STREAM_001"
GROUP BY FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO HOUR), "vendorid";
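Applied to the columns in the question, Query 2 might look roughly like the sketch below. Treat it as a starting point, not a tested solution. It assumes that COUNT(DISTINCT ...) is accepted in a tumbling-window GROUP BY in your application, and "hour_start" is a column name I made up to avoid calling the column datetime. Also note that ROWTIME is the time the row entered the in-application stream, not the "datetime" value carried in the record, so the hourly buckets follow arrival time.

```sql
-- Sketch only: hourly tumbling window over the question's source columns.
-- Assumption: COUNT(DISTINCT ...) is supported in this kind of GROUP BY.
-- "hour_start" is a hypothetical column name, not taken from the question.
CREATE OR REPLACE STREAM "bore_agg" (
    "profilsite_id" INTEGER,
    "hour_start"    TIMESTAMP,
    "visitors"      INTEGER,
    "pageviews"     INTEGER
);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "bore_agg"
SELECT STREAM
    "hubId" AS "profilsite_id",
    -- FLOOR on ROWTIME is the monotonic expression that the error message asks for
    FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO HOUR) AS "hour_start",
    COUNT(DISTINCT "fid") AS "visitors",
    COUNT(*) AS "pageviews"
FROM "SOURCE_SQL_STREAM_001"
GROUP BY
    "hubId",
    FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO HOUR);
```

The WHERE clause comparing against the current time is dropped here, because the window itself already limits each result row to one hour of incoming data.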
HTH.
Carlos.