I have some data that I want to aggregate (greatly simplified here). The raw data uses a schema similar to the following:
UserID - STRING
A - RECORD REPEATED
A.Action - STRING
A.Visit - INTEGER
A.Order - INTEGER
MISC - RECORD REPEATED
(other columns omitted here)
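Because of the "MISC" column there are many actual records, but I'm only concerned with the first 5 columns shown above. A sample of the raw data is shown below (note that the values shown are only examples; many other values exist, so they can't be hard-coded into the query):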
Table 0: (raw data sample)
(The blank cells under UserID are how BigQuery displays the rows; the "A" fields are part of a nested record.)
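My query produces the data shown in Table 1 below. I'm trying to use ARRAY_AGG with ORDINAL to pick the first two "Action" values for each user and restructure them as shown in Table 2.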
SELECT
  UserId,
  ARRAY_AGG(STRUCT(A.Action, A.Visit, A.`Order`)
            ORDER BY A.Visit, A.`Order`, A.Action) AS Actions
FROM
  `table`
LEFT JOIN UNNEST(A) AS A
GROUP BY
  UserId
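To make the shape of Table 1 concrete, here is a minimal Python sketch that emulates the query locally, assuming each user's repeated `A` field can be modeled as a list of (Action, Visit, Order) tuples. It only mimics the grouping and the `ORDER BY Visit, Order, Action` inside `ARRAY_AGG`; the sample rows and the function name `aggregate_actions` are hypothetical, and this is not how BigQuery executes the query.

```python
from typing import Dict, List, Tuple

# (Action, Visit, Order), mirroring one element of the repeated A record.
Record = Tuple[str, int, int]

# Hypothetical sample rows, deliberately stored out of order.
rows: Dict[str, List[Record]] = {
    "U001": [("Share", 2, 1), ("Register", 1, 1), ("Upgrade", 1, 2)],
    "U002": [("Refer", 8, 1), ("Share", 7, 1)],
}

def aggregate_actions(table: Dict[str, List[Record]]) -> Dict[str, List[Record]]:
    """Emulate ARRAY_AGG(STRUCT(...) ORDER BY Visit, Order, Action) per UserId."""
    result: Dict[str, List[Record]] = {}
    for user_id, records in table.items():
        # Sort key matches the ORDER BY clause: Visit, then Order, then Action.
        result[user_id] = sorted(records, key=lambda r: (r[1], r[2], r[0]))
    return result

nested = aggregate_actions(rows)
print(nested["U001"])  # [('Register', 1, 1), ('Upgrade', 1, 2), ('Share', 2, 1)]
```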
Table 1: (sample output of the query above)
Table 2: (desired format)
So I need:
The query strategy I tried was to ORDER BY UserID, Visit, Order and take the DISTINCT values of Action, using the following:
UserId,
ARRAY_AGG(DISTINCT Action ORDER BY UserID, Visit, Order) FirstAction,
ARRAY_AGG(DISTINCT Action ORDER BY UserID, Visit, Order) SecondAction
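However, this approach produces the following error:
Error: An aggregate function that has both DISTINCT and ORDER BY arguments can only ORDER BY columns that are arguments to the function
Any ideas on how to fix this error (or an alternative approach)?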
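One way to see why the combination is rejected: after DISTINCT collapses duplicate Action values, a single action such as "Share" can still come from several (Visit, Order) positions, so ordering the distinct values by Visit and Order has no single well-defined answer. The Python sketch below, using hypothetical rows for one user, collects every position behind each distinct action to expose that ambiguity. It illustrates the reasoning only and does not reproduce BigQuery's own check.

```python
from typing import Dict, List, Tuple

# Hypothetical (Action, Visit, Order) rows for a single user.
records: List[Tuple[str, int, int]] = [
    ("Register", 1, 1),
    ("Share", 1, 4),
    ("Share", 2, 1),
]

# Map each distinct Action to every (Visit, Order) position it occurs at.
positions: Dict[str, List[Tuple[int, int]]] = {}
for action, visit, order in records:
    positions.setdefault(action, []).append((visit, order))

# Actions with more than one candidate position have no unique sort key after DISTINCT.
ambiguous = sorted(a for a, pos in positions.items() if len(pos) > 1)
print(ambiguous)  # ['Share']
```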
Answer 0 (score: 3)
If the results shown in Table 2 don't require deduplication, I'm not sure why the original query uses DISTINCT.
With that said:
#standardSQL
WITH sample AS (
SELECT actor.login userid, type action
, EXTRACT(HOUR FROM created_at) visit
, EXTRACT(MINUTE FROM created_at) `order`
FROM `githubarchive.day.20171005`
)
SELECT userid, actions[OFFSET(0)] firstaction, actions[SAFE_OFFSET(1)] secondaction
FROM (
SELECT userid, ARRAY_AGG(action ORDER BY visit, `order` LIMIT 2) actions
FROM sample
GROUP BY 1
ORDER BY 1
LIMIT 100
)
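In this query, `ARRAY_AGG(... LIMIT 2)` keeps at most two actions per user. `OFFSET(0)` is zero-based and raises an error when the index is out of range, which is fine here because every group has at least one row; `SAFE_OFFSET(1)` returns NULL instead of failing when a user has only one action. The Python sketch below mimics those accessors locally; `offset`, `safe_offset`, and `first_two` are hypothetical helper names, not BigQuery functions, and the event rows are made-up values.

```python
from typing import List, Optional, Tuple

def offset(arr: List[str], i: int) -> str:
    """Like arr[OFFSET(i)]: zero-based, raises when i is out of range."""
    if not 0 <= i < len(arr):
        raise IndexError(f"Array index {i} is out of bounds")
    return arr[i]

def safe_offset(arr: List[str], i: int) -> Optional[str]:
    """Like arr[SAFE_OFFSET(i)]: zero-based, returns None (NULL) when out of range."""
    return arr[i] if 0 <= i < len(arr) else None

def first_two(rows: List[Tuple[str, int, int]]) -> Tuple[str, Optional[str]]:
    """Emulate ARRAY_AGG(action ORDER BY visit, `order` LIMIT 2) for one user."""
    actions = [a for a, _, _ in sorted(rows, key=lambda r: (r[1], r[2]))][:2]
    return offset(actions, 0), safe_offset(actions, 1)

print(first_two([("PushEvent", 10, 5)]))  # ('PushEvent', None)
print(first_two([("WatchEvent", 9, 30), ("PushEvent", 9, 2), ("ForkEvent", 11, 0)]))  # ('PushEvent', 'WatchEvent')
```

Using the safe variant only for the second position mirrors the answer: the first element always exists, while the second may not.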
Answer 1 (score: 1)
Try the following.
#standardSQL
SELECT UserId,
ARRAY_AGG(Action ORDER BY Visit, `Order`, Action LIMIT 2)[SAFE_ORDINAL(1)] AS FirstAction,
ARRAY_AGG(Action ORDER BY Visit, `Order`, Action LIMIT 2)[SAFE_ORDINAL(2)] AS SecondAction
FROM `project.dataset.table`
LEFT JOIN UNNEST(A) AS A
GROUP BY UserId
-- ORDER BY UserId
You can test and play with it using the dummy data from your question:
#standardSQL
WITH `table` AS (
SELECT 'U001' AS UserId, [STRUCT<Action STRING, Visit INT64, `Order` INT64 >
('Register', 1, 1),('Upgrade', 1, 2),('Feedback', 1, 3),('Share', 1, 4),('Share', 2, 1)] AS A UNION ALL
SELECT 'U002', [STRUCT<Action STRING, Visit INT64, `Order` INT64 >
('Share', 7, 1),('Share', 7, 2),('Refer', 8, 1),('Feedback', 8, 2),('Feedback', 8, 3)] UNION ALL
SELECT 'U003', [STRUCT<Action STRING, Visit INT64, `Order` INT64 >
('Register', 1, 1),('Share', 1, 2),('Share', 1, 3),('Share', 2, 1),('Share', 2, 2),('Share', 3, 1),('Share', 3, 2)]
)
SELECT UserId,
ARRAY_AGG(Action ORDER BY Visit, `Order`, Action LIMIT 2)[SAFE_ORDINAL(1)] AS FirstAction,
ARRAY_AGG(Action ORDER BY Visit, `Order`, Action LIMIT 2)[SAFE_ORDINAL(2)] AS SecondAction
FROM `table`
LEFT JOIN UNNEST(A) AS A
GROUP BY UserId
ORDER BY UserId
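To check the expected result without a BigQuery project, this Python sketch replays the same dummy data locally: it sorts each user's records by Visit, Order, Action, keeps the first two (as `LIMIT 2` does), and reads them with 1-based, NULL-on-overflow access like `SAFE_ORDINAL`. The helpers `safe_ordinal` and `first_two_actions` are hypothetical local stand-ins, and Python's stable sort makes tie-breaking deterministic where BigQuery's may not be.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, int, int]  # (Action, Visit, Order)

# Same dummy data as the WITH `table` AS (...) clause above.
table: Dict[str, List[Record]] = {
    "U001": [("Register", 1, 1), ("Upgrade", 1, 2), ("Feedback", 1, 3), ("Share", 1, 4), ("Share", 2, 1)],
    "U002": [("Share", 7, 1), ("Share", 7, 2), ("Refer", 8, 1), ("Feedback", 8, 2), ("Feedback", 8, 3)],
    "U003": [("Register", 1, 1), ("Share", 1, 2), ("Share", 1, 3), ("Share", 2, 1),
             ("Share", 2, 2), ("Share", 3, 1), ("Share", 3, 2)],
}

def safe_ordinal(arr: List[str], n: int) -> Optional[str]:
    """Like arr[SAFE_ORDINAL(n)]: 1-based, returns None (NULL) when out of range."""
    return arr[n - 1] if 1 <= n <= len(arr) else None

def first_two_actions(tbl: Dict[str, List[Record]]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    result: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for user_id in sorted(tbl):  # ORDER BY UserId
        ordered = sorted(tbl[user_id], key=lambda r: (r[1], r[2], r[0]))  # ORDER BY Visit, Order, Action
        actions = [r[0] for r in ordered][:2]  # LIMIT 2
        result[user_id] = (safe_ordinal(actions, 1), safe_ordinal(actions, 2))
    return result

for user_id, (first, second) in first_two_actions(table).items():
    print(user_id, first, second)
# U001 Register Upgrade
# U002 Share Share
# U003 Register Share
```

Note that U002 gets "Share" twice, which is exactly the non-deduplicated behavior the first answer points out.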