我正在尝试在Snowflake中运行以下查询,但是它失败,并显示Unsupported subquery type cannot be evaluated
。该查询在其他SQL引擎(例如Postgresql和Presto)中有效,因此看来Snowflake不支持这种查询。
SELECT first_action.date,
DATEDIFF('day', first_action.date, returning_action.date) - 1 as diff,
APPROXIMATE_SIMILARITY(select MINHASH_COMBINE(value) from (select first_action.user_id_set as value union all select returning_action.user_id_set)) _set
FROM (select cast(_time as date) as date, minhash(100, _user) as user_id_set from events group by 1) as first_action
JOIN (select cast(_time as date) as date, minhash(100, _user) as user_id_set from events group by 1) as returning_action
ON (first_action.date < returning_action.date AND dateadd(day, 14, first_action.date) >= returning_action.date)
group by 1,2
该查询是使用MinHash的典型同类查询。我们每天计算MinHash,然后加入接下来的14天并合并结果,最后计算最终结果。
由于MinHash没有线性MINHASH_COMBINE函数,因此我们必须对UNION全部使用子查询,以使其正常工作,但这也没有解决。 :/
由于我们真的不知道任何解决方法,因此我们现在陷入困境。任何帮助表示赞赏!
答案 0 :(得分:1)
不确定是否可行,请尝试使用first_action
语句将returning_action
和WITH
分开:
WITH
first_action as (
SELECT
TRY_CAST(_time AS DATE) as date,
MINHASH(100, _user) as user_id_set
FROM events
GROUP BY 1
),
returning_action as (
SELECT
TRY_CAST(_time AS DATE) as date,
MINHASH(100, _user) as user_id_set
FROM events
GROUP BY 1
),
SELECT
first_action.date,
DATEDIFF('day', fa.date, ra.date) - 1 as diff,
APPROXIMATE_SIMILARITY(
SELECT MINHASH_COMBINE(value)
FROM (
SELECT fa.user_id_set AS value FROM first_action fa
UNION ALL
SELECT ra.user_id_set AS value FROM returning_action ra
)
) _set
FROM first_action fa
JOIN returning_action ra
ON (fa.date < ra.date AND DATEADD(day, 14, fa.date) >= ra.date)
GROUP BY 1,2
答案 1 :(得分:0)
So the major trick is that all these MINHASH_ functions window functions so you need to build a grouping key on the data.
So using this as my example data:
CREATE TABLE events(_user number, _time timestamp_ntz);
INSERT INTO events VALUES (1,'2019-03-01'),(1,'2019-03-05'),(1,'2019-03-10'),
(1,'2019-03-14'),(1,'2019-03-15'),(1,'2019-03-16'),
(2,'2019-03-01'),(2,'2019-03-05'),(2,'2019-03-11'),
(2,'2019-03-15');
The first set is to get the 14 days of data for the COMBINE
WITH actions AS (
SELECT _time::date as date
,dateadd(day, 14, date) as date14
,minhash(100, _user) as user_id_set
FROM events
GROUP BY 1
)
SELECT fa.date
,ARRAY_AGG(ra.date) WITHIN GROUP (ORDER BY ra.date)
,MINHASH_COMBINE(ra.user_id_set) AS sets
FROM actions AS fa
JOIN actions AS ra
ON (fa.date <= ra.date AND fa.date14 > ra.date)
GROUP BY 1
ORDER BY 1;
this is similar to your code, but here I include on the RA the same day as FA. so I can group by FA.date, but have FA's data included. On the date range I was not sure if you are want 14 days later or 14 days worth of date. I assume the later, thus changed the end range termination.
Now we have the combined next 14 days of data for each day, we want to get the pairs of (in my code I do not put a max days of compare but just include all pairs). Now again the APPROXIMATE_SIMILARITY is a window function, so I build an Array which I will rip straight apart again, thus pivoting the data, which is what you're trying to do via the union all (this can bee seen in the pairs and unrolled CTE's)
WITH actions AS (
SELECT _time::date AS date
,dateadd(day, 14, date) as date14
,minhash(100, _user) AS user_id_set
FROM events
GROUP BY 1
), combined AS (
SELECT fa.date
,MINHASH_COMBINE(ra.user_id_set) AS sets
FROM actions AS fa
JOIN actions AS ra
ON fa.date <= ra.date AND fa.date14 > ra.date
GROUP BY 1
), pairs AS (
SELECT fa.date
,DATEDIFF('day', fa.date, ra.date) AS diff
,ARRAY_CONSTRUCT(fa.sets,ra.sets) AS comp_set
FROM combined AS fa
JOIN combined AS ra
ON fa.date < ra.date
), unrolled AS (
SELECT date
,diff
,f.value AS sets
FROM pairs p,
LATERAL FLATTEN(input => p.comp_set) f
)
SELECT date
,diff
,APPROXIMATE_SIMILARITY(sets)
FROM unrolled
GROUP BY 1,2
ORDER BY 1,2;
and thus you get results for all days
DATE DIFF APPROXIMATE_SIMILARITY(SETS)
2019-03-01 4 1
2019-03-01 9 1
2019-03-01 10 1
2019-03-01 13 1
2019-03-01 14 1
2019-03-01 15 0.51
2019-03-05 5 1
2019-03-05 6 1
2019-03-05 9 1
2019-03-05 10 1
2019-03-05 11 0.51
2019-03-10 1 1
2019-03-10 4 1
2019-03-10 5 1
2019-03-10 6 0.51
2019-03-11 3 1
2019-03-11 4 1
2019-03-11 5 0.51
2019-03-14 1 1
2019-03-14 2 0.51
2019-03-15 1 0.51