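I'm running a moderately complex SQL query and hitting this error, which I haven't encountered before and can't find a good explanation for:
Error in SQL statement: package.TreeNodeException: execute, tree:
I'm including the entire query here as part of the question, because I haven't been able to isolate a small example that reproduces the problem: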
WITH visits AS (
    SELECT
        visitor_key
        , channel_vec AS digital_marketing_channel
        , to_timestamp(date_time, "MM/dd/yyyy HH:mm") AS timestamp
        , (HOUR(to_timestamp(date_time, "MM/dd/yyyy HH:mm")) / 24) + (MINUTE(to_timestamp(date_time, "MM/dd/yyyy HH:mm")) / (24 * 60)) AS days_carried
        , conversion
    FROM vectorised
), conversions_only AS (
    SELECT
        visitor_key
        , conversion
        , timestamp
        , days_carried
        , RANK() OVER(PARTITION BY visitor_key ORDER BY timestamp) AS conversion_rank
    FROM visits
    WHERE conversion = 1
), all_conversions AS (
    SELECT
        v.*
        , MIN(conversion_rank) AS path_id
    FROM visits v
    JOIN conversions_only c ON v.visitor_key = c.visitor_key
    WHERE v.timestamp <= c.timestamp
    GROUP BY
        v.visitor_key
        , v.digital_marketing_channel
        , v.timestamp
        , v.days_carried
        , v.conversion
), converted_paths AS (
    SELECT
        a.*
        , CASE
            WHEN path_id > 1 THEN 1
            ELSE 0
        END AS previous_conversion
        , DATEDIFF(c.timestamp, a.timestamp) + c.days_carried - a.days_carried AS path_days_remaining
        , 1 AS converted_path
    FROM all_conversions a
    JOIN conversions_only c ON a.visitor_key = c.visitor_key AND a.path_id = c.conversion_rank
), all_paths AS (
    SELECT
        visitor_key
        , 0 AS path_id
        , digital_marketing_channel
        , conversion
        , DATEDIFF("2019-04-02", timestamp) - days_carried AS path_days_remaining
        , 0 AS converted_path
        , 0 AS previous_conversion
    FROM visits
    WHERE visitor_key NOT IN (SELECT DISTINCT visitor_key FROM all_conversions)
    UNION ALL
    SELECT
        visitor_key
        , path_id
        , digital_marketing_channel
        , conversion
        , path_days_remaining
        , converted_path
        , previous_conversion
    FROM converted_paths
), steps AS (
    SELECT
        *
        , ROW_NUMBER() OVER(PARTITION BY visitor_key, path_id ORDER BY path_days_remaining, conversion DESC) AS step_id
    FROM all_paths
    WHERE conversion = 0
    ORDER BY visitor_key, path_id, path_days_remaining DESC, conversion
), output AS (
    SELECT
        visitor_key
        , path_id
        , pad_matrix(collect_list(digital_marketing_channel), 10) AS channels
        , collect_list(path_days_remaining) AS days_remaining
        , converted_path
        , previous_conversion
    FROM steps
    WHERE step_id < 11
    GROUP BY
        visitor_key
        , path_id
        , converted_path
        , previous_conversion
), helper AS (
    SELECT
        visitor_key
        , path_id
        , converted_path
        , previous_conversion
        , COUNT(*) AS steps
    FROM all_paths
    GROUP BY
        visitor_key
        , path_id
        , converted_path
        , previous_conversion
)
SELECT
    *
FROM helper
WHERE converted_path = 0
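The problem seems to originate in the "helper" table and the final select statement, and it appears to be specific to the converted_path = 0 condition. converted_path is a column containing only 0s and 1s. To complicate matters, using
WHERE converted_path = 1
works, whereas
WHERE converted_path != 1
causes the same error. Moving the where clause up into the "helper" table produces the same problem. I can run exactly the same analysis on other columns without any issue; the problem occurs only with the converted_path column. If I save the output of the unfiltered "helper" table as a table in the database, I can run the filter query on the new table as intended. Likewise, if I save the "all_paths" table that "helper" draws from as a new table, I can run "helper" and the final filtered select statement against the saved "all_paths" table.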
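For reference, the second workaround looks roughly like the sketch below. The table name all_paths_saved is a placeholder for illustration only: it stands for a table holding the saved output of the all_paths CTE. The "helper" CTE and the final filter are otherwise the same as in the query above.

```sql
-- Sketch of the workaround, assuming the output of the all_paths CTE
-- has already been saved as a table. "all_paths_saved" is a
-- hypothetical name, not part of the original query.
WITH helper AS (
    SELECT
        visitor_key
        , path_id
        , converted_path
        , previous_conversion
        , COUNT(*) AS steps
    FROM all_paths_saved  -- saved table instead of the all_paths CTE
    GROUP BY
        visitor_key
        , path_id
        , converted_path
        , previous_conversion
)
SELECT
    *
FROM helper
WHERE converted_path = 0  -- the same filter that fails against the CTE
```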
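Clearly this is a problem I can work around, so I'm more concerned that something I fundamentally don't understand may be happening in the all_paths subtable, where the union statement is. If anyone can point me in the right direction as to what I'm missing, I'd be very grateful.
Thanks!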