使用事件表,我需要返回日期和类型:
最近的事件可能具有空值,在这种情况下,该事件需要返回最新的非空值
我在SO上找到了几篇类似的文章和帖子(甚至是相同的),但无法解码或理解解决方案-即
Fill null values with last non-null amount - Oracle SQL
https://www.itprotoday.com/sql-server/last-non-null-puzzle
表如下-还有其他列,但是为了简单起见,我仅包括3列。另请注意,第一类型和日期可以为空。在这种情况下,需要返回null。
╔═══════╦════════╦════════════╗
║ Email ║ Type ║ Date ║
╠═══════╬════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║
║ A ║ Update ║ 2019-04-02 ║
║ A ║ null ║ null ║
╚═══════╩════════╩════════════╝
输出应为:
╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate ║ LastType ║ LastDate ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║ Update ║ 2019-04-02 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝
我尝试的第一种方法是使用子查询将表连接到自身,该子查询使用case语句查找MIN和MAX日期:
select
Email,
max(case when T1.Date = T2.Min_Date then T1.Type end) as FirstType,
max(case when T1.Date = T2.Min_Date then T1.Date end) as FirstDate,
max(case when T1.Date = T2.Max_Date then T1.Type end) as LastType,
max(case when T1.Date = T2.Max_Date then T1.Date end) as LastDate,
from
T1
join
(select
EmailAddress,
max(Date) as Max_Date,
min(Date) as Min_Date
from
Table1
group by
Email
) T2
on
T1.Email = T2.Email
group by
T1.Email
这似乎适用于MIN值,但MAX值将返回null。
为解决返回最后一个非值的问题,我尝试执行以下操作:
select
EmailAddress,
max(Date) over (partition by EmailAddress rows unbounded preceding) as LastDate,
max(Type) over (partition by EmailAddress rows unbounded preceding) as LastType
from
T1
group by
EmailAddress,
Date,
Type
但是,结果为3行,而不是1行。
我承认我不太了解分析功能,因为我不必冗长地处理它们。任何帮助将不胜感激。
编辑: 上面的示例准确地表示了数据的外观,但是下面的示例是我正在使用的确切示例数据。
示例:
╔═══════╦════════╦════════════╗
║ Email ║ Type ║ Date ║
╠═══════╬════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║
║ A ║ null ║ null ║
╚═══════╩════════╩════════════╝
所需结果:
╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate ║ LastType ║ LastDate ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║ Create ║ 2019-04-01 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝
其他用例:
╔═══════╦════════╦════════════╗
║ Email ║ Type ║ Date ║
╠═══════╬════════╬════════════╣
║ A ║ null ║ null ║
║ A ║ Create ║ 2019-04-01 ║
╚═══════╩════════╩════════════╝
所需结果:
╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate ║ LastType ║ LastDate ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A ║ null ║ null ║ Create ║ 2019-04-01 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝
答案 0 :(得分:0)
使用窗口函数和条件聚合:
select t.email,
max(case when seqnum = 1 then type end) as first_type,
max(case when seqnum = 1 then date end) as first_date,
max(case when seqnum_nonull = 1 and type is not null then type end) as last_type,
max(case when seqnum_nonull = 1 and type is not null then date end) as last_date
from (select t.*,
row_number() over (partition by email order by date) as seqnum,
row_number() over (partition by email, (case when type is null then 1 else 2 end) order by date) as seqnum_nonull
from t
) t
group by t.email;
答案 1 :(得分:0)
由于Spark SQL窗口函数支持NULLS LAST|FIRST
语法,因此您可以使用该语法,然后为rn值1和2指定具有多个聚合的枢轴。我可以查看更多示例数据,但这对您的数据集有效:< / p>
%sql
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp;
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
)
SELECT *
FROM cte
PIVOT ( MAX(date), MAX(type) FOR rn In ( 1, 2 ) )
通过在查询中提供所需的部分来重命名列,例如
-- Pivot and rename columns
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
)
SELECT *
FROM cte
PIVOT ( MAX(date) AS Date, MAX(type) AS Type FOR rn In ( 1 First, 2 Last ) )
或者提供列列表,例如
-- Pivot and rename columns
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
), cte2 AS
(
SELECT *
FROM cte
PIVOT ( MAX(date) AS Date, MAX(type) AS Type FOR rn In ( 1 First, 2 Last ) )
)
SELECT *
FROM cte2 AS (Email, FirstDate, FirstType, LastDate, LastType)
此简单查询使用ROW_NUMBER
为日期列排序的数据集分配行号,但使用NULLS LAST
语法确保空行出现在编号的最后。 PIVOT
然后将行转换为列。