I have the following table:
Date | event_number | customer_id1 | customer_age | customer_gender
10/01/2020 | 1 | abc | NULL | NULL
10/01/2020 | 2 | abc | NULL | male
10/01/2020 | 3 | abc | 45 | NULL
10/01/2020 | 1 | def | 30 | NULL
I want to run a SQL query once a day to find new combinations of customer_id1, customer_age, and customer_gender.
The output should look like this:
query_run_time | customer_id1 | customer_age | customer_gender
11/01/2020 | abc | 45 | male
11/01/2020 | def | 30 | NULL
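query_run_time is the date on which the query runs. If a combination (customer_id1, customer_age, customer_gender) already exists in the table, I don't want to insert that row.
Thanks.
Answer 0: (score: 0)
You can use a window function to assign an internal row number inside each subquery and then join the subqueries on that number, for example like this: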
SELECT COALESCE(a.customer_id, b.customer_id) as customer_id
, customer_age
, customer_gender
FROM (
SELECT customer_id, customer_age
, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY customer_age ) AS row_no
FROM customer_event
WHERE customer_age IS NOT NULL
) a
FULL JOIN (
SELECT customer_id, customer_gender
, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY customer_gender ) AS row_no
FROM customer_event
WHERE customer_gender IS NOT NULL
) b ON b.customer_id = a.customer_id
AND b.row_no = a.row_no
ORDER BY COALESCE(a.customer_id, b.customer_id)
, COALESCE(a.row_no, b.row_no)
Schema and test data
CREATE TABLE customer_event (
event_number INT NOT NULL,
customer_id VARCHAR(10) NOT NULL,
customer_age INT,
customer_gender VARCHAR(10)
);
INSERT INTO customer_event VALUES
( 1, 'abc', NULL, NULL ),
( 2, 'abc', NULL, 'male' ),
( 3, 'abc', 45 , NULL ),
( 4, 'abc', 50 , 'female' ),
( 5, 'abc', 27 , NULL ),
( 1, 'def', 30 , NULL );
Output
customer_id customer_age customer_gender
abc 27 female
abc 45 male
abc 50 (null)
def 30 (null)
The above was tested on SQL Fiddle using PostgreSQL 9.6.
Answer 1: (score: 0)
Use aggregate functions with GROUP BY. MAX ignores NULLs, so each customer collapses to a single row:
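-- The source table has no query_run_time column, so the run date is taken from CURRENT_DATE
SELECT CURRENT_DATE AS query_run_time, customer_id1,
MAX(customer_age) AS customer_age,
MAX(customer_gender) AS customer_gender
FROM tbl
GROUP BY customer_id1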
Output
query_run_time | customer_id1 | customer_age | customer_gender
11/01/2020 | abc | 45 | male
11/01/2020 | def | 30 | NULL
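The question also asks that a combination be skipped when it is already stored. A minimal sketch of one way to do that follows, run through Python's sqlite3 module so it can be tried locally. The target table customer_combination, the column name event_date, and the helper insert_new_combinations are assumptions made for illustration, not names from the question. It uses SQLite's IS operator as a NULL-safe comparison; in PostgreSQL the equivalent is IS NOT DISTINCT FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_event (
    event_date TEXT, event_number INT, customer_id1 TEXT,
    customer_age INT, customer_gender TEXT
);
INSERT INTO customer_event VALUES
    ('10/01/2020', 1, 'abc', NULL, NULL),
    ('10/01/2020', 2, 'abc', NULL, 'male'),
    ('10/01/2020', 3, 'abc', 45,   NULL),
    ('10/01/2020', 1, 'def', 30,   NULL);
-- Hypothetical target table that accumulates the daily results.
CREATE TABLE customer_combination (
    query_run_time TEXT, customer_id1 TEXT,
    customer_age INT, customer_gender TEXT
);
""")

# Collapse each customer to one row with MAX (which skips NULLs), then keep
# only rows whose combination is not stored yet. "IS" treats NULL as equal
# to NULL, which plain "=" would not.
INSERT_NEW = """
INSERT INTO customer_combination
SELECT :run_date, s.customer_id1, s.customer_age, s.customer_gender
FROM (
    SELECT customer_id1,
           MAX(customer_age)    AS customer_age,
           MAX(customer_gender) AS customer_gender
    FROM customer_event
    GROUP BY customer_id1
) AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM customer_combination AS c
    WHERE c.customer_id1 = s.customer_id1
      AND c.customer_age IS s.customer_age
      AND c.customer_gender IS s.customer_gender
)
"""

def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM customer_combination").fetchone()[0]

def insert_new_combinations(conn, run_date):
    """Insert unseen combinations and return how many rows were added."""
    before = count_rows(conn)
    conn.execute(INSERT_NEW, {"run_date": run_date})
    conn.commit()
    return count_rows(conn) - before

first = insert_new_combinations(conn, "11/01/2020")   # both customers are new
second = insert_new_combinations(conn, "12/01/2020")  # nothing changed since
rows = conn.execute(
    "SELECT * FROM customer_combination ORDER BY customer_id1"
).fetchall()
print(first, second, rows)
```

Running the insert a second time against unchanged data adds no rows, which is the behaviour the question describes. Note that MAX picks the largest value, not the most recent one, so this only matches the expected output when each customer has at most one non-NULL age and gender.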
Answer 2: (score: 0)
I suspect what you really want is the most recent value of each column. Here is one approach:
-- BigQuery syntax: take the latest non-NULL value per column, ordered by event_number
select date, customer_id1,
       array_agg(customer_age ignore nulls order by event_number desc limit 1)[safe_ordinal(1)] as age,
       array_agg(customer_gender ignore nulls order by event_number desc limit 1)[safe_ordinal(1)] as gender
from t
group by date, customer_id1;
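The array_agg ... ignore nulls form with safe_ordinal is specific to BigQuery. As a rough, portable sketch of the same idea (latest non-NULL value per column), correlated subqueries can be used instead. The example below runs on SQLite through Python's sqlite3 module and reuses the test rows from Answer 0. It groups per customer only and leaves out the date column for brevity, which is a simplification of the query above, and it keeps that answer's column name customer_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_event (
    event_number INT NOT NULL,
    customer_id VARCHAR(10) NOT NULL,
    customer_age INT,
    customer_gender VARCHAR(10)
);
INSERT INTO customer_event VALUES
    (1, 'abc', NULL, NULL),
    (2, 'abc', NULL, 'male'),
    (3, 'abc', 45,   NULL),
    (4, 'abc', 50,   'female'),
    (5, 'abc', 27,   NULL),
    (1, 'def', 30,   NULL);
""")

# For every customer, each subquery picks the non-NULL value that has the
# highest event_number; it yields NULL when no non-NULL value exists.
LATEST_VALUES = """
SELECT DISTINCT e.customer_id,
       (SELECT a.customer_age
        FROM customer_event AS a
        WHERE a.customer_id = e.customer_id
          AND a.customer_age IS NOT NULL
        ORDER BY a.event_number DESC
        LIMIT 1) AS age,
       (SELECT g.customer_gender
        FROM customer_event AS g
        WHERE g.customer_id = e.customer_id
          AND g.customer_gender IS NOT NULL
        ORDER BY g.event_number DESC
        LIMIT 1) AS gender
FROM customer_event AS e
ORDER BY e.customer_id
"""

latest = conn.execute(LATEST_VALUES).fetchall()
print(latest)
```

With this data the latest age for abc is 27 (event 5) and the latest gender is female (event 4), whereas MAX would return 50 and male. That difference is the reason to prefer an ordering by event_number when "most recent" is what is wanted.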