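I need to create a dataset that contains the same rows as the source table, but with the date of birth replaced by the most common date-of-birth value found for that person. If there is a tie, the dob from the row with the most recent date should be used.
Input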
id first_name last_name dob date
---------------------------------------------
1 john doe 06/11/85 01/01/17
2 john doe 06/11/86 01/01/17
3 john doe 06/11/86 01/01/17
4 jane doh 01/06/87 01/01/17
5 jane doh 01/01/80 01/02/17
Output
1 john doe 06/11/86 01/01/17
2 john doe 06/11/86 01/01/17
3 john doe 06/11/86 01/01/17
4 jane doh 01/01/80 01/01/17
5 jane doh 01/01/80 01/02/17
John Doe is updated to 06/11/86 (most common). Jane Doh is updated to 01/01/80 (tie-breaker).
My latest attempt, based on a similar example:
SELECT a.id, a.first_name, a.last_name, a.date, b.id FROM
(SELECT first_name, last_name,dob,count(*) FROM table group by first_name, last_name,dob having count(*) in
(SELECT max(total) AS freq FROM
(SELECT first_name, last_name, dob, count(*) AS total FROM table group by first_name, last_name, dob)
AS test_temp group by first_name, last_name)
) a join (select * FROM table) b on (a.id = b.id)
I don't just want a solution; I'd also like an explanation I can learn from.
Answer 0 (score: 0)
SELECT a.id, a.first_name, a.last_name, b.dob, a.date
FROM table a
JOIN (SELECT DISTINCT id, first_name, last_name, dob, count(dob) AS cnt
FROM table ORDER BY cnt DESC LIMIT 1) b
ON (a.first_name=b.first_name) AND (a.last_name=b.last_name)
I would try this. I joined the base table with a subselect that gets the most common dob. ORDER BY cnt DESC LIMIT 1 stands in for max(count(dob)). I then join this dob onto every record that has the same first_name and last_name. I hope this helps.
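Note that a LIMIT 1 inside a joined subselect keeps one row for the whole table, not one row per person. Below is a minimal sketch of a per-person variant. It assumes SQLite through Python's sqlite3 module, a hypothetical table name people (table is a reserved word), and ISO-formatted dates so that text ordering matches chronological order; reading the sample dates as MM/DD/YY is also an assumption. It replaces the join with a correlated subquery, which is a different technique from the query above:

```python
import sqlite3

# In-memory database; "people" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER, first_name TEXT, last_name TEXT, dob TEXT, date TEXT)"
)
# Sample rows from the question, with dates rewritten as ISO strings (assumed MM/DD/YY).
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?, ?)",
    [
        (1, "john", "doe", "1985-06-11", "2017-01-01"),
        (2, "john", "doe", "1986-06-11", "2017-01-01"),
        (3, "john", "doe", "1986-06-11", "2017-01-01"),
        (4, "jane", "doh", "1987-01-06", "2017-01-01"),
        (5, "jane", "doh", "1980-01-01", "2017-01-02"),
    ],
)

# For each row, the subquery groups that person's rows by dob, sorts the
# groups by frequency and then by their most recent date, and keeps the top one.
query = """
SELECT a.id, a.first_name, a.last_name,
       (SELECT b.dob
        FROM people b
        WHERE b.first_name = a.first_name
          AND b.last_name = a.last_name
        GROUP BY b.dob
        ORDER BY COUNT(*) DESC, MAX(b.date) DESC
        LIMIT 1) AS dob,
       a.date
FROM people a
ORDER BY a.id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the subquery is evaluated per outer row, the LIMIT 1 here applies within each person rather than across the whole table.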
Answer 1 (score: 0)
You can use the first_value() function to assign the date of birth, instead of a JOIN:
select t.id, t.first_name, t.last_name,
       first_value(dob) over (partition by first_name, last_name
                              order by dob_cnt desc, date desc
                              rows between unbounded preceding and current row
                             ) as dob_imputed
from (select t.*,
             count(*) over (partition by first_name, last_name, dob) as dob_cnt
      from t
     ) t
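To see why this works, here is a hedged, runnable sketch on the sample data. It assumes SQLite 3.25 or later (needed for window functions) through Python's sqlite3 module, ISO-formatted dates (reading the sample as MM/DD/YY), and an extra date column in the final select. The inner query tags every row with how often its dob occurs for that person. The outer first_value() then takes the dob from whichever row sorts first by that count and, on ties, by the most recent date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, first_name TEXT, last_name TEXT, dob TEXT, date TEXT)"
)
# Sample rows from the question, dates rewritten as ISO strings (an assumption).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "john", "doe", "1985-06-11", "2017-01-01"),
        (2, "john", "doe", "1986-06-11", "2017-01-01"),
        (3, "john", "doe", "1986-06-11", "2017-01-01"),
        (4, "jane", "doh", "1987-01-06", "2017-01-01"),
        (5, "jane", "doh", "1980-01-01", "2017-01-02"),
    ],
)

# Step 1: count how many rows share each (person, dob) combination.
counts = conn.execute(
    """
    SELECT id,
           COUNT(*) OVER (PARTITION BY first_name, last_name, dob) AS dob_cnt
    FROM t
    ORDER BY id
    """
).fetchall()

# Step 2: within each person, take the dob of the first row when sorted by
# highest count, then by most recent date.
imputed = conn.execute(
    """
    SELECT t.id, t.first_name, t.last_name,
           FIRST_VALUE(dob) OVER (PARTITION BY first_name, last_name
                                  ORDER BY dob_cnt DESC, date DESC
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                                 ) AS dob_imputed,
           t.date
    FROM (SELECT t.*,
                 COUNT(*) OVER (PARTITION BY first_name, last_name, dob) AS dob_cnt
          FROM t
         ) t
    ORDER BY t.id
    """
).fetchall()

print(counts)
for row in imputed:
    print(row)
```

If two different dob values for the same person tie on both count and date, the sort order between them is not determined, so a further tie-breaker column would be needed in that case.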