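My company works with 3 partners, and each partner can have multiple brands. Every week I receive a dump of each brand's user list, which I store in a MySQL database with one table per brand. Each brand table contains a list of users and some basic information (birth year, postcode, gender). Some users are registered with more than one brand, and each brand can hold its own set of data for that user.
For example, a user is registered with both Canvas and MNM. At Canvas, their profile looks like this: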
ID                                GENDER  BIRTHYEAR  POSTCODE  MODIFIED
94bafdb3e155d30349f1113a25c0714f  M       1973       2800      2009-01-01 09:01:01
At MNM, like this (no gender recorded):
ID                                GENDER  BIRTHYEAR  POSTCODE  MODIFIED
94bafdb3e155d30349f1113a25c0714f          1973       1000      2009-09-09 09:01:01
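I want to create a view (or a table, I'm not sure which is best) that combines the two records using the most recent version of each piece of data, while also telling me where each value came from.
So the two records above would combine into: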
ID                                GENDER  G_DATE               G_BRAND  BIRTHYEAR  B_DATE               B_BRAND  POSTCODE  P_DATE               P_BRAND
94bafdb3e155d30349f1113a25c0714f  M       2009-01-01 09:01:01  Canvas   1973       2009-09-09 09:01:01  MNM      1000      2009-09-09 09:01:01  MNM
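I'm imagining some complicated series of unions and subqueries, but I'm not even sure where to start.
I created a view that combines all the tables: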
-- ID is exposed as hashkey, the column name the queries below use
CREATE VIEW view_combine AS
SELECT ID AS hashkey, GENDER, MODIFIED AS G_DATE, 'Canvas' AS G_BRAND,
       BIRTHYEAR, MODIFIED AS B_DATE, 'Canvas' AS B_BRAND,
       POSTCODE, MODIFIED AS P_DATE, 'Canvas' AS P_BRAND FROM canvas
UNION ALL
SELECT ID AS hashkey, GENDER, MODIFIED AS G_DATE, 'Een' AS G_BRAND,
       BIRTHYEAR, MODIFIED AS B_DATE, 'Een' AS B_BRAND,
       POSTCODE, MODIFIED AS P_DATE, 'Een' AS P_BRAND FROM een
UNION ALL
SELECT ID AS hashkey, GENDER, MODIFIED AS G_DATE, 'MNM' AS G_BRAND,
       BIRTHYEAR, MODIFIED AS B_DATE, 'MNM' AS B_BRAND,
       POSTCODE, MODIFIED AS P_DATE, 'MNM' AS P_BRAND FROM mnm
Then I tried selecting from it, but I don't think this is the right direction.
SELECT v1.hashkey, ge.gender, ge.g_date, ge.g_brand,
       bi.birthyear, bi.b_date, bi.b_brand,
       pc.postcode, pc.p_date, pc.p_brand
FROM view_combine v1
JOIN (
    SELECT g.hashkey, g.gender, g.g_date, g.g_brand
    FROM view_combine g
    LEFT JOIN view_combine g1 ON g.hashkey = g1.hashkey AND g.g_date < g1.g_date
    WHERE g1.hashkey IS NULL
) ge ON ge.hashkey = v1.hashkey
JOIN (
    SELECT b.hashkey, b.birthyear, b.b_date, b.b_brand
    FROM view_combine b
    LEFT JOIN view_combine b1 ON b.hashkey = b1.hashkey AND b.b_date < b1.b_date
    WHERE b1.hashkey IS NULL
) bi ON bi.hashkey = v1.hashkey
JOIN (
    SELECT p.hashkey, p.postcode, p.p_date, p.p_brand
    FROM view_combine p
    LEFT JOIN view_combine p1 ON p.hashkey = p1.hashkey AND p.p_date < p1.p_date
    WHERE p1.hashkey IS NULL
) pc ON pc.hashkey = v1.hashkey
GROUP BY v1.hashkey
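Answer 0 (score: 1)
I managed to solve this. Basically, I needed to select from the view and then run sub-selects on the view to get the fields I wanted. I found that ordering by date inside the sub-selects returned the values I needed.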
SELECT v1.hashkey, ge.gender, ge.g_date, ge.g_brand,
       bi.birthyear, bi.b_date, bi.b_brand,
       pc.postcode, pc.p_date, pc.p_brand
FROM view_combine v1
JOIN (
    SELECT g.hashkey, g.gender, g.g_date, g.g_brand
    FROM view_combine g
    LEFT JOIN view_combine g1 ON g.hashkey = g1.hashkey AND g.g_date < g1.g_date AND g1.gender IS NOT NULL
    WHERE g1.hashkey IS NULL
    ORDER BY g.g_date
) ge ON ge.hashkey = v1.hashkey
JOIN (
    SELECT b.hashkey, b.birthyear, b.b_date, b.b_brand
    FROM view_combine b
    LEFT JOIN view_combine b1 ON b.hashkey = b1.hashkey AND b.b_date < b1.b_date AND b1.birthyear IS NOT NULL
    WHERE b1.hashkey IS NULL
    ORDER BY b.b_date
) bi ON bi.hashkey = v1.hashkey
JOIN (
    SELECT p.hashkey, p.postcode, p.p_date, p.p_brand
    FROM view_combine p
    LEFT JOIN view_combine p1 ON p.hashkey = p1.hashkey AND p.p_date < p1.p_date AND p1.postcode IS NOT NULL
    WHERE p1.hashkey IS NULL
    ORDER BY p.p_date
) pc ON pc.hashkey = v1.hashkey
GROUP BY v1.hashkey
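One caveat about the query above: MySQL does not guarantee that an ORDER BY inside a derived table is honored, nor which row a GROUP BY returns for non-aggregated columns, and with ONLY_FULL_GROUP_BY enabled a query like this may be rejected. A possible variant is to also require the candidate row itself to be non-null, so each sub-select yields at most one row per user (unless two brands share the exact same timestamp). The sketch below is not the query from this answer; it runs the idea from Python against an in-memory SQLite database purely to stay self-contained. The simplified view with a single `modified` and `brand` column, the `hashkey` column name, the omission of the `een` table, and the LEFT JOINs between the sub-selects are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE canvas (hashkey TEXT, gender TEXT, birthyear INTEGER, postcode TEXT, modified TEXT);
CREATE TABLE mnm    (hashkey TEXT, gender TEXT, birthyear INTEGER, postcode TEXT, modified TEXT);
INSERT INTO canvas VALUES ('94bafdb3e155d30349f1113a25c0714f', 'M',  1973, '2800', '2009-01-01 09:01:01');
INSERT INTO mnm    VALUES ('94bafdb3e155d30349f1113a25c0714f', NULL, 1973, '1000', '2009-09-09 09:01:01');

-- Simplified view (assumption): one row per user and brand.
CREATE VIEW view_combine AS
SELECT hashkey, gender, birthyear, postcode, modified, 'Canvas' AS brand FROM canvas
UNION ALL
SELECT hashkey, gender, birthyear, postcode, modified, 'MNM' AS brand FROM mnm;
""")

# For each column: keep a row only if its value is non-null and no newer
# row for the same user has a non-null value in that column.
QUERY = """
SELECT u.hashkey,
       ge.gender,    ge.modified AS g_date, ge.brand AS g_brand,
       bi.birthyear, bi.modified AS b_date, bi.brand AS b_brand,
       pc.postcode,  pc.modified AS p_date, pc.brand AS p_brand
FROM (SELECT DISTINCT hashkey FROM view_combine) u
LEFT JOIN (
    SELECT g.hashkey, g.gender, g.modified, g.brand
    FROM view_combine g
    LEFT JOIN view_combine g1
           ON g1.hashkey = g.hashkey AND g1.modified > g.modified AND g1.gender IS NOT NULL
    WHERE g.gender IS NOT NULL AND g1.hashkey IS NULL
) ge ON ge.hashkey = u.hashkey
LEFT JOIN (
    SELECT b.hashkey, b.birthyear, b.modified, b.brand
    FROM view_combine b
    LEFT JOIN view_combine b1
           ON b1.hashkey = b.hashkey AND b1.modified > b.modified AND b1.birthyear IS NOT NULL
    WHERE b.birthyear IS NOT NULL AND b1.hashkey IS NULL
) bi ON bi.hashkey = u.hashkey
LEFT JOIN (
    SELECT p.hashkey, p.postcode, p.modified, p.brand
    FROM view_combine p
    LEFT JOIN view_combine p1
           ON p1.hashkey = p.hashkey AND p1.modified > p.modified AND p1.postcode IS NOT NULL
    WHERE p.postcode IS NOT NULL AND p1.hashkey IS NULL
) pc ON pc.hashkey = u.hashkey
"""

row = conn.execute(QUERY).fetchone()
print(row)
```

With the two sample rows from the question, this should give the combined row the question asks for: gender from Canvas, birth year and postcode from MNM. The LEFT JOINs keep users who have no value at all for some column.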
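Answer 1 (score: 1)
I realize you've already solved this, but as a second opinion, this is something I would handle up front, when the data is loaded.
Given the data:
Partner 1 - UserA, Male, Null, 6300, 9/9/09
Partner 2 - UserA, Null, 1980, 2300, 9/10/09
When querying UserA, you most likely want the "latest record":
UserA, Male, 1980, 2300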
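The "latest record" here is built field by field: for each attribute, take the most recent non-null value. A minimal Python sketch of that rule, using the two sample rows above. The function name `merge_latest` and the dictionary keys are hypothetical, and the dates are assumed to be month/day/year (the second row is the later one under either reading).

```python
from datetime import date

FIELDS = ("gender", "birth_year", "postcode")

def merge_latest(records):
    """For each field, keep the newest non-null value and the partner that supplied it."""
    merged = {}
    # Walk the records from oldest to newest so later values overwrite earlier ones.
    for rec in sorted(records, key=lambda r: r["modified"]):
        for field in FIELDS:
            if rec[field] is not None:
                merged[field] = (rec[field], rec["partner"])
    return merged

records = [
    {"partner": "Partner 1", "gender": "Male", "birth_year": None,
     "postcode": "6300", "modified": date(2009, 9, 9)},
    {"partner": "Partner 2", "gender": None, "birth_year": 1980,
     "postcode": "2300", "modified": date(2009, 9, 10)},
]

merged = merge_latest(records)
print(merged)
```

Gender survives from Partner 1 because Partner 2 has no value for it, while birth year and postcode come from the newer Partner 2 row.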
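Using the following tables:
Partner lookup table:
    TypeCode
    DisplayName
Current user table:
    UserID
    Gender
    GenderSourcePartner
    BirthYear
    BirthYearSourcePartner
    PostalCode
    PostalCodeSourcePartner
PartnerSourceData table:
    PartnerTypeCode
    UserID
    Gender
    BirthYear
    PostalCode
    ModifiedDate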
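Then, when I receive a partner source file, I process it line by line to update the current user table and append to the PartnerSourceData table (using it as a log).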
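A rough sketch of that load step, written in Python against an in-memory SQLite database so it can run on its own. The table name `users`, the function `process_row`, and the partner codes are hypothetical. It assumes rows are processed oldest first, so a newer non-null value simply overwrites an older one. `INSERT OR IGNORE` is SQLite syntax; MySQL spells it `INSERT IGNORE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    UserID TEXT PRIMARY KEY,
    Gender TEXT,        GenderSourcePartner TEXT,
    BirthYear INTEGER,  BirthYearSourcePartner TEXT,
    PostalCode TEXT,    PostalCodeSourcePartner TEXT
);
CREATE TABLE PartnerSourceData (
    PartnerTypeCode TEXT, UserID TEXT, Gender TEXT,
    BirthYear INTEGER, PostalCode TEXT, ModifiedDate TEXT
);
""")

def process_row(conn, partner, user_id, gender, birth_year, postal_code, modified):
    # Always append the raw row to the log, whatever it contains.
    conn.execute(
        "INSERT INTO PartnerSourceData VALUES (?, ?, ?, ?, ?, ?)",
        (partner, user_id, gender, birth_year, postal_code, modified),
    )
    # Make sure the user exists, then overwrite only the fields this row supplies.
    conn.execute("INSERT OR IGNORE INTO users (UserID) VALUES (?)", (user_id,))
    supplied = (("Gender", gender), ("BirthYear", birth_year), ("PostalCode", postal_code))
    for column, value in supplied:
        if value is not None:
            # Column names come from the fixed tuple above, never from the file.
            conn.execute(
                f"UPDATE users SET {column} = ?, {column}SourcePartner = ? WHERE UserID = ?",
                (value, partner, user_id),
            )

rows = [
    ("Partner1", "UserA", "Male", None, "6300", "2009-09-09"),
    ("Partner2", "UserA", None, 1980, "2300", "2009-09-10"),
]
# Assumption: process oldest first (ISO dates sort correctly as strings).
for row in sorted(rows, key=lambda r: r[5]):
    process_row(conn, *row)

current = conn.execute("SELECT * FROM users WHERE UserID = 'UserA'").fetchone()
log_count = conn.execute("SELECT COUNT(*) FROM PartnerSourceData").fetchone()[0]
print(current, log_count)
```

If files can arrive out of order, the update would need to compare dates, for example by storing a modified date per field next to each source-partner column.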