Combining multiple data sources into one unified table

Time: 2009-10-01 15:41:44

Tags: sql mysql

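My company works with 3 partners, each of which can have multiple brands. Every week I receive a dump of each brand's user list, which I store in a MySQL database with one table per brand. Each brand table contains a list of users and some basic information (birth year, postcode, gender). Some users are registered with more than one brand, and each brand can hold its own set of data for that user.

For example, a user is registered with both Canvas and MNM. At Canvas, their profile looks like this: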

ID                                  GENDER  BIRTHYEAR   POSTCODE    MODIFIED
94bafdb3e155d30349f1113a25c0714f    M       1973        2800        2009-01-01 09:01:01

At MNM, it looks like this:

ID                                  GENDER  BIRTHYEAR   POSTCODE    MODIFIED
94bafdb3e155d30349f1113a25c0714f            1973        1000        2009-09-09 09:01:01

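I would like to create a view (or a table; I'm not sure which is best) that combines the two records using the most recent version of each piece of data, and that also tells me where each value came from.

So the two records above would combine into: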

ID                                  GENDER  G_DATE              G_BRAND BIRTHYEAR   B_DATE              B_BRAND POSTCODE   P_DATE               P_BRAND
94bafdb3e155d30349f1113a25c0714f    M       2009-01-01 09:01:01 Canvas  1973        2009-09-09 09:01:01 MNM     1000       2009-09-09 09:01:01  MNM

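I'm picturing some complicated series of unions and subqueries, but I'm not even sure where to start.

I have created a view that combines all of the tables: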

CREATE VIEW view_combine AS
SELECT ID, GENDER, MODIFIED as G_DATE, 'Canvas' as G_BRAND, 
    BIRTHYEAR, MODIFIED as B_DATE, 'Canvas' as B_BRAND, 
    POSTCODE, MODIFIED as P_DATE, 'Canvas' as P_BRAND FROM canvas
UNION ALL
SELECT ID, GENDER, MODIFIED as G_DATE, 'Een' as G_BRAND, 
    BIRTHYEAR, MODIFIED as B_DATE, 'Een' as B_BRAND, 
    POSTCODE, MODIFIED as P_DATE, 'Een' as P_BRAND FROM een
UNION ALL
SELECT ID, GENDER, MODIFIED as G_DATE, 'MNM' as G_BRAND, 
    BIRTHYEAR, MODIFIED as B_DATE, 'MNM' as B_BRAND, 
    POSTCODE, MODIFIED as P_DATE, 'MNM' as P_BRAND FROM mnm
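
For anyone who wants to try the view against the two sample profiles, the brand tables could be sketched as follows. The column types, the primary key, and the use of NULL for a missing gender are all assumptions; the question only shows the column names and one row per brand. The tables need to exist before the view is created.

```sql
-- Assumed layout of one brand table (types are guesses, not from the question).
CREATE TABLE canvas (
    ID        CHAR(32)    NOT NULL PRIMARY KEY,  -- 32-character hash identifying the user
    GENDER    CHAR(1)     NULL,                  -- NULL when the brand holds no value
    BIRTHYEAR SMALLINT    NULL,
    POSTCODE  VARCHAR(10) NULL,
    MODIFIED  DATETIME    NOT NULL
);

-- The other brand tables are assumed to share the same structure.
CREATE TABLE een LIKE canvas;
CREATE TABLE mnm LIKE canvas;

-- The two sample profiles from the question.
INSERT INTO canvas VALUES
    ('94bafdb3e155d30349f1113a25c0714f', 'M', 1973, '2800', '2009-01-01 09:01:01');
INSERT INTO mnm VALUES
    ('94bafdb3e155d30349f1113a25c0714f', NULL, 1973, '1000', '2009-09-09 09:01:01');
```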

I then tried to select from it, but I don't think this is the right direction.

SELECT v1.ID, ge.gender, ge.g_date, ge.g_brand, 
    bi.birthyear, bi.b_date, bi.b_brand, 
    pc.postcode, pc.p_date, pc.p_brand
FROM view_combine v1
JOIN ( 
    -- latest row per ID, used for gender
    select g.ID, g.gender, g.g_date, g.g_brand 
    from view_combine g 
    left join view_combine g1 ON g.ID = g1.ID AND g.g_date < g1.g_date 
    WHERE g1.ID IS NULL
) ge ON ge.ID = v1.ID
JOIN ( 
    -- latest row per ID, used for birth year
    select b.ID, b.birthyear, b.b_date, b.b_brand 
    from view_combine b 
    left join view_combine b1 ON b.ID = b1.ID AND b.b_date < b1.b_date 
    WHERE b1.ID IS NULL
) bi ON bi.ID = v1.ID
JOIN ( 
    -- latest row per ID, used for postcode
    select p.ID, p.postcode, p.p_date, p.p_brand 
    from view_combine p 
    left join view_combine p1 ON p.ID = p1.ID AND p.p_date < p1.p_date 
    WHERE p1.ID IS NULL
) pc ON pc.ID = v1.ID
GROUP BY v1.ID

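2 answers:

Answer 0 (score: 1)

I managed to solve this. Basically, I needed to select from the view and then do sub-selects on the view to get the fields I wanted. I found that ordering by date in the sub-selects returned the values I needed.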

SELECT v1.ID, ge.gender, ge.g_date, ge.g_brand, 
    bi.birthyear, bi.b_date, bi.b_brand, 
    pc.postcode, pc.p_date, pc.p_brand
FROM view_combine v1
JOIN ( 
    -- rows with no later row that carries a gender
    select g.ID, g.gender, g.g_date, g.g_brand 
    from view_combine g 
    left join view_combine g1 ON g.ID = g1.ID AND g.g_date < g1.g_date and g1.gender is not null
    WHERE g1.ID IS NULL
    order by g.g_date
) ge ON ge.ID = v1.ID
JOIN ( 
    -- rows with no later row that carries a birth year
    select b.ID, b.birthyear, b.b_date, b.b_brand 
    from view_combine b 
    left join view_combine b1 ON b.ID = b1.ID AND b.b_date < b1.b_date and b1.birthyear is not null
    WHERE b1.ID IS NULL
    order by b.b_date
) bi ON bi.ID = v1.ID
JOIN ( 
    -- rows with no later row that carries a postcode
    select p.ID, p.postcode, p.p_date, p.p_brand 
    from view_combine p 
    left join view_combine p1 ON p.ID = p1.ID AND p.p_date < p1.p_date and p1.postcode is not null
    WHERE p1.ID IS NULL
    order by p.p_date
) pc ON pc.ID = v1.ID
GROUP BY v1.ID
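
The query above depends on the ORDER BY inside each derived table and on GROUP BY keeping the first row it sees, neither of which MySQL guarantees. A possible variant that avoids both is sketched below. It assumes the key column is exposed as ID (as in the CREATE VIEW statement) and that a missing value is stored as NULL rather than an empty string; the alias `ids` is made up for this sketch.

```sql
-- Sketch: pick, per user and per field, the newest row that actually has a value.
SELECT ids.ID,
       ge.GENDER,    ge.G_DATE, ge.G_BRAND,
       bi.BIRTHYEAR, bi.B_DATE, bi.B_BRAND,
       pc.POSTCODE,  pc.P_DATE, pc.P_BRAND
FROM (SELECT DISTINCT ID FROM view_combine) ids   -- one row per user
LEFT JOIN (
    -- newest row per ID with a non-NULL gender
    SELECT g.ID, g.GENDER, g.G_DATE, g.G_BRAND
    FROM view_combine g
    LEFT JOIN view_combine g1
           ON g1.ID = g.ID
          AND g1.G_DATE > g.G_DATE
          AND g1.GENDER IS NOT NULL
    WHERE g.GENDER IS NOT NULL      -- the candidate row itself must have a value
      AND g1.ID IS NULL             -- and no newer row with a value may exist
) ge ON ge.ID = ids.ID
LEFT JOIN (
    -- newest row per ID with a non-NULL birth year
    SELECT b.ID, b.BIRTHYEAR, b.B_DATE, b.B_BRAND
    FROM view_combine b
    LEFT JOIN view_combine b1
           ON b1.ID = b.ID
          AND b1.B_DATE > b.B_DATE
          AND b1.BIRTHYEAR IS NOT NULL
    WHERE b.BIRTHYEAR IS NOT NULL
      AND b1.ID IS NULL
) bi ON bi.ID = ids.ID
LEFT JOIN (
    -- newest row per ID with a non-NULL postcode
    SELECT p.ID, p.POSTCODE, p.P_DATE, p.P_BRAND
    FROM view_combine p
    LEFT JOIN view_combine p1
           ON p1.ID = p.ID
          AND p1.P_DATE > p.P_DATE
          AND p1.POSTCODE IS NOT NULL
    WHERE p.POSTCODE IS NOT NULL
      AND p1.ID IS NULL
) pc ON pc.ID = ids.ID;
```

With the two sample profiles from the question this yields gender M from Canvas and birth year 1973 and postcode 1000 from MNM. If two brands report a value for the same field with an identical MODIFIED timestamp, the user appears more than once, so a tie-breaker (for example on brand name) would still be needed. The LEFT JOINs keep users for whom no brand has a value for some field.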

Answer 1 (score: 1)

I realize you've already solved it, but as a second point of view: this is something I would pre-process.

Given the data:
Partner 1 - UserA, Male, Null, 6300, 9/9/09
Partner 2 - UserA, Null, 1980, 2300, 9/10/09

When querying for UserA, you most likely want the "latest record":
UserA, Male, 1980, 2300

I would use the following tables:

Partner

TypeCode
DisplayName

CurrentUser

UserID
Gender
GenderSourcePartner
BirthYear
BirthYearSourcePartner
PostalCode
PostalCodeSourcePartner

PartnerSourceData

PartnerTypeCode
UserID
Gender
BirthYear
PostalCode
ModifiedDate

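Then, when I receive a partner source file, I process it line by line to update the CurrentUser table and append to the PartnerSourceData table (using it as a log).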
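
The per-line processing described above could be sketched in MySQL roughly as follows. The column types, the partner code 'P2', and the reading of 9/10/09 as 10 September 2009 are assumptions made for illustration, and the upsert only gives "latest value wins" if source rows are processed in chronological order.

```sql
-- Assumed DDL for the three tables (types are guesses).
CREATE TABLE Partner (
    TypeCode    VARCHAR(16) NOT NULL PRIMARY KEY,
    DisplayName VARCHAR(64) NOT NULL
);

CREATE TABLE CurrentUser (
    UserID                  VARCHAR(32) NOT NULL PRIMARY KEY,
    Gender                  VARCHAR(8)  NULL,
    GenderSourcePartner     VARCHAR(16) NULL,
    BirthYear               SMALLINT    NULL,
    BirthYearSourcePartner  VARCHAR(16) NULL,
    PostalCode              VARCHAR(10) NULL,
    PostalCodeSourcePartner VARCHAR(16) NULL
);

CREATE TABLE PartnerSourceData (
    PartnerTypeCode VARCHAR(16) NOT NULL,
    UserID          VARCHAR(32) NOT NULL,
    Gender          VARCHAR(8)  NULL,
    BirthYear       SMALLINT    NULL,
    PostalCode      VARCHAR(10) NULL,
    ModifiedDate    DATETIME    NOT NULL
);

-- Step 1: append the incoming line to the log, unchanged.
INSERT INTO PartnerSourceData
    (PartnerTypeCode, UserID, Gender, BirthYear, PostalCode, ModifiedDate)
VALUES
    ('P2', 'UserA', NULL, 1980, '2300', '2009-09-10 00:00:00');

-- Step 2: upsert the current picture of the user.
-- A NULL in the incoming line leaves the stored value and its source untouched.
INSERT INTO CurrentUser
    (UserID, Gender, GenderSourcePartner,
     BirthYear, BirthYearSourcePartner,
     PostalCode, PostalCodeSourcePartner)
VALUES
    ('UserA', NULL, NULL, 1980, 'P2', '2300', 'P2')
ON DUPLICATE KEY UPDATE
    GenderSourcePartner     = IF(VALUES(Gender) IS NULL,
                                 GenderSourcePartner, VALUES(GenderSourcePartner)),
    Gender                  = COALESCE(VALUES(Gender), Gender),
    BirthYearSourcePartner  = IF(VALUES(BirthYear) IS NULL,
                                 BirthYearSourcePartner, VALUES(BirthYearSourcePartner)),
    BirthYear               = COALESCE(VALUES(BirthYear), BirthYear),
    PostalCodeSourcePartner = IF(VALUES(PostalCode) IS NULL,
                                 PostalCodeSourcePartner, VALUES(PostalCodeSourcePartner)),
    PostalCode              = COALESCE(VALUES(PostalCode), PostalCode);
```

If UserA already had a CurrentUser row from Partner 1 (Male, no birth year, postal code 6300), the upsert keeps Male with its original source and replaces the birth year and postal code with 1980 and 2300, matching the "latest record" in the example. Each right-hand side reads only the incoming VALUES() and the column being assigned, so the result does not depend on the order in which MySQL applies the assignments.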