I'm having trouble working out how to structure my SQL for this case. I have 3 tables:
Person table:
ID
--
A
FACT_1 table:
Person_ID  DAY  metric
----------------------
A          1    x
A          2    y
FACT_2 table:
Person_ID  DAY  metric
----------------------
A          3    a
A          2    b
I would like the result to be:
Person_ID  DAY  metric1  metric2
--------------------------------
A          1    x        NULL
A          2    y        b
A          3    NULL     a
So it's like an outer join of the person ID and day against each fact table separately, but when the person and the day are the same, I need the two metrics to end up on the same row. The fact tables may be very large, so please keep that in mind.
Sorry, I'm not familiar with the formatting.
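Answer 0 (score: 1)
You can get this result with a FULL JOIN on the fact tables. MySQL does not support FULL JOIN, but you can emulate it with two LEFT JOIN queries combined with UNION. In both queries, the WHERE clause checks that the person exists in the person table (the check is done twice so that the number of rows to process is limited as early as possible):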
SELECT
    COALESCE(f.p1, f.p2) AS person_id,
    COALESCE(f.d1, f.d2) AS day,
    m1 AS metric1,
    m2 AS metric2
FROM (
    SELECT f1.person_id AS p1, f1.day AS d1, f1.metric AS m1, f2.person_id AS p2, f2.day AS d2, f2.metric AS m2
    FROM fact_1 f1
    LEFT JOIN fact_2 f2 ON f1.person_id = f2.person_id AND f1.day = f2.day
    WHERE EXISTS (SELECT 1 FROM person p WHERE p.id = f1.person_id)
    UNION
    SELECT f1.person_id AS p1, f1.day AS d1, f1.metric AS m1, f2.person_id AS p2, f2.day AS d2, f2.metric AS m2
    FROM fact_2 f2
    LEFT JOIN fact_1 f1 ON f1.person_id = f2.person_id AND f1.day = f2.day
    WHERE EXISTS (SELECT 1 FROM person p WHERE p.id = f2.person_id)
) f
ORDER BY person_id, day
This gives the result:
person_id  day  metric1  metric2
--------------------------------
A          1    x        null
A          2    y        b
A          3    null     a
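The emulation above can be checked against the sample rows from the question. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so it runs without a server); the column types are also assumed, since the question does not give them.

```python
import sqlite3

# Assumption: SQLite replaces MySQL here, and the column types are guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id TEXT PRIMARY KEY);
    CREATE TABLE fact_1 (person_id TEXT, day INTEGER, metric TEXT);
    CREATE TABLE fact_2 (person_id TEXT, day INTEGER, metric TEXT);
    INSERT INTO person VALUES ('A');
    INSERT INTO fact_1 VALUES ('A', 1, 'x'), ('A', 2, 'y');
    INSERT INTO fact_2 VALUES ('A', 3, 'a'), ('A', 2, 'b');
""")

# Two LEFT JOINs, one driven from each fact table, merged with UNION
# (which also removes the duplicate row for the day both tables share).
EMULATED_FULL_JOIN = """
SELECT COALESCE(f.p1, f.p2) AS person_id,
       COALESCE(f.d1, f.d2) AS day,
       f.m1 AS metric1,
       f.m2 AS metric2
FROM (
    SELECT f1.person_id AS p1, f1.day AS d1, f1.metric AS m1,
           f2.person_id AS p2, f2.day AS d2, f2.metric AS m2
    FROM fact_1 f1
    LEFT JOIN fact_2 f2 ON f1.person_id = f2.person_id AND f1.day = f2.day
    WHERE EXISTS (SELECT 1 FROM person p WHERE p.id = f1.person_id)
    UNION
    SELECT f1.person_id, f1.day, f1.metric, f2.person_id, f2.day, f2.metric
    FROM fact_2 f2
    LEFT JOIN fact_1 f1 ON f1.person_id = f2.person_id AND f1.day = f2.day
    WHERE EXISTS (SELECT 1 FROM person p WHERE p.id = f2.person_id)
) f
ORDER BY person_id, day
"""

rows = conn.execute(EMULATED_FULL_JOIN).fetchall()
for row in rows:
    print(row)
```

SQL NULL comes back as Python's None, so a missing metric shows up as None in the printed tuples.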
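If you are confident that person_id in the fact tables is valid (you already enforce it with a foreign key constraint, or have checked it some other way), you can skip the WHERE EXISTS checks to improve performance.
Consider creating indexes on fact_1(person_id, day) and fact_2(person_id, day).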
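As a rough sketch of the index suggestion, the statements below create one composite index per fact table. The index names are hypothetical, the column types are assumed, and SQLite (via Python's sqlite3) stands in for MySQL; the CREATE INDEX form shown is plain enough that it should carry over, but check it against your MySQL version.

```python
import sqlite3

# Assumption: SQLite replaces MySQL; table definitions are simplified guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_1 (person_id TEXT, day INTEGER, metric TEXT);
    CREATE TABLE fact_2 (person_id TEXT, day INTEGER, metric TEXT);
    -- Index names are hypothetical; the column lists come from the answer.
    CREATE INDEX idx_fact_1_person_day ON fact_1 (person_id, day);
    CREATE INDEX idx_fact_2_person_day ON fact_2 (person_id, day);
""")

# List the indexes that now exist, to confirm both were created.
index_names = sorted(
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    )
)
print(index_names)
```

Putting person_id first and day second matches the join condition, which compares both columns for equality.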
Answer 1 (score: 1)
Another option is to build a record set of the distinct days:
select DAY from FACT_1
union select DAY from FACT_2
You could also generate the days as a sequence of numbers (or even with a recursive CTE, if you are on a recent version of MySQL):
select * from (
select 1 as Day
union all select 2
union all select 3
-- ...
) Days
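For the recursive CTE variant mentioned above, a minimal sketch might look like the following. The upper bound MAX_DAY and the CTE name days are hypothetical, and the query is run through Python's sqlite3 rather than MySQL (an assumption for the sake of a runnable example); MySQL accepts WITH RECURSIVE from version 8.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

MAX_DAY = 3  # hypothetical upper bound for the day sequence

# The anchor row is day 1; the recursive part adds one day at a time
# until the bound is reached.
RECURSIVE_DAYS = """
WITH RECURSIVE days (day) AS (
    SELECT 1
    UNION ALL
    SELECT day + 1 FROM days WHERE day < ?
)
SELECT day FROM days ORDER BY day
"""

days = [day for (day,) in conn.execute(RECURSIVE_DAYS, (MAX_DAY,))]
print(days)
```

On MySQL, long sequences may run into the cte_max_recursion_depth limit, so a large day range might need that setting raised.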
You can CROSS JOIN this to the Person table and then LEFT JOIN each FACT table to get what you need:
select
    Person.`ID`
    ,Days.Day
    ,FACT_1.metric metric1
    ,FACT_2.metric metric2
from Person
cross join
    ( select DAY from FACT_1
      union select DAY from FACT_2
    ) Days
left join FACT_1 on
    FACT_1.Person_ID = Person.`ID`
    and FACT_1.Day = Days.Day
left join FACT_2 on
    FACT_2.Person_ID = Person.`ID`
    and FACT_2.Day = Days.Day
SQL Fiddle here.
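One property of the CROSS JOIN approach is worth checking on sample data: it yields a row for every person on every day, including people who have no facts at all. The sketch below adds a hypothetical second person 'B' (not in the question) to show this, again using Python's sqlite3 in place of MySQL and assumed column types.

```python
import sqlite3

# Assumption: SQLite replaces MySQL; person 'B' is a hypothetical extra row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID TEXT PRIMARY KEY);
    CREATE TABLE FACT_1 (Person_ID TEXT, Day INTEGER, metric TEXT);
    CREATE TABLE FACT_2 (Person_ID TEXT, Day INTEGER, metric TEXT);
    INSERT INTO Person VALUES ('A'), ('B');
    INSERT INTO FACT_1 VALUES ('A', 1, 'x'), ('A', 2, 'y');
    INSERT INTO FACT_2 VALUES ('A', 3, 'a'), ('A', 2, 'b');
""")

CROSS_JOIN_QUERY = """
SELECT Person.ID, Days.Day,
       FACT_1.metric AS metric1,
       FACT_2.metric AS metric2
FROM Person
CROSS JOIN (SELECT Day FROM FACT_1 UNION SELECT Day FROM FACT_2) Days
LEFT JOIN FACT_1 ON FACT_1.Person_ID = Person.ID AND FACT_1.Day = Days.Day
LEFT JOIN FACT_2 ON FACT_2.Person_ID = Person.ID AND FACT_2.Day = Days.Day
ORDER BY Person.ID, Days.Day
"""

rows = conn.execute(CROSS_JOIN_QUERY).fetchall()

# Keep only the rows where at least one fact table contributed a metric.
rows_with_data = [r for r in rows if r[2] is not None or r[3] is not None]
print(len(rows), len(rows_with_data))
```

If the all-NULL rows are unwanted, they would need to be filtered out; with many people and many days the cross product can also grow large, which matters given the question's note about table size.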
Answer 2 (score: 0)
Unless you need other data from the Person table, this query does not need it. If you do, you can join it to the result of the UNION.
SELECT
    u.Person_ID
    ,u.DAY
    ,MAX(u.metric1) AS metric1
    ,MAX(u.metric2) AS metric2
FROM
(
    SELECT
        f1.Person_ID
        ,f1.DAY
        ,f1.metric AS metric1
        ,NULL AS metric2
    FROM
        Fact_1 AS f1
    UNION ALL
    SELECT
        f2.Person_ID
        ,f2.DAY
        ,NULL AS metric1
        ,f2.metric AS metric2
    FROM
        Fact_2 AS f2
) AS u
GROUP BY
    u.Person_ID
    ,u.DAY
Results:
+-----------+-----+---------+---------+
| Person_ID | DAY | metric1 | metric2 |
+-----------+-----+---------+---------+
| A | 1 | x | NULL |
| A | 2 | y | b |
| A | 3 | NULL | a |
+-----------+-----+---------+---------+
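The UNION ALL plus GROUP BY query relies on MAX ignoring NULLs, so each group keeps whichever metric is actually present. A minimal sketch that runs it on the question's sample rows follows; it uses Python's sqlite3 instead of MySQL, assumes the column types, and adds an ORDER BY that the answer does not have, purely to make the output order predictable.

```python
import sqlite3

# Assumption: SQLite replaces MySQL, and the column types are guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fact_1 (Person_ID TEXT, DAY INTEGER, metric TEXT);
    CREATE TABLE Fact_2 (Person_ID TEXT, DAY INTEGER, metric TEXT);
    INSERT INTO Fact_1 VALUES ('A', 1, 'x'), ('A', 2, 'y');
    INSERT INTO Fact_2 VALUES ('A', 3, 'a'), ('A', 2, 'b');
""")

# Stack both tables with a NULL placeholder for the other table's metric,
# then collapse each (Person_ID, DAY) group; MAX skips the NULL placeholders.
UNION_GROUP_QUERY = """
SELECT u.Person_ID, u.DAY,
       MAX(u.metric1) AS metric1,
       MAX(u.metric2) AS metric2
FROM (
    SELECT f1.Person_ID, f1.DAY, f1.metric AS metric1, NULL AS metric2
    FROM Fact_1 AS f1
    UNION ALL
    SELECT f2.Person_ID, f2.DAY, NULL AS metric1, f2.metric AS metric2
    FROM Fact_2 AS f2
) AS u
GROUP BY u.Person_ID, u.DAY
ORDER BY u.Person_ID, u.DAY
"""

rows = conn.execute(UNION_GROUP_QUERY).fetchall()
for row in rows:
    print(row)
```

This presumes each fact table holds at most one row per person and day, as in the sample data; with several rows per group, MAX would keep only the largest metric and drop the rest.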