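I've been working on this script and have hit a dead end. The script runs, but unfortunately it produces duplicates. It joins two tables in different databases on the state_issue_teacher_id key and then produces the output. I checked both tables and they have the same number of rows. The join should match records one-to-one, but apparently something is wrong with my key or with how I'm joining the tables, because part of my output comes back incorrect. I also tried concatenating attributes to build a unique key and joining on that, but it still produced incorrect results.
Here is my script: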
SELECT
LTRIM(RTRIM(rt.year_time)) AS year_time ,
LTRIM(RTRIM(rt.state_issue_teacher_id)) AS state_issue_teacher_id ,
LTRIM(RTRIM(rt.district_code)) AS district_code ,
rt.district_name ,
rt.school_name ,
LTRIM(RTRIM(rt.assignment_code)) AS assignment_code ,
rt.assignment_desc ,
LTRIM(RTRIM(rt.position_code)) AS position_code ,
rt.position_desc ,
LTRIM(RTRIM(rt.last_name)) AS last_name ,
LTRIM(RTRIM(rt.first_name)) AS first_name ,
LTRIM(RTRIM(rt.total_salary)) AS total_salary ,
rt.assign_fte ,
LTRIM(RTRIM(rt.school_code)) AS school_code ,
rt.fte
FROM staging.dbo.rt AS rt
LEFT JOIN ( SELECT LTRIM(RTRIM(dti.year)) AS year ,
LTRIM(RTRIM(dt.teacher_id)) AS teacher_id ,
LTRIM(RTRIM(db.district_code)) AS district_code ,
db.district_name ,
LTRIM(RTRIM(dt.last_name)) AS last_name ,
LTRIM(RTRIM(dt.first_name)) AS first_name ,
LTRIM(RTRIM(da.assignment_code)) AS assignment_code ,
LTRIM(RTRIM(dp.position_code)) AS position_code ,
dre.race_ethnicity_code ,
LTRIM(RTRIM(SUBSTRING(db.school_code,10,4))) AS school_code ,
da.assignment_desc ,
dp.position_desc ,
fs.total_fte
FROM mart.dbo.fact_s AS fs
LEFT OUTER JOIN mart.dbo.dim_building
AS db ON fs.building_key = db.building_key
LEFT OUTER JOIN mart.dbo.dim_teacher
AS dt ON fs.teacher_key = dt.teacher_key
LEFT OUTER JOIN mart.dbo.dim_assignment
AS da ON fs.assignment_key = da.assignment_key
LEFT OUTER JOIN mart.dbo.dim_race_ethnicity
AS dre ON dt.race_ethnicity_key = dre.race_ethnicity_key
LEFT OUTER JOIN mart.dbo.dim_gender
AS dg ON dt.gender_key = dg.gender_key
LEFT OUTER JOIN mart.dbo.dim_time
AS dti ON fs.time_key = dti.time_key
LEFT OUTER JOIN mart.dbo.dim_position
AS dp ON fs.position_key = dp.position_key
WHERE dti.year = '2012'
) raw
-- Every condition below must match. If several rt rows and several raw rows
-- share the same values for all of them, each rt row matches every raw row.
ON rt.state_issue_teacher_id = raw.teacher_id
AND rt.year_time = raw.year
AND rt.last_name = raw.last_name
AND rt.first_name = raw.first_name
AND rt.district_code = raw.district_code
AND rt.position_code = raw.position_code
AND rt.school_code = raw.school_code
AND rt.assignment_code = raw.assignment_code
WHERE rt.year_time = '2012'
ORDER BY rt.last_name, rt.first_name
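The output I get is:
The fte values across a teacher's assignments should add up to 1. However, teachers who have several partial assignments with the same assignment_code/desc are being duplicated. For example, Jane Doe appears 4 times with a total fte of 2.0, instead of 2 times with the correct total of 1.0. The output should look like this: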
Answer
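It looks like you are getting duplicates for part-time teachers who have multiple assignments that all share the same description. This is clear from comparing the first four rows of the actual output with the first two rows of the desired output.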
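I wonder why you have these duplicates in the first place. They are in the fact table, though, so they must represent something real (I'm guessing that two half-time guidance counselors are funded rather than one full-time counselor). In that case, does the fact table really contain exact duplicate records? If not, the fields that differ between them might suggest an additional join key that would solve the problem.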
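To answer that question, you can list the key combinations that occur more than once. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server, with a hypothetical three-row fact_s table (the teacher IDs, codes, and fte values are invented for illustration). The same GROUP BY ... HAVING COUNT(*) > 1 query should also work in T-SQL.

```python
import sqlite3

# Hypothetical miniature fact table; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_s (teacher_id TEXT, assignment_code TEXT, "
    "position_code TEXT, total_fte REAL)"
)
conn.executemany(
    "INSERT INTO fact_s VALUES (?, ?, ?, ?)",
    [
        ("T001", "A10", "P1", 0.5),  # hypothetical Jane Doe, first half-time row
        ("T001", "A10", "P1", 0.5),  # second row with identical key values
        ("T002", "A20", "P2", 1.0),  # a full-time teacher with a single row
    ],
)

# List every join-key combination that appears on more than one row.
dupes = conn.execute(
    """
    SELECT teacher_id, assignment_code, position_code, COUNT(*) AS n
    FROM fact_s
    GROUP BY teacher_id, assignment_code, position_code
    HAVING COUNT(*) > 1
    """
).fetchall()
print(dupes)  # [('T001', 'A10', 'P1', 2)]
```

If rows that share these key values still differ in some other column, that column is a candidate for the additional join key.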
You need to get rid of the Cartesian product produced by this join condition: rt.assignment_code = raw.assignment_code.
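A minimal reproduction of that product, again in SQLite with hypothetical data modelled on the Jane Doe example: two half-time rt rows and two raw rows share the same teacher and assignment code, so each rt row matches both raw rows, giving 2 × 2 = 4 rows and a total fte of 4 × 0.5 = 2.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: two half-time rows with the same assignment_code
# on BOTH sides of the join.
conn.execute("CREATE TABLE rt (teacher_id TEXT, assignment_code TEXT, fte REAL)")
conn.execute("CREATE TABLE raw (teacher_id TEXT, assignment_code TEXT)")
conn.executemany("INSERT INTO rt VALUES (?, ?, ?)",
                 [("T001", "A10", 0.5), ("T001", "A10", 0.5)])
conn.executemany("INSERT INTO raw VALUES (?, ?)",
                 [("T001", "A10"), ("T001", "A10")])

rows = conn.execute(
    """
    SELECT rt.teacher_id, rt.fte
    FROM rt
    LEFT JOIN raw
      ON rt.teacher_id = raw.teacher_id
     AND rt.assignment_code = raw.assignment_code
    """
).fetchall()

# Each of the 2 rt rows matches both raw rows: 2 x 2 = 4 output rows.
print(len(rows))                    # 4
print(sum(fte for _, fte in rows))  # 2.0
```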
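Apart from finding a better join key, I can think of two ways to solve this. The first is to create a truly unique ID for each position. Perhaps your data structure already has one that you know about. Alternatively, you can use row_number() to add a sequence number for people who have multiple positions.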
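A sketch of the row_number() approach, using the same hypothetical tables: number the rows within each (teacher_id, assignment_code) group on both sides, then add that number to the join condition so that row 1 pairs with row 1 and row 2 with row 2. This assumes SQLite 3.25 or later for window functions; ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) has the same form in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rt (teacher_id TEXT, assignment_code TEXT, fte REAL)")
conn.execute("CREATE TABLE raw (teacher_id TEXT, assignment_code TEXT)")
conn.executemany("INSERT INTO rt VALUES (?, ?, ?)",
                 [("T001", "A10", 0.5), ("T001", "A10", 0.5)])
conn.executemany("INSERT INTO raw VALUES (?, ?)",
                 [("T001", "A10"), ("T001", "A10")])

# Number duplicate rows within each key on both sides, then join on seq too.
rows = conn.execute(
    """
    WITH rt_seq AS (
        SELECT rt.*,
               ROW_NUMBER() OVER (PARTITION BY teacher_id, assignment_code
                                  ORDER BY fte) AS seq
        FROM rt
    ),
    raw_seq AS (
        SELECT raw.*,
               ROW_NUMBER() OVER (PARTITION BY teacher_id, assignment_code
                                  ORDER BY teacher_id) AS seq
        FROM raw
    )
    SELECT rt_seq.teacher_id, rt_seq.fte
    FROM rt_seq
    LEFT JOIN raw_seq
      ON rt_seq.teacher_id = raw_seq.teacher_id
     AND rt_seq.assignment_code = raw_seq.assignment_code
     AND rt_seq.seq = raw_seq.seq
    """
).fetchall()

print(len(rows))                    # 2
print(sum(fte for _, fte in rows))  # 1.0
```

When the duplicate rows are truly identical, the ORDER BY column does not matter. If they differ (for example by section or fte), order both sides by the same column so that the pairing is meaningful.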
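The other approach is to eliminate the duplicates on one side or the other. For example, you could aggregate rt to remove such duplicates.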
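A sketch of this second approach with the same hypothetical tables. Because the original query outputs only rt columns, the first query collapses the derived table raw to distinct join keys, which keeps each rt row exactly once. The variant then also aggregates rt, as suggested above, to get one row per teacher and assignment with the summed fte. Aggregating rt alone would not be enough with this data, since the single aggregated row would still match both raw rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rt (teacher_id TEXT, assignment_code TEXT, fte REAL)")
conn.execute("CREATE TABLE raw (teacher_id TEXT, assignment_code TEXT)")
conn.executemany("INSERT INTO rt VALUES (?, ?, ?)",
                 [("T001", "A10", 0.5), ("T001", "A10", 0.5)])
conn.executemany("INSERT INTO raw VALUES (?, ?)",
                 [("T001", "A10"), ("T001", "A10")])

# Deduplicate the raw side: one row per join key, so each rt row matches once.
rows = conn.execute(
    """
    SELECT rt.teacher_id, rt.fte
    FROM rt
    LEFT JOIN (SELECT DISTINCT teacher_id, assignment_code FROM raw) AS d
      ON rt.teacher_id = d.teacher_id
     AND rt.assignment_code = d.assignment_code
    """
).fetchall()
print(len(rows))                    # 2
print(sum(fte for _, fte in rows))  # 1.0

# Variant: also aggregate rt to one row per key with the summed fte.
agg = conn.execute(
    """
    SELECT r.teacher_id, r.assignment_code, r.fte
    FROM (
        SELECT teacher_id, assignment_code, SUM(fte) AS fte
        FROM rt
        GROUP BY teacher_id, assignment_code
    ) AS r
    LEFT JOIN (SELECT DISTINCT teacher_id, assignment_code FROM raw) AS d
      ON r.teacher_id = d.teacher_id
     AND r.assignment_code = d.assignment_code
    """
).fetchall()
print(agg)  # [('T001', 'A10', 1.0)]
```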